Predicting Indoor Location Using WiFi Fingerprinting

Project Description

In this project, I predict the indoor location of users from WiFi fingerprints using a combination of Principal Component Analysis (PCA) and multi-label classification with skmultilearn. Many businesses and service providers rely on localization services to better serve their patrons. Thanks to the inclusion of GPS sensors in mobile devices, outdoor localization has been solved accurately in a variety of ways. Indoor localization, however, remains an open problem, mainly due to the loss of GPS signal in indoor environments. The problem has therefore garnered increased attention from researchers, who have opted to focus on cheaper software solutions in place of expensive hardware solutions.

Indoor localization has many use cases and exhibits great potential for solving problems in:

  • Indoor navigation for humans and robots
  • Targeted advertising
  • Emergency response
  • Assisted living

You can find all the code on my GitHub page here and in the notebook at the end of the article.

Table of Contents

  1. Libraries
  2. Data
  3. Approaches in the Literature
  4. Literature on Indoor Localization
  5. My Approach: Multi-label Classification
  6. Literature on Multi-Label Classification
  7. Notebook Table of Contents (Analysis)
    1. Exploratory Data Analysis (EDA)
    2. Pre-Processing
    3. Model Applications and Predictions
      1. Problem Transformation
      2. Adaptive Algorithms
    4. HyperParameter Tuning
    5. Scoring the best model
    6. Pickle Model using Joblib
    7. Model Predictions for Validation Data
  8. Results
  9. Future Improvements

Libraries

  • NumPy
  • pandas
  • seaborn
  • Matplotlib
  • SciPy
  • scikit-learn
  • scikit-multilearn
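
For reference, here is a minimal sketch of the imports covering the libraries above (on PyPI the last two ship as scikit-learn and scikit-multilearn):

```python
# Core scientific stack
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.sparse import lil_matrix

# scikit-learn: preprocessing, decomposition, model selection, metrics
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score, hamming_loss

# scikit-multilearn: problem transformation and adapted algorithms
from skmultilearn.problem_transform import BinaryRelevance, ClassifierChain, LabelPowerset
from skmultilearn.adapt import MLkNN
```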

Data

This data set is, unfortunately, still one of a kind. It was presented by Joaquín Torres-Sospedra, Raúl Montoliu, Adolfo Martínez-Usó, Tomás J. Arnau, Joan P. Avariento, Mauri Benedito-Bordonau, and Joaquín Huerta, “UJIIndoorLoc: A New Multi-building and Multi-floor Database for WLAN Fingerprint-based Indoor Localization Problems,” in Proceedings of the Fifth International Conference on Indoor Positioning and Indoor Navigation, 2014.

The UJIIndoorLoc database covers three buildings of Universitat Jaume I, each with four or more floors, for a total of almost 110,000 m². It was created in 2013 by more than 20 different users carrying 25 Android devices. The database consists of 19,937 training/reference records (trainingData.csv) and 1,111 validation/test records (validationData.csv).

The 529 attributes contain the WiFi fingerprint, the coordinates where it was taken, and other useful information. Each WiFi fingerprint is characterized by the detected Wireless Access Points (WAPs) and the corresponding Received Signal Strength Indicator (RSSI). The intensity values are negative integers ranging from -104 dBm (extremely poor signal) to 0 dBm, and the positive value 100 denotes a WAP that was not detected. During the database creation, 520 different WAPs were detected, so each WiFi fingerprint is composed of 520 intensity values.
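
To make the encoding concrete, here is a minimal sketch of loading the training file and inspecting the RSSI values (the file and column layout follow the dataset’s documentation: the first 520 columns are the WAPs):

```python
import pandas as pd

# Load the UJIIndoorLoc training set: 19,937 rows x 529 columns
train = pd.read_csv("trainingData.csv")

# The first 520 columns (WAP001..WAP520) hold RSSI values:
# -104 dBm (extremely poor) to 0 dBm, with 100 meaning "WAP not detected"
waps = train.iloc[:, :520]
print(train.shape)                           # (19937, 529)
print(waps.values.min(), waps.values.max())  # -104 100
```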

Figure: Universitat Jaume I campus, bird’s-eye view (UJIIndoorLoc database)

Figure: A visual mapping of the longitude and latitude data, reproducing the campus layout (it matches the bird’s-eye view above remarkably well)
Figure: A visual mapping of the users who collected the data in each building

Approaches in the Literature

Although the available data for indoor localization has unfortunately been scant, many researchers have used this data set to solve several problems in a variety of ways. Those approaches include the following:

  • Location identification using regression techniques
  • Floor positioning using classification
  • Building recognition using deep learning models
  • Trajectory tracking using a combination of the above methods

You can read more here.

Literature on Indoor Localization

Here are several of the papers available on the topic that helped me in my research:


My Approach: Multi-label Classification

To my knowledge, this approach has not been applied to the problem before. I treat it as a classification problem, but with a twist that can save time, effort, and precious memory: I frame it as a multi-label classification problem in which the model predicts the Building ID and Floor ID simultaneously for a given input. Using a combination of Principal Component Analysis (PCA) and the Multi-Label K-Nearest Neighbors (MLkNN) algorithm, the model predicts both IDs simultaneously with a 98.7% accuracy score and a 0.003 Hamming loss on the training data, and an 81% accuracy score on the validation data. The model can also be expanded to include Space ID.

The “Difference between multi-class classification & multi-label classification is that in multi-class problems the classes are mutually exclusive, whereas for multi-label problems each label represents a different classification task, but the tasks are somehow related.” read more here.

Although many of the previously mentioned approaches in the literature yielded excellent results, most of them predict only one target variable at a time regardless of the technique, for example, only the building ID or only the floor ID. In my opinion, creating separate models for such a prediction task can be quite costly in terms of compute power, memory, and time, especially when using neural networks, which can demand substantial computational resources. Such losses, especially in time, can even be deadly. Imagine a fire on the 4th floor of a particular university building: a model that can accurately and quickly predict how many people are in that exact area can be tremendously helpful to emergency response personnel.


Literature on Multi-Label Classification


Notebook Table of Contents (Analysis)


Exploratory Data Analysis (EDA)


  • The (training) data has 529 columns and 19,937 rows.
  • 520 columns hold the Wireless Access Points (WAPs) and their corresponding Received Signal Strength Indicator (RSSI) values: negative integers ranging from -104 dBm (extremely poor signal) to 0 dBm.
  • The remaining 9 columns are the other defining attributes.
    • Scatter matrices help us better understand the relationships between the potential target attributes. In this particular case, not much can be inferred from the plot below, except that there are no linear relationships between any of those attributes.
    • Histogram plots illustrate the distributions of the data. Most of the attributes have a multinomial distribution. The only variety appears in the Longitude and Latitude values, which look closer to skewed normal distributions. However, those values, if predicted alone, cannot provide adequate information: we would need the same level of confidence in a latitude regression prediction as in a longitude regression prediction to pair them together. Given this complexity, I chose to classify the Building and Floor IDs instead.
  • There are 3 unique buildings and 5 floors in total. However, not all buildings have 5 floors.
Figure: Scatter matrix of the attributes
Figure: Histogram of the attributes’ distributions
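
For reproducibility, here is a short sketch of the two plots above, assuming the non-WAP attribute names from the dataset header (LONGITUDE, LATITUDE, FLOOR, BUILDINGID):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Potential target attributes (names follow the UJIIndoorLoc header)
targets = ["LONGITUDE", "LATITUDE", "FLOOR", "BUILDINGID"]

# Scatter matrix: pairwise relationships between the potential targets
pd.plotting.scatter_matrix(train[targets], figsize=(10, 10), alpha=0.2)
plt.show()

# Histograms: per-attribute distributions
train[targets].hist(figsize=(10, 6), bins=30)
plt.show()
```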

Pre-Processing


  • Transform the WAP signals to positive integers and convert all 100 values (i.e., not detected) to zero. This preserves the sparsity of the matrix and eases the interpretation of results; the scale becomes 0-105.
  • Remove several unnecessary columns, such as User ID, Phone ID, and Timestamp.
  • Separate the data into features and targets, designating Building ID and Floor ID as the targets.
  • Split the data into training and testing sets to avoid over-fitting.
  • Standardize the data so that variables with larger variances don’t overwhelm the PCA.
    • Additionally, standardizing is preferred to min-max scaling when computing PCA because we are interested in the components that maximize the variance. I therefore standardized the WAP data so that each feature has a mean of 0 and a standard deviation of 1. Read more here, and here.
  • After applying PCA, the features are reduced from 520 to 258. I kept the components explaining 95% of the total variance, which seemed like a close enough representation of the whole. (A code sketch of these steps follows the figure below.)
Figure: Principal Component Analysis results (95% explained variance)
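
Here is a sketch of the pre-processing steps above; the +105 shift is one way to obtain the 0-105 scale described (values of 100 become 0, real RSSI readings move to 1-105), and the variable names are illustrative:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# 100 means "not detected" -> 0; shift real RSSI (-104..0) to 1..105
X = train.iloc[:, :520].replace(100, -105) + 105

# Designate the two targets for multi-label classification
y = train[["BUILDINGID", "FLOOR"]]

# Hold out a test set to guard against over-fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Standardize to mean 0 / std 1, fitting on the training split only
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Keep enough components to explain 95% of the variance (~258 here)
pca = PCA(n_components=0.95).fit(X_train_std)
X_train_pca = pca.transform(X_train_std)
X_test_pca = pca.transform(X_test_std)
print(X_train_pca.shape[1])  # number of retained components
```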

Model Applications and Predictions

Now that we have standardized the data and applied PCA, we must prepare it for multi-label classification. It is recommended that the data be transformed into sparse matrices. Read more here.
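As a minimal sketch, scipy’s lil_matrix is one of the sparse formats skmultilearn accepts. One-hot encoding the targets into the binary indicator matrix that multi-label estimators expect is my assumption of how the labels were prepared; the article does not spell this step out:

```python
import pandas as pd
from scipy.sparse import lil_matrix

# One-hot encode the targets into a binary indicator matrix
# (3 building columns + 5 floor columns = 8 labels)
Y_train = pd.get_dummies(y_train, columns=["BUILDINGID", "FLOOR"]).astype(int)
Y_test = pd.get_dummies(y_test, columns=["BUILDINGID", "FLOOR"]).astype(int)

# Convert features and labels to sparse matrices for skmultilearn
X_train_sp = lil_matrix(X_train_pca)
X_test_sp = lil_matrix(X_test_pca)
y_train_sp = lil_matrix(Y_train.values)
y_test_sp = lil_matrix(Y_test.values)
```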

When solving multi-label classification problems, there are two approaches, a traditional one and an adaptive one.

  • The traditional approach focuses on problem transformation: it transforms the multi-label classification problem into multiple single-label classification problems. Algorithms in this family include OneVsRest, Binary Relevance, Classifier Chains, and Label Powerset.
  • The adaptive approach adapts the algorithm itself to handle the multi-label classification problem directly.

In the following sections, I apply both approaches to the problem and compare their performance.


Problem Transformation

1. Binary Relevance

“Transforms a multi-label classification problem with L labels into L single-label separate binary classification problems using the same base classifier provided in the constructor. The prediction output is the union of all per label classifiers” Read more here.

“This approach is popular because it is easy to implement, however it also ignores the possible correlations between class labels.” Read more here.

Binary Relevance Accuracy = 0.1216148445336008 (not sufficient).


2. Classifier Chains

“Constructs a bayesian conditioned chain of per label classifiers. This class provides implementation of Jesse Read’s problem transformation method called Classifier Chains. For L labels it trains L classifiers ordered in a chain according to the Bayesian chain rule. The first classifier is trained just on the input space, and then each next classifier is trained on the input space and all previous classifiers in the chain.” Read more here.

“The total number of classifiers needed for this approach is equal to the number of classes, but the training of the classifiers is more involved” Read more here.

Classifier Chains Accuracy is 0.40722166499498497 (not sufficient).


3. Label Powerset

“Transform multi-label problem to a multi-class problem. Label Powerset is a problem transformation approach to multi-label classification that transforms a multi-label problem to a multi-class problem with 1 multi-class classifier trained on all unique label combinations found in the training data. The method maps each combination to a unique combination id number, and performs multi-class classification using the classifier as multi-class classifier and combination ids as classes.” Read more here.

“This approach does take possible correlations between class labels into account. More commonly this approach is called the label-powerset method, because it considers each member of the power set of labels in the training set as a single label. However when the number of classes increases the number of distinct label combinations can grow exponentially. This easily leads to combinatorial explosion and thus computational infeasibility. ” Read more here.

Label Powerset Accuracy is 0.525827482447342 (better, but still not sufficient).
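
All three transformations are available in skmultilearn and share the same fit/predict interface. Here is a sketch comparing them; the base classifier is an assumption on my part (the article does not name the one it used), with Gaussian Naive Bayes chosen here for speed:

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
from skmultilearn.problem_transform import (
    BinaryRelevance, ClassifierChain, LabelPowerset)

for Transform in (BinaryRelevance, ClassifierChain, LabelPowerset):
    clf = Transform(GaussianNB())     # wrap a single-label base classifier
    clf.fit(X_train_sp, y_train_sp)
    preds = clf.predict(X_test_sp)
    # Subset accuracy: all labels for a sample must match exactly
    print(Transform.__name__, accuracy_score(y_test_sp, preds))
```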


Adaptive Algorithms

Multi-Label K-Nearest Neighbors (ML-KNN)

“MLkNN uses k-Nearest Neighbors to find the nearest examples to a test instance and then uses Bayesian inference to select the assigned labels.” Read more here.

“In detail, for each unseen instance, its k nearest neighbors in the training set are firstly identified. After that, based on statistical information gained from the label sets of these neighboring instances, i.e. the number of neighboring instances belonging to each possible class, maximum a posteriori (MAP) principle is utilized to determine the label set for the unseen instance.” Read more here.

Multi-Label K-Nearest Neighbors (ML-KNN) Accuracy = 0.9839518555667001 (Excellent!).

This MLkNN model is clearly the best we have so far, with an accuracy score around 98%. It is not as fast as the problem transformation models, but it has by far the strongest predictive power.
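
Here is a sketch of fitting the adapted algorithm; the article’s initial run used k=3, which I set explicitly:

```python
from skmultilearn.adapt import MLkNN
from sklearn.metrics import accuracy_score

# MLkNN adapts k-NN to multi-label output via MAP label assignment
mlknn = MLkNN(k=3)
mlknn.fit(X_train_sp, y_train_sp)
preds = mlknn.predict(X_test_sp)
print(accuracy_score(y_test_sp, preds))  # ~0.98 in the article's run
```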


HyperParameter Tuning

Having selected the MLkNN classifier, I ran a grid search to find the optimal parameters for the model.

Here are the optimal parameters for our MLkNN model: {'k': 1, 's': 0.5}, for an accuracy of 0.987710828265095.

After tuning the hyper-parameters, the model yields 98.7% accuracy with k equal to 1, up from 98.4% where k was equal to 3.
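
skmultilearn’s MLkNN is scikit-learn compatible, so the grid search can be run with GridSearchCV; the parameter ranges below are illustrative:

```python
from sklearn.model_selection import GridSearchCV
from skmultilearn.adapt import MLkNN

# Search over the neighbor count k and the smoothing parameter s
parameters = {'k': range(1, 6), 's': [0.5, 0.7, 1.0]}

search = GridSearchCV(MLkNN(), parameters, scoring='accuracy')
search.fit(X_train_sp, y_train_sp)
print(search.best_params_, search.best_score_)
# e.g. {'k': 1, 's': 0.5} at ~0.9877 in the article's run
```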


Scoring the best model

  • Hamming Loss: The Hamming loss is the fraction of labels that are incorrectly predicted.
    • Hamming Loss = 0.0036985957873620864
  • Accuracy Score: In multilabel classification, this function computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of labels in y_true.
    • Accuracy = 0.9869608826479438
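
Both scores come straight from sklearn.metrics and accept the sparse label matrices directly; a sketch using the tuned model:

```python
from sklearn.metrics import accuracy_score, hamming_loss

best = search.best_estimator_
preds = best.predict(X_test_sp)

# Fraction of individual labels predicted incorrectly (lower is better)
print("Hamming loss:", hamming_loss(y_test_sp, preds))

# Subset accuracy: every label of a sample must match exactly
print("Accuracy:", accuracy_score(y_test_sp, preds))
```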

Pickle Model using Joblib

“It may be better to use joblib’s replacement of pickle (dump & load), which is more efficient on objects that carry large numpy arrays internally as is often the case for fitted scikit-learn estimators, but can only pickle to the disk and not to a string.” Read more here.
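
Here is a sketch of persisting and reloading the fitted model with joblib (the file name is illustrative):

```python
import joblib

# Persist the tuned MLkNN model to disk
joblib.dump(best, "mlknn_model.joblib")

# ...and load it back later for predictions
model = joblib.load("mlknn_model.joblib")
```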


Model Predictions for Validation Data

I tested the model on the validation set, a completely new and unseen data set. I applied the same steps to it as to the training set: cleaning, pre-processing, standardizing, applying PCA, and predicting with the MLkNN (k=1) classifier.

The model yielded a whopping 81% accuracy on this unseen data, which is great!

The results were then mapped through a dictionary that translates positions in the prediction output to the corresponding Building and Floor IDs for easy interpretation, and saved to an external CSV file.
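
Here is a sketch of that validation pass. Note that the scaler and PCA fitted on the training data are reused rather than refit, and the mapping from indicator columns back to Building/Floor IDs is my assumption of the dictionary step described above:

```python
import pandas as pd
from scipy.sparse import lil_matrix

# Same cleaning as training: shift RSSI to the 0-105 scale
val = pd.read_csv("validationData.csv")
X_val = val.iloc[:, :520].replace(100, -105) + 105

# Reuse the scaler and PCA fitted on the training data
X_val_pca = pca.transform(scaler.transform(X_val))

# Predict with the tuned MLkNN (k=1) and densify the indicator output
preds = model.predict(lil_matrix(X_val_pca)).toarray()

# Columns 0-2 encode the building, columns 3-7 the floor
building = preds[:, :3].argmax(axis=1)
floor = preds[:, 3:].argmax(axis=1)
pd.DataFrame({"BUILDINGID": building, "FLOOR": floor}).to_csv(
    "predictions.csv", index=False)
```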


Results

  • Using a combination of Principal Component Analysis (PCA) and Multi-Label K-Nearest Neighbors (MLkNN), the model predicts the Building ID and Floor ID simultaneously on the validation data with an 81% accuracy score.
  • All of the predictions have been translated and saved to an external CSV reflecting the proper Building ID and Floor ID.

Future Improvements

  • This unique data set is a harbinger of what’s to come. Yet, sadly, it is still one of a kind, which severely limits our ability to innovate any further.
  • Additionally, the majority of Space IDs (classroom information) in the validation set were null, which severely hindered the ability to include Space ID in the model.
  • I hope there will be clearer documentation and more lucid examples for skmultilearn (a very powerful toolkit) in the near future.