Hi, I’m Patrick. I’m a research scientist at Fetch.AI. My background is in high energy physics phenomenology, and after working as an academic researcher for several years, I moved into the colourful world of tech start-ups.
At Fetch.AI, I have been working across a range of subjects: conducting research into consensus protocols, studying signature schemes in decentralised and distributed networks and, more recently, working on a data mining project with one of my colleagues, Daniel Honerkamp. In this piece I will focus on the data mining project.
The starting point for the data mining project is the GPS Trajectory dataset, collected by Microsoft Research Asia for the GeoLife project. The most recent version of the dataset consists of trajectories, i.e. sequences of time-stamped GPS points, collected by 182 mobile phone users between April 2007 and August 2012. There are 17,621 trajectories, covering a total distance of 1,292,951 km and a total duration of 50,176 hours. The dataset has a high resolution: 91.5% of the trajectories were logged densely, i.e. with a point recorded every 1 to 5 seconds. Furthermore, 62 of the users provided information on their mode of transportation for some of the trajectories. This means that we have labelled data at our disposal, which can be used to train machine learning models for transportation mode prediction. There have been various approaches to this already, but there is still scope to explore more.
Inspired by the work accomplished by a group of researchers at Microsoft Research Asia, we began by identifying so-called stay points. A stay point is the mean position of a sequence of consecutive points where 1) the distance from the first point to each of the subsequent ones is less than a threshold distance, D, and 2) the timespan between the first and last point in the sequence is above a certain threshold, T. In popular places there will be an agglomeration of stay points belonging to different users. By applying unsupervised learning (clustering, in this case) to the stay points, it is possible to identify Points of Interest (PoI). These are the locations where many users tend to spend a certain amount of time, and hence are locations that can be perceived as interesting for one reason or another.
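To make the stay point definition above concrete, here is a minimal sketch in Python. It is not our production code: the names `GpsPoint`, `StayPoint`, `haversine_m` and `detect_stay_points` are made up for illustration, and the default thresholds (200 m for D, 20 minutes for T) are placeholder values rather than the ones used in the project. Distances are computed with the haversine formula, which assumes a spherical Earth.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class GpsPoint:
    lat: float        # latitude in degrees
    lon: float        # longitude in degrees
    timestamp: float  # seconds since an arbitrary epoch


@dataclass
class StayPoint:
    lat: float        # mean latitude of the contributing points
    lon: float        # mean longitude of the contributing points
    arrival: float    # timestamp of the first contributing point
    departure: float  # timestamp of the last contributing point


def haversine_m(a, b):
    """Great-circle distance in metres between two GPS points."""
    earth_radius_m = 6_371_000.0  # mean Earth radius (spherical approximation)
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = (sin(dlat / 2) ** 2
         + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(h))


def detect_stay_points(points, max_dist_m=200.0, min_duration_s=1200.0):
    """Return the stay points of one time-ordered trajectory.

    max_dist_m plays the role of D and min_duration_s the role of T.
    Both defaults are illustrative assumptions.
    """
    stay_points = []
    i, n = 0, len(points)
    while i < n:
        # Extend the sequence while every point stays within D of points[i].
        j = i + 1
        while j < n and haversine_m(points[i], points[j]) < max_dist_m:
            j += 1
        group = points[i:j]
        duration = group[-1].timestamp - group[0].timestamp
        if duration > min_duration_s:
            # The stay point is the average of all points in the sequence.
            stay_points.append(StayPoint(
                lat=sum(p.lat for p in group) / len(group),
                lon=sum(p.lon for p in group) / len(group),
                arrival=group[0].timestamp,
                departure=group[-1].timestamp,
            ))
            i = j  # continue after the sequence that formed the stay point
        else:
            i += 1  # no stay here; try the next point as the anchor
    return stay_points
```

Running this over every user's trajectories yields a set of stay points, which can then be passed to a clustering algorithm of your choice to find the Points of Interest.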
Currently, I am investigating ways in which we can gain insight into the user patterns found at the Points of Interest. The method I am using is to look at how many different users have been to a PoI and to determine the frequency of visits, the distribution of arrival and departure times, the time spent there etc. Alongside this, I am studying what amenities, utilities, facilities etc. are present in the neighbourhood and how far they are from a PoI. Together, this data can tell us why a place appears interesting and allows us to provide users with more targeted suggestions at a given PoI. For example, places of work and sleep can be inferred and matched with either companies or hotels and residential areas.
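As a rough illustration of the kind of per-PoI statistics described above, the sketch below aggregates a list of visits. It assumes that each stay point has already been assigned to a PoI. The `Visit` record and the `summarise_pois` function are hypothetical names chosen for this example, not part of our actual framework.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Visit:
    user_id: str        # anonymised user identifier
    poi_id: int         # PoI (cluster) the stay point was assigned to
    arrival: datetime
    departure: datetime


def summarise_pois(visits):
    """Aggregate simple visit statistics for each Point of Interest."""
    by_poi = defaultdict(list)
    for visit in visits:
        by_poi[visit.poi_id].append(visit)

    summary = {}
    for poi_id, group in by_poi.items():
        summary[poi_id] = {
            # How many different users have been to this PoI.
            "distinct_users": len({v.user_id for v in group}),
            # Total number of visits across all users.
            "visit_count": len(group),
            # Histograms of arrival and departure hour (0 to 23).
            "arrival_hours": Counter(v.arrival.hour for v in group),
            "departure_hours": Counter(v.departure.hour for v in group),
            # Average time spent per visit, in minutes.
            "mean_stay_minutes": mean(
                (v.departure - v.arrival).total_seconds() / 60 for v in group
            ),
        }
    return summary
```

A PoI where arrivals cluster around 9 am and stays last roughly eight hours would be a plausible workplace, while one with late-evening arrivals and overnight stays looks more like a home or a hotel. Turning that intuition into a reliable classifier is a separate step that this sketch does not attempt.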
As is apparent by now, the project is quite open-ended, as one can work from various angles and gather a broad range of insights. The reader may notice that, so far, I have been using data that is neither particularly new nor very local to my immediate surroundings. What I have described here is merely a first step in a much larger effort. We are essentially using the GeoLife Trajectory dataset to create a query and analysis prototype. The framework that will emerge from this can then be used with other, newer data, including data that we will be collecting consensually here at Fetch.AI through the Network Participation App, which is connected to our ledger.
There are huge benefits to be gained from such an approach, but crucially, we also believe privacy and data security are extremely important. That’s why users of the Fetch.AI network will be able to select the data they choose to share with others.