A personal knowledge base integrating user data and an activity timeline

Ph.D. defense

The defense will be held in French.

Jury members

Keywords: personal information management, activity recognition, data integration, transportation mode recognition, knowledge base, mobile device sensor data.


Typical Internet users today have their data scattered across several devices, applications, and services. Managing and controlling one’s data is increasingly difficult. In this thesis, we adopt the viewpoint that the user should be given the means to gather and integrate her data, under her full control. To that end, we designed a system that integrates and enriches a user’s data from multiple heterogeneous sources of personal information into an RDF knowledge base. The system is open-source and implements a novel extensible framework that facilitates the integration of new data sources and the development of new modules for deriving knowledge. We first show how user activity can be inferred from mobile phone sensor data. We introduce a time-based clustering algorithm to extract stay points from location history data. Using data from additional mobile phone sensors, as well as geographic information from OpenStreetMap and public transportation schedules, we introduce a transportation mode recognition algorithm to derive the different modes and routes taken by the user when travelling. The algorithm derives the sequence of maximum probability in a conditional random field with a neural network layer. We also show how the system can integrate data from email, calendars, address books, social network services, and location history into a coherent whole. To do so, it uses entity resolution to find the set of avatars used by each real-world contact, and performs spatiotemporal alignment to connect each stay point with the event it corresponds to in the user’s calendar. Finally, we show that such a system can also be used for synchronization across different systems and devices, and that it allows knowledge to be pushed to the sources. We present extensive experiments.
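To make the notion of stay-point extraction concrete, here is a minimal Python sketch of one common way to cluster a time-ordered location history. It is not the algorithm of the thesis, whose details the abstract does not give. The names `Fix`, `StayPoint`, and `extract_stay_points`, and the thresholds `max_radius_m` and `min_duration_s`, are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Iterable, List, Optional


@dataclass
class Fix:
    """One location sample (hypothetical record layout)."""
    t: float    # timestamp in seconds
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


@dataclass
class StayPoint:
    """A place where the user remained for some time."""
    start: float
    end: float
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points."""
    r = 6371000.0  # mean Earth radius in metres
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def _centroid(cluster: List[Fix]) -> tuple:
    """Arithmetic mean of the coordinates (adequate for small clusters)."""
    n = len(cluster)
    return sum(f.lat for f in cluster) / n, sum(f.lon for f in cluster) / n


def _close(cluster: List[Fix], min_duration_s: float) -> Optional[StayPoint]:
    """Turn a finished cluster into a stay point if it lasted long enough."""
    if cluster and cluster[-1].t - cluster[0].t >= min_duration_s:
        lat, lon = _centroid(cluster)
        return StayPoint(cluster[0].t, cluster[-1].t, lat, lon)
    return None


def extract_stay_points(
    fixes: Iterable[Fix],
    max_radius_m: float = 100.0,    # assumed threshold, for illustration only
    min_duration_s: float = 600.0,  # assumed threshold, for illustration only
) -> List[StayPoint]:
    """Scan fixes in time order, growing a cluster while fixes stay nearby."""
    stays: List[StayPoint] = []
    cluster: List[Fix] = []
    for fix in sorted(fixes, key=lambda f: f.t):
        if cluster:
            lat, lon = _centroid(cluster)
            if haversine_m(lat, lon, fix.lat, fix.lon) > max_radius_m:
                # The user left the current place: close the cluster.
                stay = _close(cluster, min_duration_s)
                if stay is not None:
                    stays.append(stay)
                cluster = []
        cluster.append(fix)
    # Close the cluster that is still open at the end of the history.
    stay = _close(cluster, min_duration_s)
    if stay is not None:
        stays.append(stay)
    return stays
```

In this sketch, a cluster becomes a stay point only if its first and last fixes are at least `min_duration_s` apart, so short pauses and fixes recorded while moving are discarded. Each resulting interval could then be compared with calendar events, in the spirit of the spatiotemporal alignment mentioned above.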