Lessons From the Trenches: Using Mobile Phone Data for Official Statistics
Maarten Vanhoof
Orange Labs/Newcastle University
@Metti Hoof
MaartenVanhoof.com
Mobile Phone Data (Call Detail Records)
Mobile Phone Data (Signaling)
Mobile Phone Data (Call Detail Records)
Metadata
• Caller (phone)
• Called phone
• Timestamp
• Type of event
• Duration of call/Length of text
• Location of celltower
• …
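As a minimal sketch, one CDR event with the fields listed above could be represented as follows. The field names, the comma-separated layout and the sample line are assumptions made for illustration; real operator formats differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CDRRecord:
    caller: str          # pseudonymised id of the initiating phone
    callee: str          # pseudonymised id of the called phone
    timestamp: datetime  # time of the event
    event_type: str      # "call" or "text"
    size: int            # call duration in seconds, or text length in characters
    tower_id: str        # cell tower that handled the event


def parse_cdr_line(line: str) -> CDRRecord:
    """Parse one comma-separated line in the hypothetical field order above."""
    caller, callee, ts, event_type, size, tower_id = line.strip().split(",")
    return CDRRecord(caller, callee, datetime.fromisoformat(ts),
                     event_type, int(size), tower_id)


# Invented sample line, not taken from any real dataset.
record = parse_cdr_line("a1f3,9c2e,2015-03-02T21:14:05,call,73,T0421")
```

Note that the only location information is the tower, not the phone's position, which is the root of the spatial issues discussed later.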
Mobile Phone Data (Call Detail Records)
Toole et al. (2015) Coupling Human Mobilities and Social Ties.
Individual indicators: Bandicoot
https://github.com/yvesalexandre/bandicoot
http://bandicoot.mit.edu/demo/
• active days
• number of contacts
• number of interactions
• call duration
• percent nocturnal
• percent initiated interactions
• response delay text
• entropy of contacts
• balance of contacts
• interactions per contact
• inter-event time
• percent pareto interactions
• percent pareto durations
• number of antennas
• entropy of antennas
• percent at home
• radius of gyration
• frequent antennas
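To give an idea of what such indicators involve, here is a hand-written sketch of two of them, radius of gyration and the share of nocturnal events. It does not use Bandicoot's API. Planar tower coordinates in km and the 19:00 to 07:00 night window are assumptions chosen for illustration.

```python
import math
from datetime import datetime


def radius_of_gyration(points):
    """Root mean square distance of the records to their centre of mass.

    points: list of (x_km, y_km) tower coordinates, one entry per record.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)


def share_nocturnal(timestamps, night_start=19, night_end=7):
    """Fraction of events that fall in the (assumed) night window."""
    night = sum(1 for t in timestamps
                if t.hour >= night_start or t.hour < night_end)
    return night / len(timestamps)


# Invented toy data: two tower positions and four event times.
rg = radius_of_gyration([(0.0, 0.0), (2.0, 0.0)])
nocturnal = share_nocturnal([
    datetime(2015, 3, 2, 20), datetime(2015, 3, 3, 3),
    datetime(2015, 3, 3, 12), datetime(2015, 3, 3, 15),
])
```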
Individual indicators for official statistics
Behavioural
Individual mobility
e.g. diversity of mobility
Contextual
• Car ownership
• Access to public transport
• Income
• Marital status
• Membership
• Home location
• Etc.
Individual indicators for official statistics
Pappalardo, Vanhoof, et al. (2016) An Analytical Framework to Nowcast Well-Being using Mobile Phone Data.
(Geographical) Veracity
Spatial allocation
Spatial delineation
Spatial aggregation
Spatial allocation: Home detection
Pappalardo, Vanhoof, et al. (2016) An Analytical Framework to Nowcast Well-Being using Mobile Phone Data.
Spatial allocation: Home detection
Uncertainty of home allocation algorithms
• No knowledge of how accurately we can pinpoint users geographically
• Because no ground truth is available
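For illustration, one simple home detection heuristic takes the tower a user is seen at most often during the night. This is a sketch of that one criterion under assumed parameters (the 19:00 to 07:00 window, the input format), not necessarily a criterion evaluated in the cited work.

```python
from collections import Counter


def detect_home(records, night_start=19, night_end=7):
    """Return (tower_id, share_of_night_events) or None if no night events.

    records: iterable of (hour, tower_id) pairs for a single user.
    """
    night = Counter(tower for hour, tower in records
                    if hour >= night_start or hour < night_end)
    if not night:
        return None
    tower, count = night.most_common(1)[0]
    return tower, count / sum(night.values())


# Invented toy user: mostly at tower C during the day, at A during the night.
user = [(22, "A"), (23, "A"), (2, "B"), (12, "C"), (13, "C"), (14, "C")]
home = detect_home(user)
```

The returned share only tells how dominant the chosen tower is among night events. Without ground truth it says nothing about whether the user actually lives there, which is exactly the gap described above.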
Spatial allocation: Home detection
Performance
Uncertainty
Vanhoof et al. (Submitted) Investigating Performance and Spatial Uncertainty of Home Detection Criteria for CDR data
Spatial allocation: Solution?
• In the short term, we need to:
  • Improve our understanding of the uncertainty that comes with home detection
  • Test heuristics for home detection on different databases and for different countries
  • Design surveys to gather ground truth at the individual level
• In the long term, we need to:
  • Understand how changes in mobile phone use and in available datasets influence allocation
  • Decide on standardizing home detection and error assessment
  • Design a platform where all operators, researchers and policy makers can easily do this and compare results between different datasets
(Geographical) Veracity
Spatial allocation
Spatial delineation
Spatial aggregation
Spatial delineation
• Uneven delineations of space
  • Between antennas (high-density vs. low-density, operator 1 vs. operator 2, …)
  • Between antennas and administrative regions (cell-tower coverage vs. municipalities)
  • Between different definitions of urban areas (Urban Units vs. Urban Areas)
• These create errors that are poorly understood and challenging to address
• This is relevant for
  • Population density estimation
  • Mobility derivation
  • Parameter estimation (e.g. for urban scaling laws) in statistical analysis
  • Error/uncertainty assessment
Spatial delineation: Mobility Entropy
Vanhoof, et al. (Submitted) Correcting Mobility Entropy from CDR data for large-scale comparison of individual movement patterns
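A small sketch shows why the delineation matters here. It computes the plain (uncorrected) Shannon entropy of a user's tower visits; the correction proposed in the cited paper is not reproduced. The toy visit sequences are invented: the same back-and-forth movement between two places is observed once with one tower per place and once with two towers per place.

```python
import math
from collections import Counter


def mobility_entropy(tower_visits):
    """Shannon entropy (in bits) of the distribution of visits over towers."""
    counts = Counter(tower_visits)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Same movement, two hypothetical tower layouts.
sparse = ["A", "A", "A", "A", "B", "B", "B", "B"]
dense = ["A1", "A2", "A1", "A2", "B1", "B2", "B1", "B2"]

h_sparse = mobility_entropy(sparse)  # two equally used towers
h_dense = mobility_entropy(dense)    # four equally used towers
```

The denser layout yields a higher entropy for identical behaviour, so uncorrected values cannot be compared between areas with different tower densities.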
Spatial delineation: Urban scaling laws
Cottineau et al. (2016) Paradoxical Interpretations of Urban Scaling Laws
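Urban scaling laws are usually written as a power law Y = Y0 · N^β between an urban quantity Y and city population N, with β estimated by a linear fit in log-log space. The sketch below shows that fit on synthetic data generated with a known exponent; the numbers are invented. Which cities enter the fit, and with which N and Y, depends on how urban areas are delineated.

```python
import math


def fit_scaling_exponent(populations, quantities):
    """Ordinary least squares fit of log(Y) = log(Y0) + beta * log(N)."""
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in quantities]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    y0 = math.exp(my - beta * mx)
    return beta, y0


# Synthetic cities that follow Y = 2 * N^1.15 exactly.
populations = [1e4, 1e5, 1e6]
quantities = [2.0 * n ** 1.15 for n in populations]
beta, y0 = fit_scaling_exponent(populations, quantities)
```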
Spatial delineation: Solution?
• In the short term, we need to work on:
  • Minimizing the influence of spatial delineations on our measurements
  • Techniques that allow translation between different spatial delineations
  • Assessments of the influence of spatial delineation (geo-computation)
• In the long term, we need to:
  • Consider possibilities to standardize spatial delineations
  • Develop practices in Official Statistics that express the effect of spatial delineation
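One basic translation technique is areal weighting: counts per tower coverage area are split over administrative regions in proportion to the overlapping area. The sketch assumes the overlap fractions are already known (for instance from intersecting Voronoi cells with region polygons) and that users are spread uniformly within a coverage area. That second assumption is itself a source of error. All names and numbers are invented.

```python
from collections import defaultdict


def reallocate(tower_counts, overlap):
    """Redistribute per-tower counts to regions by area overlap.

    overlap[tower][region] is the fraction of the tower's coverage area
    that lies in the region; fractions for one tower should sum to 1.
    """
    region_counts = defaultdict(float)
    for tower, count in tower_counts.items():
        for region, fraction in overlap[tower].items():
            region_counts[region] += count * fraction
    return dict(region_counts)


tower_counts = {"T1": 100, "T2": 40}
overlap = {"T1": {"R1": 0.75, "R2": 0.25}, "T2": {"R2": 1.0}}
regions = reallocate(tower_counts, overlap)
```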
(Geographical) Veracity
Spatial allocation
Spatial delineation
Spatial aggregation
Spatial aggregation
• Scale matters for:
  • Unintended selective filtering (e.g. highly active persons, communities)
  • Objective construction of indicators (e.g. 5 km in Paris or in the Pyrenees)
  • Representativeness of single operators (e.g. distorted market shares)
  • Personal behaviour (e.g. long-distance vs. short-distance trips)
  • Geographical, economic, sociological, ecological, etc. context (e.g. transport infrastructure)
• Still, there is no evidence that current (spatial) aggregation practices take any of these into account when studying mobile phone data.
• In addition, given the rapidly changing nature of mobile phone use, it is my hypothesis that behavioural data is even more prone to this fallacy.
Spatial aggregation
Cell-tower level IRIS level
Population Density Estimation vs. Official Statistics
Relations between indicators
Spatial aggregation: Solution?
• In the short term, we need to work on:
  • Techniques that define the best spatial scale for studying certain processes
    • Both empirical, quantitative (e.g. optimal raster sizes for population density estimations)
    • And theoretical, qualitative (e.g. expert judgment)
  • Techniques that express the changing nature of observations when (spatially) aggregating
    • E.g. representativeness in population terms of single-operator data at different scales
  • Techniques that investigate, or even incorporate, the sensitivity of definitions to spatial scale
    • E.g. fragmented definitions of distance according to scale
  • Techniques that investigate the sensitivity of data to spatial aggregation
    • E.g. spatial autocorrelation
• In the long term, we need to:
  • See how all of this evolves over time as human behaviour and mobile phone use change
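As an example of the spatial autocorrelation mentioned above, Moran's I can be computed for an indicator at each candidate aggregation level and compared across levels. The sketch uses a plain binary adjacency matrix and invented values for four cells in a row; it is a bare-bones version without significance testing.

```python
def morans_i(values, weights):
    """Moran's I for values on n spatial units.

    weights[i][j] is the spatial weight between units i and j
    (here 1 for neighbours, 0 otherwise, with a zero diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * num / den


# Four cells in a row: each cell neighbours the next one.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], adjacency)    # similar values side by side
alternating = morans_i([1, 5, 1, 5], adjacency)  # dissimilar neighbours
```

Positive values indicate that neighbouring units resemble each other, negative values that they differ. A strong change in I after aggregating suggests the indicator is sensitive to the chosen scale.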
Thoughts
• Why start from individual indicators?
  • Privacy issues (newer datasets don’t allow this)
  • Computationally expensive treatment
  • Temporal resolution is far from optimal
  • Difficult to communicate/visualise
• Why not use the ‘big’ aspect of the data and work with patterns?
  • Activity patterns of cell-towers
  • High-level communication/commuting patterns
  • Population presence registration
High-level analysis: Learning Urban Areas
Combes, de Bellefon and Vanhoof (Submitted) Understanding urban centers organization and influence with mobile phone data
Don’t be Batman.
The same problems and scientific questions will persist. Only now they are less visible and, as such, less often raised.
Conclusion
• ‘Work from the trenches’ on individual data identifies problems but
  • Is done by a limited number of researchers
    • Not a priority for operators (never was, never will be)
    • Lack of data and knowledge at the institutions (but they are catching up)
    • Limited rewards in academia, limited scientific community
  • Is threatened by protective measures on data
    • Impossible to continue pursuing in-depth research
    • Researchers fled to African data, but official statistics there are of limited quality
    • Shared platforms for analysis are being developed, but they simplify workflows
  • Is mostly limited to one dataset, one operator
    • Comparison of findings is absolutely necessary for better insights and methods
    • The dream of full population coverage is feasible but needs strong policy