Data Bias in Human Mobility is a Universal Phenomenon but is Highly Location-specific

📅 2025-07-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper identifies a pervasive "production bias" in large-scale human mobility data: data generation is systematically unequal across demographic groups and is strongly associated with wealth, ethnicity, and educational attainment. Data points can be more unequally distributed between users than wealth. Using GPS trajectories from anonymized smartphones in ten major US cities, the study builds census-tract-level models that predict data production from the demographic composition of each tract, quantifying both whether individuals are represented and how much data they produce. Results show pronounced geographic heterogeneity: bias appears in every city but takes a different form in each, which calls general debiasing approaches into question. The paper argues that location-specific models are required to characterize bias in each city. The findings bear on algorithmic fairness, urban policy design, and data governance.

📝 Abstract

Large-scale human mobility datasets play increasingly critical roles in many algorithmic systems, business processes and policy decisions. Unfortunately, there has been little focus on understanding bias and other fundamental shortcomings of the datasets and how they impact downstream analyses and prediction tasks. In this work, we study 'data production', quantifying not only whether individuals are represented in big digital datasets, but also how they are represented in terms of how much data they produce. We study GPS mobility data collected from anonymized smartphones for ten major US cities and find that data points can be more unequally distributed between users than wealth. We build models to predict the number of data points we can expect to be produced by the composition of demographic groups living in census tracts, and find strong effects of wealth, ethnicity, and education on data production. While we find that bias is a universal phenomenon, occurring in all cities, we further find that each city suffers from its own manifestation of it, and that location-specific models are required to model bias for each city. This work raises serious questions about general approaches to debias human mobility data and urges further research.
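The abstract compares how unequally data points are distributed between users with how unequally wealth is distributed, but does not say which inequality measure is used. As a minimal sketch, assuming the Gini coefficient (a common choice, not confirmed by the source), the comparison could be computed as follows. The function name `gini` and the toy counts are illustrative assumptions, not data from the paper.

```python
def gini(values):
    """Gini coefficient of a list of non-negative numbers.

    0.0 means every user holds the same amount; values near 1.0 mean
    a few users hold almost everything. Assumed measure, not the paper's.
    """
    x = sorted(float(v) for v in values)
    n = len(x)
    total = sum(x)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy, made-up numbers of GPS points per user (not from the paper).
equal_users = [100, 100, 100, 100]
skewed_users = [0, 0, 0, 400]

gini_equal = gini(equal_users)
gini_skewed = gini(skewed_users)
```

Applying the same function to per-user data-point counts and to a wealth distribution would give two numbers on the same scale, which is the kind of comparison the abstract describes.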

Problem

Research questions and friction points this paper is trying to address.

Quantifying unequal data production in human mobility datasets

Predicting data points based on demographics and location

Addressing location-specific bias in mobility data modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantifying data production bias in mobility datasets

Predicting data points using demographic group composition

Developing location-specific models for bias correction
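To illustrate why a single pooled model can fail where location-specific models succeed, here is a minimal sketch that fits a one-predictor least-squares line per city and compares it with one pooled fit. Everything in it is an assumption for illustration: the paper's actual model family and features are not given in this summary, and the predictor values, city labels, and helper names (`fit_line`, `sse`) are hypothetical toy inputs, not results.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sse(xs, ys, slope, intercept):
    """Sum of squared errors of a fitted line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Toy tract-level data (made up): x is a demographic predictor such as an
# income index, y is data points produced. The two cities have opposite trends.
city_a = ([1, 2, 3, 4], [3, 5, 7, 9])
city_b = ([1, 2, 3, 4], [9, 8, 7, 6])

slope_a, icpt_a = fit_line(*city_a)
slope_b, icpt_b = fit_line(*city_b)

# Pooled model: one line fitted to the tracts of both cities together.
pooled_x = city_a[0] + city_b[0]
pooled_y = city_a[1] + city_b[1]
slope_p, icpt_p = fit_line(pooled_x, pooled_y)

per_city_error = sse(*city_a, slope_a, icpt_a) + sse(*city_b, slope_b, icpt_b)
pooled_error = sse(pooled_x, pooled_y, slope_p, icpt_p)
```

In this toy setup the per-city fits recover each city's trend, while the pooled line averages two opposing relationships and leaves a large residual error. That is the intuition behind modeling bias separately for each location.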
