🤖 AI Summary
This study examines whether Yelp review text can supply early signals of foodborne illness in New York City, complementing official reporting channels that are limited in timeliness and coverage. It analyzes signals produced by a Hierarchical Sigmoid Attention Network (HSAN) classifier applied to restaurant reviews and compares them with 2023 inspection outcomes from the NYC Department of Health and Mental Hygiene, using correlation analysis, distributional comparisons, and spatial mapping at the census tract level. HSAN signals show minimal correlation with official inspection scores at the tract level and do not differ significantly by the number of C-graded restaurants, which suggests that social media-derived signals and traditional regulatory metrics capture different information. The work discusses the implications for passive public health surveillance and outlines address-level analysis as a next step.
📝 Abstract
Foodborne illnesses are gastrointestinal conditions caused by consuming contaminated food. Restaurants are critical venues for investigating outbreaks because they share sourcing, preparation, and distribution of foods. Public reporting of illness via formal channels is limited, whereas social media platforms host abundant user-generated content that can provide timely public health signals. This paper analyzes the signals that a Hierarchical Sigmoid Attention Network (HSAN) classifier produces from Yelp reviews and compares them with official restaurant inspection outcomes issued by the New York City Department of Health and Mental Hygiene (NYC DOHMH) in 2023. We evaluate correlations at the census tract level, compare distributions of HSAN scores by prevalence of C-graded restaurants, and map spatial patterns across NYC. We find minimal correlation between HSAN signals and inspection scores at the tract level and no significant differences by number of C-graded restaurants. We discuss implications and outline next steps toward address-level analyses.
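The tract-level comparison described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not confirm: each record is a hypothetical `(tract_id, value)` pair, per-restaurant values are averaged within a tract, and Pearson's r stands in for whatever correlation measure the authors actually used. The function names `tract_means`, `pearson`, and `tract_level_correlation` are illustrative only.

```python
from collections import defaultdict
from math import sqrt


def tract_means(records):
    """Average a per-restaurant value within each census tract.

    `records` is an iterable of (tract_id, value) pairs (hypothetical layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tract_id, value in records:
        sums[tract_id] += value
        counts[tract_id] += 1
    return {tract: sums[tract] / counts[tract] for tract in sums}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def tract_level_correlation(hsan_records, inspection_records):
    """Correlate tract-mean HSAN scores with tract-mean inspection scores.

    Only tracts present in both sources are compared (an assumption).
    """
    hsan = tract_means(hsan_records)
    inspections = tract_means(inspection_records)
    shared = sorted(set(hsan) & set(inspections))
    return pearson([hsan[t] for t in shared], [inspections[t] for t in shared])


# Toy, made-up values used only to exercise the functions.
toy_hsan = [("A", 0.2), ("A", 0.4), ("B", 0.6), ("C", 0.9)]
toy_inspections = [("A", 10), ("B", 20), ("C", 35), ("D", 5)]
r = tract_level_correlation(toy_hsan, toy_inspections)
```

A rank-based measure such as Spearman's correlation may suit skewed inspection scores better; Pearson's r is used here only because it keeps the sketch short and free of dependencies.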