Towards fairer public transit: Real-time tensor-based multimodal fare evasion and fraud detection

📅 2025-10-02
🤖 AI Summary
This work addresses the challenge of detecting fare evasion and fraudulent behaviors in public transportation, specifically targeting tailgating, unauthorized entry, and ticketing anomalies. Method: We propose a real-time multimodal detection framework integrating visual and audio modalities. A Tensor Fusion Network (TFN) is employed to explicitly model unimodal and bimodal interactions, while the Video Vision Transformer (ViViT) and the Audio Spectrogram Transformer (AST) are adopted for video and audio feature extraction, respectively. Contribution/Results: Our key innovation lies in explicit modeling of cross-modal interactions, in contrast to fusion by simple feature concatenation. Evaluated on a proprietary dataset, the system achieves 89.5% accuracy, 87.2% precision, and 84.0% recall, with a 7.0% improvement in F1-score and an 8.8% gain in recall over concatenation-based baselines. These gains are intended to support fairer and safer public transit operations.
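The summary reports precision and recall but not the F1-score itself, only its improvement over the baseline. As a rough check, the F1-score implied by the reported figures can be computed as the harmonic mean of precision and recall. The sketch below assumes the standard F1 definition and that the reported 87.2% and 84.0% refer to the same evaluation run. The function name `f1_score` is illustrative, and the baseline F1 is not stated in the source, so it is not reconstructed here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Figures reported in the summary, expressed as fractions.
precision = 0.872
recall = 0.840

f1 = f1_score(precision, recall)
print(f"Implied F1 = {f1:.3f}")  # prints "Implied F1 = 0.856"
```

Under these assumptions the reported precision and recall imply an F1-score of roughly 85.6%.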

📝 Abstract
This research introduces a multimodal system designed to detect fraud and fare evasion in public transportation by analyzing closed-circuit television (CCTV) and audio data. The proposed solution uses the Video Vision Transformer (ViViT) model for video feature extraction and the Audio Spectrogram Transformer (AST) for audio analysis. The system implements a Tensor Fusion Network (TFN) architecture that explicitly models unimodal and bimodal interactions through a 2-fold Cartesian product. This fusion technique captures cross-modal dynamics between visual behaviors (e.g., tailgating, unauthorized access) and audio cues (e.g., fare transaction sounds). The system was trained and tested on a custom dataset, achieving an accuracy of 89.5%, precision of 87.2%, and recall of 84.0% in detecting fraudulent activities, significantly outperforming early fusion baselines and exceeding the 75% recall rates typically reported in state-of-the-art transportation fraud detection systems. Our ablation studies demonstrate that the tensor fusion approach provides a 7.0% improvement in the F1 score and an 8.8% boost in recall compared to traditional concatenation methods. The solution supports real-time detection, enabling public transport operators to reduce revenue loss, improve passenger safety, and ensure operational compliance.
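The 2-fold Cartesian product mentioned in the abstract can be illustrated with a minimal NumPy sketch. It follows the general Tensor Fusion Network idea of appending a constant 1 to each modality embedding and taking their outer product, so that the result contains both unimodal features and all pairwise (bimodal) products. The function name `tensor_fusion`, the toy feature vectors, and their dimensions are assumptions for illustration only. The paper's actual embedding sizes and implementation are not given in the abstract.

```python
import numpy as np

def tensor_fusion(video_feat: np.ndarray, audio_feat: np.ndarray) -> np.ndarray:
    """Fuse two 1-D embeddings via the outer product of [feat; 1] vectors."""
    # Appending 1 keeps each modality's own features in the fused tensor.
    v = np.concatenate([video_feat, [1.0]])
    a = np.concatenate([audio_feat, [1.0]])
    # Result has shape (len(video_feat) + 1, len(audio_feat) + 1).
    return np.outer(v, a)

# Toy embeddings standing in for ViViT and AST outputs (hypothetical values).
video_feat = np.array([0.5, -1.0, 2.0])
audio_feat = np.array([3.0, 4.0])

fused = tensor_fusion(video_feat, audio_feat)

unimodal_video = fused[:-1, -1]   # equals video_feat
unimodal_audio = fused[-1, :-1]   # equals audio_feat
bimodal = fused[:-1, :-1]         # all pairwise video-audio products

print(fused.shape)  # (4, 3)
```

In contrast, a concatenation baseline would produce only a length-5 vector here and would contain no product terms. In a full model, the fused tensor would typically be flattened and passed to a classifier.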
Problem

Research questions and friction points this paper is trying to address.

Detecting fare evasion and fraud in public transit systems
Analyzing CCTV video and audio data for fraudulent activities
Enabling real-time multimodal detection to reduce revenue loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses the Video Vision Transformer (ViViT) for video feature extraction
Employs the Audio Spectrogram Transformer (AST) for audio analysis
Implements a Tensor Fusion Network (TFN) for multimodal interaction modeling
Peter Wauyo
Carnegie Mellon University Africa, Kigali, Rwanda
Dalia Bwiza
Carnegie Mellon University Africa, Kigali, Rwanda
Alain Murara
Rwanda Utilities Regulatory Authority
Edwin Mugume
Carnegie Mellon University Africa
Heterogeneous mobile cellular networks, energy efficiency, machine learning techniques in mobile wireless networks, Internet of Things
Eric Umuhoza
Carnegie Mellon University Africa, Kigali, Rwanda