Towards fairer public transit: Real-time tensor-based multimodal fare evasion and fraud detection

📅 2025-10-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the detection of fare evasion and fraud in public transportation, specifically tailgating, unauthorized entry, and ticketing anomalies. Method: The authors propose a real-time multimodal detection framework that combines visual and audio modalities. A Tensor Fusion Network (TFN) explicitly models unimodal and bimodal interactions, while ViViT and the Audio Spectrogram Transformer (AST) extract video and audio features, respectively. Contribution/Results: The key contribution is explicit modeling of cross-modal interactions, in contrast to simple concatenation-based fusion. Evaluated on a custom dataset, the system achieves 89.5% accuracy, 87.2% precision, and 84.0% recall, with a 7.0% improvement in F1 score and an 8.8% gain in recall over concatenation baselines. The authors present these results as a step toward reducing revenue loss and improving fairness and safety in public transit systems.
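
The summary reports precision and recall but not the resulting F1 score. As a quick sanity check, the standard harmonic-mean formula applied to the reported 87.2% precision and 84.0% recall gives roughly 85.6%. This is derived arithmetic, not a figure stated in the paper, and the helper name `f1_score` is purely illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    return 2 * precision * recall / (precision + recall)

# Reported values from the summary above; the F1 value is implied, not quoted.
f1 = f1_score(0.872, 0.840)
print(f"Implied F1: {f1:.3f}")  # prints "Implied F1: 0.856"
```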

📝 Abstract

This research introduces a multimodal system designed to detect fraud and fare evasion in public transportation by analyzing closed-circuit television (CCTV) and audio data. The proposed solution uses the Video Vision Transformer (ViViT) model for video feature extraction and the Audio Spectrogram Transformer (AST) for audio analysis. The system implements a Tensor Fusion Network (TFN) architecture that explicitly models unimodal and bimodal interactions through a two-fold Cartesian product. This fusion technique captures complex cross-modal dynamics between visual behaviors (e.g., tailgating, unauthorized access) and audio cues (e.g., fare transaction sounds). The system was trained and tested on a custom dataset, achieving an accuracy of 89.5%, precision of 87.2%, and recall of 84.0% in detecting fraudulent activities, significantly outperforming early fusion baselines and exceeding the 75% recall rates typically reported in state-of-the-art transportation fraud detection systems. Our ablation studies demonstrate that the tensor fusion approach provides a 7.0% improvement in the F1 score and an 8.8% boost in recall compared to traditional concatenation methods. The solution supports real-time detection, enabling public transport operators to reduce revenue loss, improve passenger safety, and ensure operational compliance.
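
The two-fold Cartesian product mentioned in the abstract can be illustrated with a minimal NumPy sketch. It follows the general tensor fusion idea of appending a constant 1 to each modality embedding and taking their outer product, so that the result contains the unimodal terms and all pairwise bimodal terms. The embedding sizes, the random inputs standing in for ViViT and AST features, and the function name `tensor_fusion` are assumptions for illustration only; the paper's actual dimensions and downstream layers are not given here.

```python
import numpy as np

def tensor_fusion(video_emb: np.ndarray, audio_emb: np.ndarray) -> np.ndarray:
    """Two-modality tensor fusion via an outer product of 1-augmented embeddings."""
    v = np.concatenate([video_emb, [1.0]])  # shape (d_v + 1,)
    a = np.concatenate([audio_emb, [1.0]])  # shape (d_a + 1,)
    return np.outer(v, a)                   # shape (d_v + 1, d_a + 1)

# Hypothetical small embeddings standing in for ViViT and AST outputs.
rng = np.random.default_rng(0)
video_emb = rng.normal(size=4)
audio_emb = rng.normal(size=3)

fused = tensor_fusion(video_emb, audio_emb)

# The fused matrix decomposes into interpretable sub-blocks:
bimodal = fused[:-1, :-1]        # pairwise video x audio interactions
video_only = fused[:-1, -1]      # unimodal video terms (multiplied by 1)
audio_only = fused[-1, :-1]      # unimodal audio terms (multiplied by 1)

features = fused.ravel()         # flattened input for a downstream classifier
print(fused.shape, features.shape)  # (5, 4) (20,)
```

By contrast, simple concatenation of the same two embeddings would yield only 7 features and no explicit interaction terms, which is the baseline the abstract compares against.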

Problem

Research questions and friction points this paper is trying to address.

Detecting fare evasion and fraud in public transit systems

Analyzing CCTV video and audio data for fraudulent activities

Enabling real-time multimodal detection to reduce revenue loss

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses the Video Vision Transformer (ViViT) for video feature extraction

Employs the Audio Spectrogram Transformer (AST) for audio analysis

Implements a Tensor Fusion Network (TFN) to model unimodal and bimodal interactions
