Multimodal Behavioral Patterns Analysis with Eye-Tracking and LLM-Based Reasoning

📅 2025-07-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Eye-tracking data is non-linguistic, strongly time-dependent, and structurally complex, so conventional methods and standalone large language models (LLMs) struggle to model the cognitive semantics behind it. To address this, the paper proposes a multimodal human-AI collaborative framework with three parts: first, a multi-stage pipeline that applies horizontal and vertical segmentation followed by LLM reasoning to uncover latent gaze patterns; second, an Expert-Model Co-Scoring Module that combines expert judgment with LLM output to produce trust scores for behavioral interpretations; third, a hybrid anomaly detection module that pairs LSTM-based temporal modeling with LLM-driven semantic analysis. Across several LLMs and prompt strategies, the authors report improvements in consistency, interpretability, and performance, with up to 50% accuracy on difficulty prediction. They present the approach as a scalable, interpretable route to cognitive modeling of eye-tracking data.

📝 Abstract

Eye-tracking data reveals valuable insights into users' cognitive states but is difficult to analyze due to its structured, non-linguistic nature. While large language models (LLMs) excel at reasoning over text, they struggle with temporal and numerical data. This paper presents a multimodal human-AI collaborative framework designed to enhance cognitive pattern extraction from eye-tracking signals. The framework includes: (1) a multi-stage pipeline using horizontal and vertical segmentation alongside LLM reasoning to uncover latent gaze patterns; (2) an Expert-Model Co-Scoring Module that integrates expert judgment with LLM output to generate trust scores for behavioral interpretations; and (3) a hybrid anomaly detection module combining LSTM-based temporal modeling with LLM-driven semantic analysis. Our results across several LLMs and prompt strategies show improvements in consistency, interpretability, and performance, with up to 50% accuracy in difficulty prediction tasks. This approach offers a scalable, interpretable solution for cognitive modeling and has broad potential in adaptive learning, human-computer interaction, and educational analytics.
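The abstract does not define horizontal and vertical segmentation, so the following is only a minimal sketch under one plausible reading: horizontal segmentation splits the gaze recording row-wise into consecutive time windows, and vertical segmentation splits it column-wise into feature groups. The record fields (`t`, `x`, `y`, `pupil`), the window length, and both function names are hypothetical and not taken from the paper.

```python
from typing import Dict, List

# Hypothetical gaze record: one dict of numeric fields per sample.
GazeSample = Dict[str, float]


def horizontal_segments(samples: List[GazeSample], window_s: float) -> List[List[GazeSample]]:
    """Row-wise split (assumed meaning): group samples into consecutive time windows."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    buckets: Dict[int, List[GazeSample]] = {}
    for sample in samples:
        index = int(sample["t"] // window_s)  # window the timestamp falls into
        buckets.setdefault(index, []).append(sample)
    # Return non-empty windows in chronological order.
    return [buckets[k] for k in sorted(buckets)]


def vertical_segments(
    samples: List[GazeSample], groups: Dict[str, List[str]]
) -> Dict[str, List[GazeSample]]:
    """Column-wise split (assumed meaning): keep only the fields of each feature group."""
    return {
        name: [{key: sample[key] for key in keys} for sample in samples]
        for name, keys in groups.items()
    }


recording = [
    {"t": 0.0, "x": 0.10, "y": 0.20, "pupil": 3.1},
    {"t": 0.4, "x": 0.12, "y": 0.21, "pupil": 3.0},
    {"t": 1.2, "x": 0.50, "y": 0.22, "pupil": 3.4},
    {"t": 2.5, "x": 0.80, "y": 0.60, "pupil": 3.6},
]
windows = horizontal_segments(recording, window_s=1.0)
views = vertical_segments(recording, {"position": ["x", "y"], "pupil": ["t", "pupil"]})
```

Under this reading, each window or feature view is small enough to be summarized and passed to an LLM separately, which is one way a multi-stage pipeline could work around the difficulty LLMs have with long numerical sequences.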

Problem

Research questions and friction points this paper is trying to address.

Analyzing eye-tracking data for cognitive state insights

Enhancing LLM reasoning over temporal and numerical data

Improving interpretability of behavioral pattern extraction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal framework combines eye-tracking and LLM reasoning

Expert-Model Co-Scoring integrates human and AI judgments

Hybrid anomaly detection merges LSTM and LLM analysis
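The summary does not say how the co-scoring or the hybrid anomaly decision is computed. As a rough illustration only, the sketch below assumes a weighted average of an expert rating and an LLM confidence for the trust score, and a simple agreement rule between a temporal error (assumed to come from an LSTM's prediction error, computed elsewhere) and an LLM's semantic flag. The function names, the 0.6 weight, and the threshold are all hypothetical.

```python
def trust_score(expert_rating: float, llm_confidence: float, expert_weight: float = 0.6) -> float:
    """Hypothetical co-scoring rule: weighted average of two scores in [0, 1]."""
    for value in (expert_rating, llm_confidence, expert_weight):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must lie in [0, 1]")
    return expert_weight * expert_rating + (1.0 - expert_weight) * llm_confidence


def hybrid_anomaly_label(temporal_error: float, error_threshold: float, llm_flag: bool) -> str:
    """Hypothetical fusion rule for one gaze segment.

    temporal_error is assumed to be a sequence model's prediction error
    (for example from an LSTM); llm_flag is the LLM's yes/no judgment that
    the segment looks semantically unusual.
    """
    temporal_flag = temporal_error > error_threshold
    if temporal_flag and llm_flag:
        return "anomaly"  # both signals agree
    if temporal_flag or llm_flag:
        return "review"   # signals disagree, defer to a human expert
    return "normal"


score = trust_score(expert_rating=1.0, llm_confidence=0.5)
label = hybrid_anomaly_label(temporal_error=0.9, error_threshold=0.5, llm_flag=False)
```

Routing disagreements to "review" rather than forcing a binary label is a design choice made here to reflect the human-AI collaborative framing; the paper may resolve such cases differently.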
