Traceable by Design: An LLM Pipeline and Dashboard for EU Regulatory Consultation Analysis

📅 2026-05-29

🤖 AI Summary
This study addresses the impracticality of manually analysing the large volumes of unstructured text submitted to European Union public consultations. The authors propose an end-to-end large language model pipeline and interactive dashboard that extracts structured topic annotations from PDF attachments and web-form responses and links each annotation to a verbatim quote from the source document, so every result is traceable. The design follows three principles: verbatim grounding, full traceability, and transparency by design. The system is not limited to a predefined taxonomy and surfaces emerging concerns such as Age Verification, Payment Processor Censorship, and Digital Ownership. Applied to 4,322 submissions to the Digital Fairness Act call for evidence, it produced 15,368 topic annotations supported by 20,951 verbatim evidence quotes. The authors describe the pipeline as domain-generic: adapting it to another consultation requires only a prompt update and a new dataset. The code and processed data are publicly released.
📝 Abstract
Public consultations generate large volumes of stakeholder submissions that are impractical to analyse manually. We present an end-to-end LLM-based pipeline and interactive dashboard for structured topic extraction from regulatory consultation submissions, demonstrated on the European Commission's Digital Fairness Act (DFA) public call for evidence as a case study. The system processes raw PDF attachments and web-form responses, extracts topic annotations, and grounds every extraction in a verbatim quote from the source text. Applied to 4,322 DFA submissions, the pipeline produced 15,368 topic annotations supported by 20,951 verbatim evidence quotes. Three principles govern the design: verbatim grounding, full traceability, and transparency by design. The dashboard exposes the full extraction dataset through five analytical views, from dataset-level topic overviews to individual paragraph drill-downs, with every result traceable to its source. Beyond the predefined DFA topic categories, the pipeline surfaced stakeholder concerns, such as Age Verification, Payment Processor Censorship, and Digital Ownership, that a fixed-taxonomy approach would have missed. The pipeline is domain-generic; adapting it to a new consultation requires only a prompt update and a new dataset. A live demo is available at https://dfa-dashboard.thalesbertaglia.com/. The code and processed data are publicly available at https://github.com/thalesbertaglia/dfa-dashboard.
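To make the verbatim-grounding principle concrete, the following is a minimal sketch of one way such a check could work: an annotation is kept only if its quote occurs word for word in the source text. This is not the authors' implementation. The names `Annotation`, `normalize`, and `keep_grounded` are hypothetical, and collapsing whitespace before matching is an assumption made here so that line breaks from PDF extraction do not cause false rejections.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical record: a topic label plus the quote offered as evidence."""
    topic: str
    quote: str


def normalize(text: str) -> str:
    """Collapse every run of whitespace to one space (assumed preprocessing)."""
    return re.sub(r"\s+", " ", text).strip()


def keep_grounded(annotations: list[Annotation], source_text: str) -> list[Annotation]:
    """Return only the annotations whose quote appears verbatim in the source."""
    haystack = normalize(source_text)
    kept = []
    for annotation in annotations:
        needle = normalize(annotation.quote)
        # An empty quote is trivially a substring, so reject it explicitly.
        if needle and needle in haystack:
            kept.append(annotation)
    return kept


if __name__ == "__main__":
    source = "Platforms should adopt robust age\nverification  for minors."
    candidates = [
        Annotation("Age Verification", "robust age verification for minors"),
        Annotation("Digital Ownership", "users must own their purchases"),
        Annotation("Empty Evidence", "   "),
    ]
    for annotation in keep_grounded(candidates, source):
        print(annotation.topic, "->", annotation.quote)
```

In this sketch, exact substring matching is deliberately strict: a paraphrased or partially invented quote is dropped rather than repaired, which is what keeps every retained annotation traceable to its source.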
Problem

Research questions and friction points this paper is trying to address.

public consultation
regulatory analysis
topic extraction
traceability
LLM pipeline
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM pipeline
verbatim grounding
traceability
regulatory consultation analysis
interactive dashboard