RTI-Bench: A Structured Dataset for Indian Right-to-Information Decision Analysis

📅 2026-05-16

🤖 AI Summary

This study addresses two problems: the dense administrative language of decisions by India's Central Information Commission (CIC), and the public's resulting inability to judge whether a Right to Information (RTI) appeal is likely to succeed. The authors introduce RTI-Bench, which they describe as, to the best of their knowledge, the first publicly released structured dataset of RTI decisions. It comprises 1,516 cases: 1,218 from a publicly available instruction-response corpus and 298 CIC decision PDFs collected from the Commission portal. Cases are annotated with outcome labels, exemption citations, IRAC-style reasoning components, and procedural timelines. The PDF subset spans five commissioners and three document format generations from 2023 to 2026. Structured fields are produced by rule-based extraction; label coverage is 89% on the instruction-response corpus and 51% on the PDF subset of 239 primary decisions, and a manual review of 50 randomly sampled labelled cases gives a label precision of 95.3%. In a zero-shot evaluation on 100 cases, Mistral 7B reaches 57.3% accuracy and 37.0% macro-F1 on outcome prediction, well above the majority-class baseline of 14.3% macro-F1.
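To make the idea of rule-based extraction of exemption citations concrete, here is a minimal Python sketch. It is not the authors' pipeline: the regular expression, the function name `extract_exemptions`, and the sample sentence are all assumptions made for illustration. The only fact it relies on is that exemptions under the RTI Act are listed in Section 8(1), clauses (a) through (j).

```python
import re

# Hypothetical pattern (not taken from the paper): matches citations such as
# "Section 8(1)(j)", "Sec. 8(1)(e)" or "section 8 (1) (e)".
EXEMPTION_RE = re.compile(
    r"\b(?:Section|Sec\.?|S\.)\s*8\s*\(\s*1\s*\)\s*\(\s*([a-j])\s*\)",
    re.IGNORECASE,
)


def extract_exemptions(text):
    """Return the distinct Section 8(1) clauses cited in text, in order of first appearance."""
    clauses = []
    for match in EXEMPTION_RE.finditer(text):
        clause = f"8(1)({match.group(1).lower()})"
        if clause not in clauses:
            clauses.append(clause)
    return clauses


# Invented sample sentence, used only to exercise the pattern.
sample = (
    "The CPIO denied the information under Section 8(1)(j) of the RTI Act; "
    "the appellant argued that section 8 (1) (e) did not apply."
)
print(extract_exemptions(sample))
```

A rule of this kind returns nothing when a decision phrases the exemption in a way the pattern does not anticipate, which is one plausible reason label coverage can fall below 100% on documents with varying formats.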

📝 Abstract

India's Right to Information Act, 2005 gives every citizen the right to demand information from public authorities, yet in practice most people cannot make sense of the dense administrative language used in Central Information Commission (CIC) decisions, let alone predict whether an appeal is worth filing. This paper introduces RTI-Bench, a structured dataset of CIC decisions with outcome labels, exemption citations, IRAC-style reasoning components, and procedural timelines. To the best of our knowledge it is the first publicly released structured dataset for Indian RTI administrative decisions. The dataset draws from two sources: 1,218 cases from a publicly available instruction-response corpus (with structured fields added through rule-based extraction), and 298 CIC decision PDFs collected directly from the Commission portal, spanning five commissioners and three document format generations from 2023 to 2026. Label coverage reaches 89% on the instruction-response corpus. For the PDF subset of 239 primary decisions, coverage is 51% in this first release. A random sample of 50 labelled cases was manually reviewed, yielding a label precision of 95.3%. A zero-shot Mistral 7B baseline on 100 cases gives 57.3% accuracy and 37.0% macro-F1 on outcome prediction, well above the majority-class baseline of 14.3% macro-F1. RTI-Bench is available at https://huggingface.co/datasets/joyboseroy/rti-bench
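The gap between accuracy and macro-F1 reported above is easier to read with the metric definitions in hand. The following Python sketch computes accuracy, macro-F1, and a majority-class baseline on a toy label set. The outcome labels and counts are invented for illustration and are not drawn from RTI-Bench, and the helper names are assumptions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def majority_baseline(y_true):
    """Predict the most frequent gold label for every case."""
    label = Counter(y_true).most_common(1)[0][0]
    return [label] * len(y_true)


# Toy labels (hypothetical outcome names and counts, not from the dataset).
gold = ["allowed"] * 5 + ["dismissed"] * 3 + ["partly_allowed"] * 2
baseline = majority_baseline(gold)

print(f"baseline accuracy: {accuracy(gold, baseline):.3f}")
print(f"baseline macro-F1: {macro_f1(gold, baseline):.3f}")
```

Because macro-F1 averages per-class scores without weighting by class frequency, a predictor that always outputs the most common outcome scores zero on every other class. This is why a majority-class baseline can have reasonable accuracy yet a very low macro-F1, and why macro-F1 is a more demanding measure on imbalanced outcome labels.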

Problem

Research questions and friction points this paper is trying to address.

Right to Information

administrative decisions

legal language comprehension

appeal prediction

information access

Innovation

Methods, ideas, or system contributions that make the work stand out.

structured legal dataset

Right-to-Information Act

IRAC reasoning