AI Summary
Existing legal AI systems predominantly rely on large language models (LLMs) for superficial textual analysis, failing to jointly ensure formal rationality (rule consistency) and substantive rationality (outcome fairness), thus falling short of jurisprudential requirements for trustworthy judicial decision-making.
Method: We propose L4M, the first framework to integrate adversarial LLM agents with an SMT solver. It employs role-isolated dual-agent fact extraction, UNSAT-core-guided iterative self-correction, domain-aware prompt-driven automatic formalization of legal statutes, and aligned prosecutor/defense and judge LLMs, forming an end-to-end symbolically augmented reasoning pipeline.
Contribution/Results: On public benchmarks, L4M significantly outperforms GPT-4o-mini, DeepSeek-V3, Claude 4, and state-of-the-art Legal AI methods, generating verifiable, highly interpretable judgments and optimized sentencing recommendations.
Abstract
The rationality of law manifests in two forms: substantive rationality, which concerns the fairness or moral desirability of outcomes, and formal rationality, which requires legal decisions to follow explicitly stated, general, and logically coherent rules. Existing LLM-based systems excel at surface-level text analysis but lack the guarantees required for principled jurisprudence. We introduce L4M, a novel framework that combines adversarial LLM agents with SMT-solver-backed proofs to unite the interpretive flexibility of natural language with the rigor of symbolic verification. The pipeline consists of three phases: (1) Statute Formalization, where domain-specific prompts convert legal provisions into logical formulae; (2) Dual Fact and Statute Extraction, in which prosecutor- and defense-aligned LLMs independently map case narratives to fact tuples and statutes, ensuring role isolation; and (3) Solver-Centric Adjudication, where an autoformalizer compiles both parties' arguments into logic constraints, and unsat cores trigger iterative self-critique until a satisfiable formula is achieved, which is then verbalized by a Judge-LLM into a transparent verdict and optimized sentence. Experimental results on public benchmarks show that our system surpasses advanced LLMs including GPT-o4-mini, DeepSeek-V3, and Claude 4 as well as state-of-the-art Legal AI baselines, while providing rigorous and explainable symbolic justifications.