AI Summary
Existing legal AI systems predominantly rely on large language models (LLMs) for superficial textual analysis, failing to jointly ensure formal rationality (rule consistency) and substantive rationality (outcome fairness), thus falling short of jurisprudential requirements for trustworthy judicial decision-making.
Method: We propose L4M, the first framework to integrate adversarial LLM agents with an SMT solver. It employs role-isolated dual-agent fact extraction, UNSAT-core-guided iterative self-correction, domain-aware prompt-driven automatic formalization of legal statutes, and aligned prosecutor/defense and judge LLMs, forming an end-to-end symbolically augmented reasoning pipeline.
Contribution/Results: On public benchmarks, L4M significantly outperforms GPT-4o-mini, DeepSeek-V3, Claude 4, and state-of-the-art Legal AI methods, generating verifiable, highly interpretable judgments and optimized sentencing recommendations.
Abstract
The rationality of law manifests in two forms: substantive rationality, which concerns the fairness or moral desirability of outcomes, and formal rationality, which requires legal decisions to follow explicitly stated, general, and logically coherent rules. Existing LLM-based systems excel at surface-level text analysis but lack the guarantees required for principled jurisprudence. We introduce L4M, a novel framework that combines adversarial LLM agents with SMT-solver-backed proofs to unite the interpretive flexibility of natural language with the rigor of symbolic verification. The pipeline consists of three phases: (1) Statute Formalization, where domain-specific prompts convert legal provisions into logical formulae; (2) Dual Fact and Statute Extraction, in which prosecutor- and defense-aligned LLMs independently map case narratives to fact tuples and statutes, ensuring role isolation; and (3) Solver-Centric Adjudication, where an autoformalizer compiles both parties' arguments into logic constraints, and unsat cores trigger iterative self-critique until a satisfiable formula is achieved, which is then verbalized by a Judge-LLM into a transparent verdict and optimized sentence. Experimental results on public benchmarks show that our system surpasses advanced LLMs including GPT-o4-mini, DeepSeek-V3, and Claude 4 as well as state-of-the-art Legal AI baselines, while providing rigorous and explainable symbolic justifications.