🤖 AI Summary
This work addresses the limited ability of current AI agents to recover autonomously from API call failures when the error response carries no actionable repair guidance. The authors propose a self-reflective API that, upon validation failure, returns structured, machine-readable suggestions sufficient for the agent to correct and retry the request without external reasoning. Replacing natural-language error messages with structured JSON feedback lets the agent apply the fix directly. The evaluation combines adversarial tasks, a prompt-leakage audit tool, and a three-model comparison, and it identifies two undocumented classes of answer leakage in LLM benchmarks that must be audited before the comparison holds. On Anthropic models, structured suggestions improve task-completion rate by 36.7–40.0 percentage points over plain-English diagnoses and improve per-success token efficiency by 1.8–2.2×; the lift is not statistically significant on gpt-4o-mini. A second-domain replication on a billing API confirms the pattern.
📝 Abstract
When an AI agent calls an API and hits a validation error, it needs more than what went wrong: it needs what to do next. A self-reflective API returns, on validation failure, a machine-readable recovery_feedback.suggestions[] payload sufficient for the agent to repair the request and retry without external reasoning. On a leak-audited pilot ($N{=}30$ per cell, 3 LLMs, 10 adversarial tasks), structured suggestions lift task-completion rate by $+36.7$--$40.0$pp over plain-English diagnoses on Anthropic models (Fisher's exact $p \le 0.0022$), at $1.8$--$2.2\times$ better per-success token efficiency. The lift is not significant on gpt-4o-mini ($p{=}0.435$); a second-domain replication on a billing API confirms the pattern. The comparison only holds after auditing two undocumented classes of answer leakage in LLM benchmarks. We ship audit_prompt_leakage.py as reusable CI infrastructure. Code and data: https://github.com/arquicanedo/self-reflective-apis.
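To make the repair-and-retry idea concrete, the following is a minimal Python sketch, not the authors' implementation. Only the `recovery_feedback.suggestions[]` field name comes from the abstract. The `create_invoice` endpoint, the `rename` and `set` actions, the suggestion keys (`action`, `field`, `to`, `value`), and the status codes are all assumptions made up for illustration.

```python
from typing import Any

ALLOWED_CURRENCIES = {"USD", "EUR"}  # hypothetical validation rule


def create_invoice(request: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical endpoint that validates a request.

    On failure it returns the human-readable errors plus a
    recovery_feedback.suggestions[] list (schema assumed here).
    """
    errors: list[str] = []
    suggestions: list[dict[str, Any]] = []

    if "amount_cents" not in request:
        errors.append("amount_cents is required")
        if "amount" in request:  # likely a misnamed field
            suggestions.append(
                {"action": "rename", "field": "amount", "to": "amount_cents"}
            )

    currency = request.get("currency")
    if currency not in ALLOWED_CURRENCIES:
        errors.append("currency must be one of USD, EUR")
        candidate = str(currency).upper()
        if candidate in ALLOWED_CURRENCIES:  # only suggest a fix we can justify
            suggestions.append(
                {"action": "set", "field": "currency", "value": candidate}
            )

    if errors:
        return {
            "status": 422,
            "errors": errors,
            "recovery_feedback": {"suggestions": suggestions},
        }
    return {"status": 201, "invoice": dict(request)}


def apply_suggestions(
    request: dict[str, Any], suggestions: list[dict[str, Any]]
) -> dict[str, Any]:
    """Apply suggestions mechanically, with no model reasoning involved."""
    repaired = dict(request)
    for s in suggestions:
        if s["action"] == "rename" and s["field"] in repaired:
            repaired[s["to"]] = repaired.pop(s["field"])
        elif s["action"] == "set":
            repaired[s["field"]] = s["value"]
    return repaired


def call_with_repair(request: dict[str, Any], max_retries: int = 2) -> dict[str, Any]:
    """Call the endpoint, repairing and retrying while suggestions are offered."""
    response = create_invoice(request)
    for _ in range(max_retries):
        if response["status"] != 422:
            break
        suggestions = response["recovery_feedback"]["suggestions"]
        if not suggestions:  # nothing actionable, so stop retrying
            break
        request = apply_suggestions(request, suggestions)
        response = create_invoice(request)
    return response
```

In this sketch, `call_with_repair({"amount": 1200, "currency": "usd"})` fails validation once, applies both suggestions, and succeeds on the retry. A request the endpoint cannot propose a fix for, such as `{"currency": "xyz"}`, comes back as a 422 with an empty suggestions list, and the loop stops rather than guessing.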