MEC$^3$O: Multi-Expert Consensus for Code Time Complexity Prediction

📅 2025-10-10
🤖 AI Summary
To address the imbalanced cross-category performance of single large language models (LLMs) in source code time-complexity prediction, this paper proposes a Multi-Expert Consensus System. We design a performance-aware expert role assignment mechanism that specializes individual LLMs for distinct complexity classes and introduce a structured multi-agent debate framework, integrating class-specific instructions and weighted ensemble strategies to mitigate reasoning degradation and prevent erroneous majority convergence—without requiring an external adjudicator model. Evaluated on the CodeComplex benchmark, our method achieves ≥10% average improvements in accuracy and macro-F1 over open-source baselines, outperforming GPT-4o-mini and matching GPT-4o’s performance. Our core contribution is the first LLM collaboration paradigm for complexity prediction that is both adjudicator-free and class-adaptive.

📝 Abstract
Predicting the complexity of source code is essential for software development and algorithm analysis. Recently, Baik et al. (2025) introduced CodeComplex for code time complexity prediction. The paper shows that LLMs without fine-tuning struggle with certain complexity classes. This suggests that no single LLM excels at every class; rather, each model shows advantages in certain classes. We propose MEC$^3$O, a multi-expert consensus system that extends multi-agent debate frameworks. MEC$^3$O assigns LLMs to complexity classes based on their performance and provides them with class-specialized instructions, turning them into experts. These experts engage in structured debates, and their predictions are integrated through a weighted consensus mechanism. Our expertise assignments effectively mitigate Degeneration-of-Thought, reduce reliance on a separate judge model, and prevent convergence to incorrect majority opinions. Experiments on CodeComplex show that MEC$^3$O outperforms the open-source baselines, achieving at least 10% higher accuracy and macro-F1 scores. It also surpasses GPT-4o-mini in macro-F1 scores on average and achieves F1 scores on par with GPT-4o and GPT-o4-mini on average. This demonstrates the effectiveness of multi-expert debate and the weighted consensus strategy in generating the final predictions. Our code and data are available at https://github.com/suhanmen/MECO.
Problem

Research questions and friction points this paper is trying to address.

Predicting code time complexity for software development
Addressing LLM limitations in certain complexity classes
Reliably integrating multiple LLMs' predictions without an external judge model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-expert consensus system assigns LLMs to complexity classes
Experts engage in structured debates with specialized instructions
Weighted consensus integrates predictions, mitigating Degeneration-of-Thought and erroneous majority convergence
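The weighted consensus idea can be sketched roughly as follows. This is a minimal illustration, not the paper's exact scheme: the expert names, the per-class weight table, and the assumption that weights come directly from each expert's per-class benchmark performance are all hypothetical.

```python
from collections import defaultdict

def weighted_consensus(predictions, weights):
    """Combine expert votes, weighting each vote by that expert's
    (assumed) performance on the class it predicted.

    predictions: {expert_name: predicted_class}
    weights:     {expert_name: {class_label: weight}}
    """
    scores = defaultdict(float)
    for expert, cls in predictions.items():
        # A vote counts more when the expert is specialized in that class.
        scores[cls] += weights[expert].get(cls, 0.0)
    return max(scores, key=scores.get)

# Hypothetical experts and per-class weights for illustration only.
preds = {"expert_a": "O(n)", "expert_b": "O(n log n)", "expert_c": "O(n)"}
w = {
    "expert_a": {"O(n)": 0.9, "O(n log n)": 0.4},
    "expert_b": {"O(n)": 0.3, "O(n log n)": 0.8},
    "expert_c": {"O(n)": 0.5, "O(n log n)": 0.5},
}
print(weighted_consensus(preds, w))  # "O(n)": 0.9 + 0.5 outweighs 0.8
```

Note how this differs from plain majority voting: even if a majority of experts agreed on one class, a single expert with a dominant weight for its specialty class could still carry the decision, which is how weighting can prevent convergence to an incorrect majority opinion.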