🤖 AI Summary
This study systematically evaluates the code generation capabilities of mainstream large language models (LLMs) on LeetCode programming problems, focusing on fundamental limitations in algorithmic reasoning, edge-case handling, and time-complexity optimization.
Method: Leveraging a crawled real-world problem corpus, we invoke APIs of models including GPT-4 and GPT-3.5-turbo, and perform automated execution and testing. Crucially, we introduce the first joint evaluation framework combining pass@k accuracy with empirical runtime measurements.
Contribution/Results: Results reveal that GPT-4 achieves 62.3% pass@1 on medium-difficulty problems but drops sharply to 28.1% on hard ones. Over 40% of failures stem from logical flaws rather than syntactic errors, and 73% of generated solutions fail to achieve optimal asymptotic time complexity. These findings expose structural deficiencies in LLMs' algorithmic reasoning, establish an empirically grounded evaluation paradigm for programming-assistant tools, and inform concrete directions for model improvement.
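The pass@k figures above are typically computed with the standard unbiased estimator from the Codex evaluation literature: given n generated samples per problem of which c pass all tests, the probability that at least one of k drawn samples is correct is 1 − C(n−c, k)/C(n, k). A minimal sketch (assuming this is the estimator used; the paper summary does not spell out the formula):

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total samples generated for a problem
    c: number of samples that pass all test cases
    k: budget of attempts being evaluated
    Returns the probability that at least one of k samples is correct.
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include at least one correct solution.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)
```

For example, with 10 samples of which 5 pass, `pass_at_k(10, 5, 1)` gives 0.5, and the estimate rises toward 1.0 as k grows.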
📝 Abstract
This paper presents a comprehensive performance evaluation of Large Language Models (LLMs) in solving programming challenges from LeetCode, a widely used platform for algorithm practice and technical interviews. We began by crawling the LeetCode website to collect a diverse set of problems spanning various difficulty levels and topics. Using this dataset, we generated solutions with multiple LLMs, including GPT-4 and GPT-3.5-turbo (the model underlying ChatGPT). The generated solutions were systematically evaluated for correctness and efficiency. We employed the pass@k metric to assess success rates within a given number of attempts and analyzed the runtime performance of the solutions. Our results highlight the strengths and limitations of current LLMs [10] in code generation and problem-solving tasks, providing insights into their potential applications and areas for improvement in automated programming assistance.
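The correctness-and-efficiency evaluation described above can be sketched as a small harness that runs each candidate solution against test cases and records both pass/fail status and wall-clock runtime. This is an illustrative sketch, not the paper's actual pipeline; the function names, the dictionary schema, and the `two_sum` candidate are all assumptions for demonstration:

```python
import time

def run_with_timing(solution, test_cases):
    """Execute a candidate solution against (args, expected) pairs,
    recording correctness and wall-clock runtime per case.
    Illustrative only -- the paper's harness is not specified here."""
    results = []
    for args, expected in test_cases:
        start = time.perf_counter()
        try:
            output = solution(*args)
            passed = (output == expected)
        except Exception:
            # Runtime errors count as failures, mirroring automated judges.
            passed = False
        elapsed = time.perf_counter() - start
        results.append({"pass": passed, "runtime_s": elapsed})
    return results

def two_sum(nums, target):
    """Sample model-generated candidate: hash-map approach, O(n) time."""
    seen = {}
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i

cases = [(([2, 7, 11, 15], 9), [0, 1]), (([3, 2, 4], 6), [1, 2])]
report = run_with_timing(two_sum, cases)
```

Aggregating `pass` flags across sampled generations feeds the pass@k metric, while the recorded runtimes support the empirical efficiency comparison.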