"You Are Rejected!": An Empirical Study of Large Language Models Taking Hiring Evaluations

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study empirically evaluates large language models (LLMs) against industry-standard technical hiring assessments for algorithm and software engineering roles. Method: We administered realistic, industrial-grade programming, system design, and reasoning questions—commonly used by leading technology firms—to state-of-the-art LLMs (e.g., GPT-4, Claude 3, Gemini) and conducted multi-stage comparative analysis against official corporate reference solutions, assessing correctness, completeness, engineering soundness, and consistency. Contribution/Results: Our analysis reveals systematic structural gaps between LLM outputs and industrial expectations: no tested model met enterprise hiring thresholds. Critical deficiencies were observed in boundary-case handling, explicit modeling of resource constraints (e.g., time/space complexity, scalability), and maintainability-aware design. These findings challenge the prevailing assumption that LLMs can directly substitute for entry-level engineers. Moreover, this work introduces the first benchmark framework specifically tailored to industrial recruitment scenarios, providing empirically grounded insights for AI capability evaluation in real-world engineering hiring.
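To make the grading pipeline concrete, below is a minimal sketch of the multi-stage comparative scoring described above. The four rubric dimensions come from the summary; the equal weighting, the [0, 1] score scale, and the pass threshold are illustrative assumptions, not values reported in the paper.

```python
from dataclasses import dataclass

# Rubric dimensions named in the summary; equal weighting and the 0.70
# pass threshold are illustrative assumptions, not the paper's values.
DIMENSIONS = ("correctness", "completeness", "engineering_soundness", "consistency")
PASS_THRESHOLD = 0.70  # hypothetical enterprise hiring bar

@dataclass
class GradedAnswer:
    question_id: str
    # dimension -> score in [0, 1], graded against the official reference solution
    scores: dict[str, float]

def overall_score(graded: list[GradedAnswer]) -> float:
    """Mean of per-question rubric averages across the whole assessment."""
    per_question = [
        sum(g.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)
        for g in graded
    ]
    return sum(per_question) / len(per_question)

def passes(graded: list[GradedAnswer]) -> bool:
    """A candidate (here, a model) passes only if it clears the threshold."""
    return overall_score(graded) >= PASS_THRESHOLD
```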

📝 Abstract
With the proliferation of the internet and the rapid advancement of Artificial Intelligence, leading technology companies face an urgent annual demand for a considerable number of software and algorithm engineers. To efficiently and effectively identify high-potential candidates from thousands of applicants, these firms have established a multi-stage selection process, which crucially includes a standardized hiring evaluation designed to assess job-specific competencies. Motivated by the demonstrated prowess of Large Language Models (LLMs) in coding and reasoning tasks, this paper investigates a critical question: Can LLMs successfully pass these hiring evaluations? To this end, we conduct a comprehensive examination of a widely used professional assessment questionnaire. We employ state-of-the-art LLMs to generate responses and subsequently evaluate their performance. Contrary to the prior expectation that LLMs would make ideal engineers, our analysis reveals a significant inconsistency between the model-generated answers and the company-referenced solutions. Our empirical findings lead to a striking conclusion: all evaluated LLMs fail to pass the hiring evaluation.
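As an illustration of the response-generation step described in the abstract, the sketch below submits one assessment question to a model through the OpenAI Python client. The model name, system prompt, and question wording are placeholders for this example; the paper's actual prompting protocol is not specified here.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer_question(question: str, model: str = "gpt-4o") -> str:
    """Collect one model response to a hiring-assessment question.

    The system prompt and default model name are illustrative placeholders.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a candidate taking a standardized software "
                        "engineering hiring evaluation. Answer completely."},
            {"role": "user", "content": question},
        ],
        temperature=0,  # near-deterministic output so grading is repeatable
    )
    return response.choices[0].message.content
```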
Problem

Research questions and friction points this paper is trying to address.

Investigating LLMs' ability to pass hiring evaluations
Assessing performance consistency between models and company standards
Revealing LLMs' failure in professional competency assessments
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate responses to hiring assessments
Models evaluated against company-referenced solutions
Study reveals LLMs fail hiring evaluations