🤖 AI Summary
General-purpose large language models (LLMs) suffer from low candidate–job matching accuracy and weak semantic understanding in recruitment automation. Method: This paper proposes a domain-adaptation approach for recruitment tasks, featuring a standardized JSON-based synthetic data generation framework integrated with real-world resume data parsed by DeepSeek to enhance data diversity and structural consistency; supervised fine-tuning of lightweight LLMs (e.g., Phi-4) on this dataset; and a multi-metric evaluation framework (F1, BLEU, ROUGE) to optimize information extraction and semantic matching performance. Contribution/Results: Experimental results show the fine-tuned Phi-4 achieves an F1 score of 90.62%, significantly outperforming baseline and state-of-the-art models in precision, recall, and semantic similarity—demonstrating the effectiveness and practicality of synergistically constructing domain-specific data and fine-tuning compact LLMs.
📝 Abstract
This paper presents a novel approach to recruitment automation in which Large Language Models (LLMs) are fine-tuned to improve accuracy and efficiency. Building upon our previous work on the Multilayer Large Language Model-Based Robotic Process Automation Applicant Tracking (MLAR) system, this work introduces a methodology for training LLMs fine-tuned specifically for recruitment tasks. The proposed framework addresses the limitations of generic LLMs by creating a synthetic dataset in a standardized JSON format, which helps ensure consistency and scalability. In addition to the synthetic dataset, resumes were parsed with DeepSeek, a high-parameter LLM, into the same structured JSON format and added to the training set to improve data diversity and realism. Through experimentation, we demonstrate significant improvements in performance metrics, including exact match, F1 score, BLEU score, ROUGE score, and overall similarity, compared to base models and other state-of-the-art LLMs. In particular, the fine-tuned Phi-4 model achieved the highest F1 score of 90.62%, indicating strong precision and recall in recruitment tasks. This study highlights the potential of fine-tuned LLMs to improve recruitment workflows by providing more accurate candidate-job matching.
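To make the evaluation idea concrete, the following is a minimal sketch of how exact match and F1 could be computed between a model's structured JSON output and a reference record. The field names (`name`, `skills`, `years_experience`) and the helper functions are hypothetical and chosen only for illustration; the abstract does not specify the schema or the exact metric definitions, so this should not be read as the authors' implementation.

```python
import json
from collections import Counter

# Hypothetical records: the field names are illustrative assumptions,
# not the schema used in the paper.
reference = {
    "name": "Jane Doe",
    "skills": ["Python", "SQL", "Docker"],
    "years_experience": 5,
}
prediction = {
    "name": "Jane Doe",
    "skills": ["Python", "SQL"],
    "years_experience": 5,
}


def canonical(record):
    """Serialize with sorted keys so equal records yield equal strings."""
    return json.dumps(record, sort_keys=True, ensure_ascii=False)


def exact_match(pred, ref):
    """True only if both records are identical after canonical serialization."""
    return canonical(pred) == canonical(ref)


def flatten(record):
    """Turn a flat record into 'key=value' items, one per list element."""
    items = []
    for key, value in record.items():
        values = value if isinstance(value, list) else [value]
        items.extend(f"{key}={v}" for v in values)
    return items


def f1_score(pred, ref):
    """Item-level F1: harmonic mean of precision and recall over flattened items."""
    pred_items, ref_items = Counter(flatten(pred)), Counter(flatten(ref))
    overlap = sum((pred_items & ref_items).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_items.values())
    recall = overlap / sum(ref_items.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print("exact match:", exact_match(prediction, reference))
    print("F1:", round(f1_score(prediction, reference), 4))
```

In this toy case the prediction misses one skill, so exact match fails while the item-level F1 still gives partial credit. That is one reason to report both a strict and a graded metric for structured extraction.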