JobHop: A Large-Scale Dataset of Career Trajectories

📅 2025-05-12
🤖 AI Summary
This study addresses the scarcity of high-quality occupational trajectory data for analyzing labor market dynamics. Drawing on more than 391,000 anonymized resumes from the Flemish public employment service (VDAB), which together contain over 2.3 million work experiences, we construct a large-scale structured dataset of career trajectories. We use LLMs to extract structured career information from unstructured resume text, then apply a multi-label classification model that maps each work experience to standardized ESCO occupation codes, with anonymization applied to the underlying resumes. The dataset is publicly released. Empirical analyses reveal three key findings: (1) occupational transitions exhibit strong path dependence; (2) career interruptions significantly reduce subsequent job stability; and (3) cross-domain occupational mobility concentrates in skill-adjacent occupations. This work provides both an empirical foundation and a reusable method for labor market policy design, career counseling, and workforce forecasting.

📝 Abstract
Understanding labor market dynamics is essential for policymakers, employers, and job seekers. However, comprehensive datasets that capture real-world career trajectories are scarce. In this paper, we introduce JobHop, a large-scale public dataset derived from anonymized resumes provided by VDAB, the public employment service in Flanders, Belgium. Using Large Language Models (LLMs), we process unstructured resume data to extract structured career information, which is then mapped to standardized ESCO occupation codes using a multi-label classification model. The result is a rich dataset of over 2.3 million work experiences, grouped into more than 391,000 resumes, that offers valuable insights into real-world occupational transitions. The dataset enables diverse applications, such as analyzing labor market mobility, job stability, and the effects of career breaks on occupational transitions. It also supports career path prediction and other data-driven decision-making processes. To illustrate its potential, we explore key dataset characteristics, including job distributions, career breaks, and job transitions, demonstrating its value for advancing labor market research.
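The analyses mentioned in the abstract (job transitions and career breaks) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the record layout (resume id, occupation code, start date, end date), the placeholder codes such as OCC_A, and the 180-day break threshold are all hypothetical and are not taken from the released JobHop schema.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (resume_id, occupation_code, start_date, end_date).
# Codes like "OCC_A" are placeholders, not real ESCO codes.
experiences = [
    ("r1", "OCC_A", date(2015, 1, 1), date(2017, 6, 30)),
    ("r1", "OCC_A", date(2017, 7, 1), date(2018, 12, 31)),
    ("r1", "OCC_B", date(2020, 1, 1), date(2022, 1, 1)),
    ("r2", "OCC_A", date(2016, 3, 1), date(2019, 2, 28)),
    ("r2", "OCC_B", date(2019, 3, 1), date(2021, 5, 1)),
]


def group_by_resume(records):
    """Group work experiences per resume, sorted chronologically by start date."""
    careers = {}
    for resume_id, code, start, end in records:
        careers.setdefault(resume_id, []).append((start, end, code))
    for jobs in careers.values():
        jobs.sort()
    return careers


def transition_counts(careers):
    """Count consecutive (source occupation, target occupation) pairs."""
    counts = Counter()
    for jobs in careers.values():
        for (_, _, src), (_, _, dst) in zip(jobs, jobs[1:]):
            counts[(src, dst)] += 1
    return counts


def career_breaks(careers, min_gap_days=180):
    """Return (resume_id, gap_in_days) for gaps of at least min_gap_days."""
    breaks = []
    for resume_id, jobs in careers.items():
        for (_, prev_end, _), (next_start, _, _) in zip(jobs, jobs[1:]):
            gap = (next_start - prev_end).days
            if gap >= min_gap_days:
                breaks.append((resume_id, gap))
    return breaks


careers = group_by_resume(experiences)
transitions = transition_counts(careers)
breaks = career_breaks(careers)
```

On this toy input, the OCC_A to OCC_B transition is counted twice and resume r1 shows a single break of just over a year. A real analysis would also need to handle overlapping jobs and missing end dates, which this sketch ignores.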
Problem

Research questions and friction points this paper is trying to address.

Lack of comprehensive datasets on real-world career trajectories
Need for structured career data from unstructured resumes
Limited insights into labor market mobility and job transitions
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs process unstructured resume data
Multi-label classification maps to ESCO codes
Dataset enables labor market mobility analysis
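To make the multi-label mapping step more concrete, here is a minimal sketch of how per-label scores might be turned into a set of occupation codes. It is not the authors' model: the function name `select_labels`, the score dictionary, the 0.5 threshold, the cap of three labels, and the placeholder codes are all assumptions chosen for illustration.

```python
def select_labels(scores, threshold=0.5, max_labels=3):
    """Pick occupation codes from per-label scores (hypothetical decoding step).

    scores: dict mapping an occupation code to a score in [0, 1].
    Returns codes scoring at least `threshold`, best first, capped at
    `max_labels`. If nothing clears the threshold, falls back to the single
    best-scoring code so every work experience receives a label.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen = [code for code, score in ranked if score >= threshold][:max_labels]
    if not chosen and ranked:
        chosen = [ranked[0][0]]
    return chosen


# Placeholder codes and scores, not real ESCO codes or model outputs.
confident = select_labels({"OCC_A": 0.9, "OCC_B": 0.6, "OCC_C": 0.1})
fallback = select_labels({"OCC_A": 0.2, "OCC_B": 0.3})
```

The fallback to the top-scoring code is one possible design choice; a pipeline could instead leave low-confidence experiences unlabeled and flag them for review.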