Synthetic CVs To Build and Test Fairness-Aware Hiring Tools

📅 2025-08-28
🤖 AI Summary
Algorithmic hiring systems can introduce demographic bias (for example, based on age, gender, or nationality) during resume retrieval and ranking. Publicly available resume datasets with controllable demographic attributes remain scarce, which hinders fair evaluation, bias mitigation, and reproducible research. To address this gap, the paper proposes a synthetic resume generation method grounded in a real-world data donation campaign, so that the generated CVs preserve the linguistic patterns, structural conventions, and multivariate demographic distributions observed in actual resumes. The resulting dataset comprises 1,730 synthetic CVs annotated with fine-grained, intervenable demographic attributes. According to the authors, no comparable public resume dataset exists for granular, controllable bias analysis and empirical validation of fairness-enhancing techniques. They envision the dataset as a potential benchmarking standard for reproducible research on algorithmic hiring discrimination.

📝 Abstract
Algorithmic hiring has become increasingly necessary in some sectors as it promises to deal with hundreds or even thousands of applicants. At the heart of these systems are algorithms designed to retrieve and rank candidate profiles, which are usually represented by Curricula Vitae (CVs). Research has shown, however, that such technologies can inadvertently introduce bias, leading to discrimination based on factors such as candidates' age, gender, or national origin. Developing methods to measure, mitigate, and explain bias in algorithmic hiring, as well as to evaluate and compare fairness techniques before deployment, requires sets of CVs that reflect the characteristics of people from diverse backgrounds. However, datasets of these characteristics that can be used to conduct this research do not exist. To address this limitation, this paper introduces an approach for building a synthetic dataset of CVs with features modeled on real materials collected through a data donation campaign. Additionally, the resulting dataset of 1,730 CVs is presented, which we envision as a potential benchmarking standard for research on algorithmic hiring discrimination.
Problem

Research questions and friction points this paper is trying to address.

Developing methods to measure and mitigate bias in algorithmic hiring
Creating synthetic CV datasets to test fairness-aware hiring tools
Addressing lack of diverse CV datasets for bias evaluation research
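
As an illustration of why controllable demographic attributes matter for measuring bias, the following is a minimal sketch of a counterfactual check: one attribute of a CV is varied while everything else is held fixed, and the spread of the resulting scores is recorded. Everything here is an assumption made for illustration. The `CV` record, the two toy scorers, and `counterfactual_gap` are hypothetical and do not reflect the paper's dataset schema, its generation method, or any real hiring system.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence


@dataclass(frozen=True)
class CV:
    """Hypothetical, heavily simplified CV record (not the dataset's schema)."""
    years_experience: int
    skills: frozenset
    gender: str
    age: int


def biased_score(cv: CV) -> float:
    """Toy scorer that deliberately penalizes candidates older than 50."""
    base = cv.years_experience + 2 * len(cv.skills)
    return base - (3.0 if cv.age > 50 else 0.0)


def fair_score(cv: CV) -> float:
    """Toy scorer that ignores demographic attributes entirely."""
    return float(cv.years_experience + 2 * len(cv.skills))


def counterfactual_gap(
    score: Callable[[CV], float],
    cv: CV,
    attribute: str,
    values: Sequence,
) -> float:
    """Largest score difference observed when only `attribute` is varied."""
    scores = [score(replace(cv, **{attribute: v})) for v in values]
    return max(scores) - min(scores)


cv = CV(years_experience=8, skills=frozenset({"python", "sql"}), gender="F", age=35)

age_gap_biased = counterfactual_gap(biased_score, cv, "age", [30, 45, 60])
age_gap_fair = counterfactual_gap(fair_score, cv, "age", [30, 45, 60])
gender_gap_biased = counterfactual_gap(biased_score, cv, "gender", ["F", "M"])

print(age_gap_biased, age_gap_fair, gender_gap_biased)
```

A nonzero gap indicates that the scorer reacts to the varied attribute alone. A real evaluation would replace the toy scorers with the retrieval or ranking model under test and aggregate the gap over many CVs.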
Innovation

Methods, ideas, or system contributions that make the work stand out.

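An approach for building a synthetic dataset of CVs
Features modeled on real CVs collected through a data donation campaign
A dataset of 1,730 CVs, envisioned as a potential benchmarking standard for research on algorithmic hiring discrimination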