Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)

📅 2025-02-06
🤖 AI Summary
Football match data suffers from poor interoperability and high analytical overhead because providers differ in what they collect, how they define it, how they represent it, and how they deliver it. To address this, the paper proposes the Common Data Format (CDF), a uniform, standardized format for match data. CDF specifies a minimal schema covering five types of match data: match sheet data, video footage, event data, tracking data, and match metadata. It emphasizes clarity, sufficient context (including provenance), and completeness for common downstream analysis tasks. Structurally defined via JSON Schema, CDF incorporates semantic naming conventions, a unified pitch coordinate system, provenance annotation, and versioned delivery protocols. The released CDF 1.0 technical specification is intended to ease cross-platform integration and to reduce data processing costs and integration timelines for clubs and national football associations. By providing a shared interoperability layer, CDF aims to support collaborative data ecosystems across the industry.
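To make the idea of a unified pitch coordinate system concrete, here is a minimal sketch. It assumes, purely for illustration, that a provider reports positions on a 0 to 100 scale along both axes and that the target system uses metres with the origin at the centre of the pitch. Neither convention is taken from the CDF specification, and the function name `to_unified` and the 105 m by 68 m default pitch size are hypothetical.

```python
def to_unified(x_pct, y_pct, pitch_length=105.0, pitch_width=68.0):
    """Map provider coordinates in [0, 100] to metres centred on the pitch.

    Assumed (hypothetical) conventions: the provider origin is a pitch corner,
    the target origin is the centre spot, and x runs along the pitch length.
    """
    if not (0.0 <= x_pct <= 100.0 and 0.0 <= y_pct <= 100.0):
        raise ValueError("provider coordinates must lie in [0, 100]")
    # Shift the normalized value so that 50 maps to 0, then scale to metres.
    x_m = (x_pct / 100.0 - 0.5) * pitch_length
    y_m = (y_pct / 100.0 - 0.5) * pitch_width
    return x_m, y_m
```

Under these assumptions, a provider position of (50, 50) maps to the centre spot, and every provider using a different scale would need its own small adapter of this kind.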

📝 Abstract
During football matches, a variety of parties (e.g., companies) each collect (possibly overlapping) data about the match, ranging from basic information (e.g., starting players) to detailed positional data. These data are provided to clubs, federations, and other organizations that are increasingly interested in leveraging them to inform their decision making. Unfortunately, analyzing such data poses significant barriers because each provider may (1) collect different data, (2) use different specifications even within the same category of data, (3) represent the data differently, and (4) deliver the data in a different manner (e.g., file format, protocol). Consequently, working with these data requires a significant investment of time and money. The goal of this work is to propose a uniform and standardized format for football data called the Common Data Format (CDF). The CDF specifies a minimal schema for five types of match data: match sheet data, video footage, event data, tracking data, and match metadata. It aims to ensure that the provided data is clear, sufficiently contextualized (e.g., its provenance is clear), and complete such that it enables common downstream analysis tasks. Concretely, this paper details the technical specifications of the CDF, the representational choices that were made to help ensure the clarity of the provided data, and a concrete approach for delivering data in the CDF.
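As a rough illustration of what "complete" and "sufficiently contextualized" could mean in practice, the sketch below checks which of the five data types named in the abstract are missing from a delivery and attaches a provider to each dataset. The class `DataSet`, its fields, the key spellings in `DATA_TYPES`, and the function `missing_types` are all hypothetical and are not taken from the CDF specification.

```python
from dataclasses import dataclass, field

# The five data types named in the abstract; the key spellings are illustrative.
DATA_TYPES = ("match_sheet", "video", "event", "tracking", "match_meta")


@dataclass
class DataSet:
    data_type: str   # one of DATA_TYPES
    provider: str    # provenance: who collected the data (hypothetical field)
    records: list = field(default_factory=list)


def missing_types(datasets):
    """Return the data types, in DATA_TYPES order, not covered by any dataset."""
    present = {d.data_type for d in datasets}
    unknown = present - set(DATA_TYPES)
    if unknown:
        raise ValueError(f"unknown data types: {sorted(unknown)}")
    return [t for t in DATA_TYPES if t not in present]
```

A delivery containing only event and tracking data would, under this sketch, be reported as missing the match sheet, video, and match metadata.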
Problem

Research questions and friction points this paper is trying to address.

Standardizing diverse football match data formats
Reducing barriers to analyzing multi-source match data
Creating unified schema for football data interoperability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized format for diverse football match data
Minimal schema for five key data types
Ensures clarity, context, and completeness for analysis
Authors

Gabriel Anzer
RB Leipzig, Leipzig, Germany
Kilian Arnsmeyer
Deutscher Fußball-Bund (DFB), Frankfurt, Germany
Pascal Bauer
Deutscher Fußball-Bund (DFB), Frankfurt, Germany; Saarland University, Saarbrücken, Germany
Joris Bekkers
UnravelSports | U.S. Soccer Federation | PySport
Sports, Soccer, Football, Open Source
Ulf Brefeld
Leuphana Universität Lüneburg
Machine Learning
Jesse Davis
Professor, Department of Computer Science, KU Leuven
Machine Learning, Artificial Intelligence, Sports Analytics, Data Mining, Medical Informatics
Nicolas Evans
FIFA, Zurich, Switzerland
Matthias Kempe
University of Groningen, Groningen, Netherlands
Samuel J Robertson
FIFA, Zurich, Switzerland
Joshua Wyatt Smith
Wyatt AI Inc., Montreal, Canada; Concordia University, Montreal, Canada
Jan Van Haaren
Club Brugge and KU Leuven
Machine Learning, Artificial Intelligence, Sports Analytics