TOC-UCO: a comprehensive repository of tabular ordinal classification datasets

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Ordinal classification (OC) has long lacked standardized, reproducible benchmark datasets. To address this, the authors introduce TOC-UCO, a comprehensive, publicly available repository of datasets for tabular ordinal classification. It comprises 46 datasets preprocessed under a common framework and selected to have a reasonable number of patterns and an appropriate class distribution. To support reproducibility and fair comparison, TOC-UCO provides indices for 30 randomized train-test partitions of each dataset. The sources and preprocessing steps of every dataset are documented, together with guidance on how to benchmark a new method on the repository. TOC-UCO is intended as a robust validation platform for novel OC approaches.

📝 Abstract
An ordinal classification (OC) problem is a special type of classification characterised by a natural order relationship among the classes. Problems of this type arise in a number of real-world applications, which has motivated the design and development of many ordinal methodologies in recent years. However, the development of the OC field suffers from one main disadvantage: the lack of a comprehensive set of datasets on which novel approaches can be benchmarked. To address this gap, this manuscript from the University of Córdoba (UCO), whose authors have previous experience in the OC field, provides the literature with a publicly available repository of tabular data for the robust validation of novel OC approaches, namely TOC-UCO (Tabular Ordinal Classification repository of the UCO). Specifically, the repository includes 46 tabular ordinal datasets, preprocessed under a common framework and checked to have a reasonable number of patterns and an appropriate class distribution. We also provide the sources and preprocessing steps of each dataset, along with details on how to benchmark a novel approach using the TOC-UCO repository. To this end, indices for 30 different randomised train-test partitions are provided to facilitate the reproducibility of the experiments.
Problem

Research questions and friction points this paper is trying to address.

Lack of comprehensive ordinal classification datasets for benchmarking
Need for standardized tabular datasets with natural class order
Providing preprocessed datasets and partitions for reproducible OC research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Public repository of 46 ordinal datasets
Common preprocessing framework for datasets
Includes 30 randomized train-test partitions
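
The benchmarking idea summarised above (every method is evaluated on the same 30 fixed train-test index partitions, and the scores are then aggregated) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the repository's actual loader or protocol: the function names, the 25% test fraction, the seed-per-partition scheme, the toy data and the median-class baseline are all hypothetical, and the real TOC-UCO index files would replace `make_partitions`.

```python
import random
from statistics import mean


def make_partitions(n_samples, n_partitions=30, test_fraction=0.25):
    """Hypothetical stand-in for published index files: one (train, test) pair per seed."""
    n_test = int(round(n_samples * test_fraction))  # assumed split ratio
    partitions = []
    for seed in range(n_partitions):
        indices = list(range(n_samples))
        random.Random(seed).shuffle(indices)  # fixed seed -> reproducible split
        partitions.append((sorted(indices[n_test:]), sorted(indices[:n_test])))
    return partitions


def mean_absolute_error(y_true, y_pred):
    """Ordinal MAE: average rank distance between true and predicted class."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def fit_median_baseline(X_train, y_train):
    """Trivial baseline that always predicts the (upper) median training class."""
    median = sorted(y_train)[len(y_train) // 2]
    return lambda x: median


def benchmark(X, y, partitions, fit):
    """Train and score one method on every partition; return one score per partition."""
    scores = []
    for train_idx, test_idx in partitions:
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [model(X[i]) for i in test_idx]
        scores.append(mean_absolute_error([y[i] for i in test_idx], y_pred))
    return scores


# Toy ordinal dataset (assumption): 100 patterns, 5 ordered classes encoded 0..4.
X = [[i] for i in range(100)]
y = [i // 20 for i in range(100)]

partitions = make_partitions(len(X))
scores = benchmark(X, y, partitions, fit_median_baseline)
print(f"MAE over {len(scores)} partitions: mean={mean(scores):.3f}")
```

Because the partitions are fixed by index rather than re-drawn per experiment, two methods evaluated this way see exactly the same train and test patterns, so differences in the aggregated score reflect the methods rather than the split.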
Rafael Ayllón-Gavilán
Department of Clinical-Epidemiological Research in Primary Care, Instituto Maimónides de Investigación Biomédica de Córdoba (IMIBIC), Spain; Programa de doctorado en Computación Avanzada, Energía y Plasmas, Universidad de Córdoba, Córdoba, Andalucía, Spain
David Guijo-Rubio
Assistant Professor, University of Córdoba
time series machine learning, ordinal classification
Antonio Manuel Gómez-Orellana
Departamento de Ciencia de la Computación e Inteligencia Artificial, Universidad de Córdoba, Spain
Francisco Bérchez-Moreno
Departamento de Ciencia de la Computación e Inteligencia Artificial, Universidad de Córdoba, Spain
Víctor Manuel Vargas-Yun
Departamento de Ciencia de la Computación e Inteligencia Artificial, Universidad de Córdoba, Spain
Pedro A. Gutiérrez
Departamento de Ciencia de la Computación e Inteligencia Artificial, Universidad de Córdoba, Spain