Initial data analysis of the national German transplantation registry with a focus on kidney transplantation

📅 2026-01-05
🤖 AI Summary
This study addresses data quality challenges in the German Transplantation Registry (TxReg), where missing data, inconsistencies, and ambiguity in the choice of event-time variables compromise the reliability of research. Analyzing data from 14,954 recipients and 9,964 donors of first-time kidney-only transplantations in adult recipients from deceased donors between 2006 and 2016, the work characterizes conflicts and complementarities among variables reported by multiple sources and identifies 168 such multi-sourced variables. By combining missing data pattern analysis, decision tree modeling, influx and outflux analysis, and multi-source consistency checks, the study describes the underlying missing data structure and derives recommendations for imputation. Findings show that while some tables have more than 50% missing values, some variables have high potential for imputing missing data. Moreover, event-time analyses are highly sensitive to the choice of variables, underscoring the need for careful curation. The work provides recommendations for data preprocessing and analysis that support future research using TxReg.
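The multi-source consistency checks described above can be illustrated with a minimal sketch. It assumes, hypothetically, that one variable (for example donor age) is reported by two providers as parallel lists, with `None` marking a missing entry. The function name `compare_sources` and the merge rule (use whichever source is observed, leave conflicts unresolved) are illustrative assumptions, not the registry's or the authors' procedure. Each record is classified as agreeing, conflicting, observed in only one source, or missing in both.

```python
from collections import Counter
from typing import Any, List, Optional, Sequence, Tuple


def compare_sources(
    source_a: Sequence[Any],
    source_b: Sequence[Any],
) -> Tuple[List[str], List[Optional[Any]]]:
    """Hypothetical helper: classify paired reports of one variable and merge them.

    None marks a missing value. Conflicting values are left as None
    (an assumed rule; resolving conflicts is a separate decision).
    """
    if len(source_a) != len(source_b):
        raise ValueError("both sources must report the same records")
    status: List[str] = []
    merged: List[Optional[Any]] = []
    for a, b in zip(source_a, source_b):
        if a is None and b is None:
            status.append("both_missing")
            merged.append(None)
        elif a is None:
            status.append("only_b")  # source B fills a gap in source A
            merged.append(b)
        elif b is None:
            status.append("only_a")  # source A fills a gap in source B
            merged.append(a)
        elif a == b:
            status.append("agree")
            merged.append(a)
        else:
            status.append("conflict")  # discrepancy between providers
            merged.append(None)
    return status, merged


# Hypothetical donor ages reported by two providers for four records (toy data)
status, merged = compare_sources([54, None, 61, None], [54, 47, 60, None])
print(Counter(status))
print(merged)
```

Records labeled `only_a` or `only_b` are where a second source can fill a gap, while `conflict` records flag discrepancies that need an explicit resolution rule before analysis.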

📝 Abstract
This study presents an Initial Data Analysis (IDA) of data from the German Transplantation Registry (TxReg) to improve understanding of the data and to inform future analyses. The IDA focuses on first-time kidney-only transplantations in adult recipients from deceased donors between 2006 and 2016 and covers data from 14,954 recipients and 9,964 donors across 25 tables. Investigated aspects include missing data patterns and structure, data consistency, and the availability of event time data. Results show that the proportion of missing data varies widely: some tables are nearly complete, while others have over 50% missing values. Missing data patterns are identified using a decision tree approach. An influx and outflux analysis demonstrates that some variables have high potential for imputing missing data, while others are less suitable for imputation. We identified 168 multi-sourced variables that are reported in parallel by multiple data providers; this leads to discrepancies for some variables but also provides opportunities for missing data imputation. Our findings on event time data demonstrate the importance of carefully selecting the variables used for event time analyses, as results strongly depend on this selection. In summary, our findings highlight the challenges of using TxReg data for research and provide recommendations for data preprocessing and future analyses.
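The influx and outflux analysis mentioned in the abstract can be sketched as follows. This assumes the coefficients defined by van Buuren in *Flexible Imputation of Missing Data* (also computed by `flux()` in the R package mice); the abstract does not give the exact formulas, so this is an illustration, not the authors' code. With `r_ij = 1` if variable j is observed for record i and 0 otherwise, influx counts pairs in which variable j is missing and another variable in the same record is observed, divided by the total number of observed cells; outflux counts pairs in which variable j is observed and another variable is missing, divided by the total number of missing cells. The function name `influx_outflux` is hypothetical, and it expects a numeric array with `NaN` marking missing values.

```python
from typing import Tuple

import numpy as np


def influx_outflux(data: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: influx and outflux per column of `data` (NaN = missing).

    influx_j  = #(i, k) with Y_ij missing and Y_ik observed / #observed cells
    outflux_j = #(i, k) with Y_ij observed and Y_ik missing / #missing cells
    """
    observed = (~np.isnan(data)).astype(int)  # r_ij
    missing = 1 - observed                    # 1 - r_ij
    obs_per_row = observed.sum(axis=1)        # observed cells in each record
    miss_per_row = missing.sum(axis=1)        # missing cells in each record
    total_obs = observed.sum()
    total_miss = missing.sum()
    n_vars = data.shape[1]
    # For column j, sum the per-record counts over records where j is
    # missing (influx) or observed (outflux). Degenerate cases with a zero
    # denominator are set to 0 (an assumed convention).
    influx = missing.T @ obs_per_row / total_obs if total_obs else np.zeros(n_vars)
    outflux = observed.T @ miss_per_row / total_miss if total_miss else np.zeros(n_vars)
    return influx, outflux


# Toy 3x3 example (not registry data); the third column is fully observed
data = np.array([
    [1.0, 2.0, 3.0],
    [4.0, np.nan, 6.0],
    [np.nan, 8.0, 9.0],
])
influx, outflux = influx_outflux(data)
print(np.round(influx, 3), np.round(outflux, 3))
```

A fully observed column receives outflux 1 and influx 0, as the third column of the toy example does. Under these definitions, variables with high outflux are generally better candidate predictors in an imputation model, while variables with high influx are easier to impute.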
Problem

Research questions and friction points this paper addresses.

missing data
data consistency
event time data
transplantation registry
data quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Initial Data Analysis
missing data imputation
multi-sourced variables
event time data
decision tree
Lukas Klein
EPFL, USZ
Machine Learning, Biotech, Computer Vision
Gunter Grieser
Darmstadt University of Applied Sciences, Schöfferstraße 3, Darmstadt, 64295
Carl-Ludwig Fischer-Fröhlich
Deutsche Stiftung Organtransplantation, Deutschherrnufer 52, Frankfurt am Main, 60594
Axel Rahmel
Deutsche Stiftung Organtransplantation, Deutschherrnufer 52, Frankfurt am Main, 60594
Henrik Stahl
Associate Professor in Marine Science at University of Khorfakkan
Marine: Biogeochemistry, Carbon & Nutrient Cycling, Ocean Acidification, CCS, Coral Reef Restoration
Andreas Wienke
Martin Luther University Halle-Wittenberg, Magdeburger Straße 8, Halle, 06112
Antje Jahn-Eimermacher
Darmstadt University of Applied Sciences, Schöfferstraße 3, Darmstadt, 64295