Data-Driven Discovery of Feature Groups in Clinical Time Series

📅 2025-11-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Clinical time-series data contain many heterogeneous features but little prior semantic knowledge about how they relate, which makes task-oriented feature grouping difficult. To address this, we propose a data-driven, end-to-end feature clustering method. It is a supervised learning framework that jointly optimizes feature embedding weights and performs differentiable clustering on these embeddings, and it automatically discovers dynamic feature groups that are highly relevant to the prediction task. The approach requires no domain-specific priors and improves both model performance and clinical interpretability. On synthetic benchmarks it outperforms static clustering baselines. On real-world intensive care unit data it achieves predictive performance comparable to expert-defined groupings and uncovers clinically meaningful variable associations, such as coordinated dynamics between blood pressure and heart rate. Our key innovation is to integrate the clustering of embedding weights as a fully differentiable module within the supervised training pipeline, which enables task-aware discovery of latent feature structure.
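
The idea of "differentiable clustering" on embedding weights can be illustrated with a standard soft k-means relaxation. This is a swapped-in, illustrative technique and not necessarily the formulation the authors use. The sketch assumes that each feature's embedding layer can be summarised by one weight vector and that one centroid per group is learned alongside those vectors. Each feature is assigned to every group through a softmax over negative squared distances, scaled by a temperature `tau`. Both the assignment and the resulting clustering loss are therefore smooth functions of the weights. The code uses plain Python for readability. In practice the same expressions would be written in an autodiff framework, so that the clustering loss can be added to the supervised loss and backpropagated. The names `soft_assignments`, `clustering_loss` and `tau` are hypothetical.

```python
import math


def sqdist(a, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - ci) ** 2 for ai, ci in zip(a, c))


def soft_assignments(weights, centroids, tau=1.0):
    """Soft cluster assignment of each feature's embedding weight vector.

    weights:   one weight vector per feature (n_features x d)
    centroids: one center per group (n_groups x d)
    tau:       temperature; smaller values push rows towards one-hot
    Returns an n_features x n_groups matrix whose rows sum to 1.
    """
    probs = []
    for w in weights:
        # Negative squared distance to every centroid, scaled by the temperature.
        logits = [-sqdist(w, c) / tau for c in centroids]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def clustering_loss(weights, centroids, tau=1.0):
    """Expected squared distance of each weight vector to the centroids under the
    soft assignment, averaged over features: a smooth surrogate for the k-means
    objective that could be added to a supervised loss."""
    p = soft_assignments(weights, centroids, tau)
    total = 0.0
    for w, row in zip(weights, p):
        total += sum(pk * sqdist(w, c) for pk, c in zip(row, centroids))
    return total / len(weights)


# Usage with hypothetical 2-D weights for four features and two groups.
weights = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids = [[1.0, 0.0], [0.0, 1.0]]
p = soft_assignments(weights, centroids, tau=0.1)
groups = [row.index(max(row)) for row in p]
print(groups)  # [0, 0, 1, 1]
```

With a small temperature the rows approach one-hot vectors, so taking the arg-max of each row recovers hard groups. With a large temperature every feature is spread almost uniformly across the groups.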

๐Ÿ“ Abstract
Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.
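
The phrase "clustering weights of feature-wise embedding layers" can be made concrete with a small example. The sketch assumes, hypothetically, that each scalar feature j is embedded by its own linear map e_j = x_j · W_j + b_j. It then groups features by running plain k-means on the concatenated parameters [W_j, b_j] after they have been trained. Post-hoc k-means is a simple stand-in used only for illustration. The abstract describes the clustering as integrated into supervised training, and this sketch does not reproduce that. The function names (`embed_features`, `kmeans`, `feature_groups`) and the parameter values in the usage lines are made up.

```python
def sqdist(a, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - ci) ** 2 for ai, ci in zip(a, c))


def embed_features(x, W, b):
    """Hypothetical feature-wise embedding: scalar feature x[j] is mapped to a
    d-dimensional vector by its own parameters, e_j = x[j] * W[j] + b[j]."""
    return [[xj * wk + bk for wk, bk in zip(Wj, bj)] for xj, Wj, bj in zip(x, W, b)]


def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Add the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(sqdist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new.append(centroids[j])  # leave an empty cluster's centroid in place
        if new == centroids:
            break  # converged
        centroids = new
    return labels, centroids


def feature_groups(W, b, k):
    """Group features by clustering the concatenated parameters [W_j, b_j] of
    their embedding layers; returns sorted lists of feature indices."""
    params = [list(Wj) + list(bj) for Wj, bj in zip(W, b)]
    labels, _ = kmeans(params, k)
    groups = {}
    for j, lab in enumerate(labels):
        groups.setdefault(lab, []).append(j)
    return sorted(groups.values())


# Usage with made-up trained parameters for four features (d = 2).
W = [[1.0, 0.9], [1.1, 1.0], [-1.0, 0.2], [-0.9, 0.1]]
b = [[0.0, 0.1], [0.1, 0.0], [0.5, 0.5], [0.4, 0.6]]
print(embed_features([2.0, 1.0, 0.0, 3.0], W, b)[2])  # [0.5, 0.5]
print(feature_groups(W, b, k=2))  # [[0, 1], [2, 3]]
```

k-means depends on both the initialisation and the choice of k. A practical version would therefore usually compare several values of k or several restarts. The deterministic farthest-point start used here only keeps the example reproducible.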

Problem

Automatically grouping clinical time series features to improve prediction performance
Overcoming the difficulty of grouping features by hand using semantic knowledge alone
Learning clinically interpretable feature groups by clustering the weights of feature-wise embedding layers

Innovation

Learns feature groups by clustering embedding weights
Integrates into the standard supervised training process
Produces clinically interpretable, data-driven feature relationships

Authors
Fedor Sergeev
Department of Computer Science, ETH Zurich, Switzerland
Manuel Burger
Department of Computer Science, ETH Zurich, Switzerland
Polina Leshetkina
Department of Health Science and Medicine, University of Lucerne, Switzerland
Vincent Fortuin
Principal Investigator, Helmholtz AI & TU Munich
Bayesian deep learning, Deep generative AI, PAC-Bayes
G. Ratsch
Department of Computer Science, ETH Zurich, Switzerland
Rita Kuznetsova
Department of Computer Science, ETH Zurich, Switzerland