Transfer Learning for Robust Structured Regression with Bi-level Source Detection

📅 2026-04-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the degradation of cross-domain transfer learning when both source and target data are contaminated, particularly in high-dimensional structured regression. The authors propose TransL2E, a method that integrates the robust L2E estimation criterion into a structured regression framework. TransL2E adds a data-driven, bi-level source detection mechanism, operating at the individual and cohort (group) levels, to identify and use reliable source information. Combining robust estimation with transfer learning improves the robustness of parameter estimation and the accuracy of structure recovery when data are contaminated and scarce. In simulation studies and a real-data analysis of COVID-19 mortality rates, TransL2E is reported to outperform existing approaches.

📝 Abstract

High-dimensional data in modern applications, such as COVID-19 mortality, often span multiple domains. Leveraging auxiliary information from source domains to improve performance in a target domain motivates the use of transfer learning. However, a practical issue that has been overlooked is data contamination, which induces heterogeneity and can significantly degrade transfer learning performance. To address this challenge, we propose a novel approach that tackles transfer learning under data contamination within a structured regression setting. By employing the robust L2E criterion, we develop the TransL2E method that accounts for contamination in both target and source data while effectively transferring relevant information. Beyond robust estimation, TransL2E introduces a data-driven bi-level source detection mechanism, operating at both individual and cohort levels, which possesses multiple advantages over existing source detection approaches. Comprehensive simulation studies and a real data application demonstrate the superior performance of TransL2E in both robust estimation and structure recovery in the presence of data limitation and contamination.
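
To make the role of the robust L2E criterion concrete, the sketch below uses a common form of the L2E objective for regression with Gaussian errors, tau / (2 sqrt(pi)) - tau sqrt(2 / pi) * mean(exp(-tau^2 r_i^2 / 2)), where r_i are residuals and tau = 1 / sigma. This is an illustration under stated assumptions, not the TransL2E procedure: the structured penalty, the transfer step, and the bi-level source detection are not shown, and the function names (`l2e_objective`, `fit_slopes`), the simulated data, the fixed `tau`, and the grid search are all hypothetical choices made for this example.

```python
import numpy as np


def l2e_objective(residuals, tau):
    """Gaussian-error L2E criterion with precision tau = 1 / sigma (lower is better)."""
    r = np.asarray(residuals, dtype=float)
    # Each residual enters through a bounded exp(.) term, so a gross outlier
    # contributes almost nothing instead of dominating the fit.
    fit_term = np.mean(np.exp(-0.5 * (tau * r) ** 2))
    return tau / (2.0 * np.sqrt(np.pi)) - tau * np.sqrt(2.0 / np.pi) * fit_term


def fit_slopes(n=200, seed=0):
    """Compare least squares and L2E on a contaminated no-intercept regression."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + rng.normal(0.0, 0.5, n)  # true slope is 2 (assumed toy model)

    # Contaminate the 20% of observations with the largest x by shifting y.
    n_out = n // 5
    outliers = np.argsort(x)[-n_out:]
    y[outliers] += 10.0

    # Least squares through the origin, pulled toward the outliers.
    ols_slope = (x @ y) / (x @ x)

    # L2E slope by a simple grid search with tau fixed at 1 / 0.5 (illustrative only).
    grid = np.linspace(0.0, 10.0, 1001)
    losses = [l2e_objective(y - s * x, tau=2.0) for s in grid]
    l2e_slope = grid[int(np.argmin(losses))]
    return ols_slope, l2e_slope
```

Calling `fit_slopes()` returns the least-squares slope and the L2E slope for the same contaminated sample, so the two can be compared against the true value of 2. In practice tau would be estimated jointly with the coefficients rather than fixed, and a proper optimizer would replace the grid search.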

Problem

Research questions and friction points this paper is trying to address.

transfer learning

data contamination

structured regression

high-dimensional data

domain heterogeneity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transfer Learning

Robust Regression

Data Contamination