Enterprise Profit Prediction Using Multiple Data Sources with Missing Values through Vertical Federated Learning

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges of data silos and multi-source missing values in profit forecasting for small and medium-sized enterprises (SMEs), this paper proposes a joint modeling framework based on vertical federated learning. We introduce Vertical Federated Expectation Maximization (VFEM), a novel algorithm that integrates the Expectation-Maximization (EM) method, which is robust to complex missing-data patterns, into the vertical federated setting. This enables cross-institutional collaboration without sharing raw data. We theoretically establish a linear convergence rate for VFEM and develop an interpretable statistical inference framework. By unifying distributed optimization with missing-data imputation techniques, VFEM supports joint analysis of heterogeneous, incomplete, multi-party data. Experiments on synthetic and real-world datasets demonstrate significant improvements in prediction accuracy, showing that the approach addresses both data isolation and missing values.
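To make the vertical setting concrete, here is a minimal sketch of generic vertical federated least squares. It is not the paper's VFEM. Each hypothetical party holds a different column block for the same aligned rows, computes a partial linear predictor locally, and updates only its own coefficients from a shared residual. The function name `fit_vertical_linear`, the party arrays `X_bank` and `X_tax`, and the learning-rate settings are illustrative assumptions. Real deployments typically protect the exchanged quantities with encryption or secure aggregation, which this sketch omits, and it does not handle missing values.

```python
import numpy as np


def fit_vertical_linear(blocks, y, lr=0.1, n_iter=500):
    """Toy vertical federated least squares (hypothetical helper).

    blocks: list of (n, p_k) arrays, one per party, with rows aligned by a
    shared entity ID. Only partial predictions X_k @ beta_k and the residual
    are exchanged; no encryption or secure aggregation is modelled here.
    """
    n = len(y)
    betas = [np.zeros(X.shape[1]) for X in blocks]
    for _ in range(n_iter):
        # Each party computes its partial linear predictor on its own columns.
        partial = [X @ b for X, b in zip(blocks, betas)]
        # The label holder sums the partials and forms the residual.
        residual = sum(partial) - y
        # Each party takes a gradient step on its own coefficient block only.
        for k, X in enumerate(blocks):
            betas[k] -= lr * X.T @ residual / n
    return betas


# Usage with two hypothetical institutions holding different features.
rng = np.random.default_rng(0)
n = 200
X_bank = rng.normal(size=(n, 2))  # columns held by hypothetical party A
X_tax = rng.normal(size=(n, 1))   # columns held by hypothetical party B
y = X_bank @ np.array([1.0, -2.0]) + X_tax @ np.array([0.5])  # noise-free target
betas = fit_vertical_linear([X_bank, X_tax], y)
```

Because only partial predictions and residuals cross party boundaries, no party sees another's raw features. Whether those intermediate quantities are safe to share is a separate privacy question that this sketch does not address.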

📝 Abstract
Small and medium-sized enterprises (SMEs) play a crucial role in driving economic growth. Monitoring their financial performance and identifying relevant covariates are essential for risk assessment, business planning, and policy formulation. This paper focuses on predicting profits for SMEs and faces two major challenges: 1) SME data are stored across different institutions, and centralized analysis is restricted by data security concerns; 2) data from different institutions contain different levels of missing values, resulting in a complex missingness pattern. To tackle these issues, we introduce an innovative approach named Vertical Federated Expectation Maximization (VFEM), designed for federated learning with missing data. We embed a new EM algorithm into VFEM to handle complex missing patterns when access to the full dataset is infeasible. Furthermore, we establish a linear convergence rate for VFEM and develop a statistical inference framework that enables assessment of covariate influence and enhances model interpretability. Extensive simulation studies validate its finite-sample performance. Finally, we thoroughly investigate a real-life SME profit prediction problem using VFEM. Our findings demonstrate that VFEM provides a promising solution for addressing data isolation and missing values, ultimately improving the understanding of SMEs' financial performance.
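The EM idea behind missing-data handling can be illustrated with the textbook EM algorithm for a multivariate Gaussian with missing entries. The sketch below is that classical, centralized algorithm, not the authors' VFEM. It assumes Gaussian data with values missing at random, and the function name `em_gaussian_missing` is hypothetical. The E-step replaces each row's missing coordinates with their conditional mean given the observed ones and adds the conditional covariance to the second-moment sum. The M-step then recomputes the mean and covariance from those completed statistics.

```python
import numpy as np


def em_gaussian_missing(X, n_iter=100, tol=1e-8):
    """Classical EM for the mean/covariance of a Gaussian with NaN entries.

    Hypothetical helper, assuming values are missing at random. Centralized:
    it does not implement any federated or privacy-preserving protocol.
    Returns (mu, sigma, x_hat), where x_hat is the last E-step imputation.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))
    for _ in range(n_iter):
        x_hat = np.where(miss, 0.0, X)
        second = np.zeros((p, p))  # running sum of E[x x^T | observed]
        for i in range(n):
            m = miss[i]
            o = ~m
            if m.any():
                if o.any():
                    # coef = Sigma_mo Sigma_oo^{-1}: regression of missing on observed.
                    coef = np.linalg.solve(sigma[np.ix_(o, o)], sigma[np.ix_(o, m)]).T
                    x_hat[i, m] = mu[m] + coef @ (X[i, o] - mu[o])
                    cond_cov = sigma[np.ix_(m, m)] - coef @ sigma[np.ix_(o, m)]
                else:
                    # Fully missing row: fall back to the marginal moments.
                    x_hat[i, m] = mu[m]
                    cond_cov = sigma[np.ix_(m, m)]
                second[np.ix_(m, m)] += cond_cov
            second += np.outer(x_hat[i], x_hat[i])
        # M-step: moments of the completed data.
        new_mu = x_hat.mean(axis=0)
        new_sigma = second / n - np.outer(new_mu, new_mu)
        converged = (np.max(np.abs(new_mu - mu)) < tol
                     and np.max(np.abs(new_sigma - sigma)) < tol)
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma, x_hat


# Usage on simulated correlated data with 20% of cells removed completely at random.
rng = np.random.default_rng(1)
true_mu = np.array([0.0, 1.0, -1.0])
true_cov = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])
Z = rng.multivariate_normal(true_mu, true_cov, size=500)
X_obs = Z.copy()
X_obs[rng.random(Z.shape) < 0.2] = np.nan
mu_hat, sigma_hat, X_imputed = em_gaussian_missing(X_obs)
```

In a vertically partitioned setting the columns of `X` would be held by different institutions. The cross-party blocks of `sigma` and the conditional regressions therefore could not be computed in the clear as they are here. The sketch only shows the statistical core that a federated protocol would need to reproduce.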
Problem

Research questions and friction points this paper is trying to address.

Predicting SME profits using vertical federated learning
Addressing data isolation across institutions with privacy concerns
Handling complex missing value patterns in distributed datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vertical Federated Learning handles multi-source data isolation
Embedded EM algorithm addresses complex missing value patterns
Statistical inference framework enhances model interpretability
Huiyun Tang
University of Luxembourg
Human-computer interaction, misinformation
Feifei Wang
Center for Applied Statistics, Renmin University of China, Beijing, China
Long Feng
Professor, Nankai University
High Dimensional Data, High Frequency Data
Yang Li
Center for Applied Statistics, Renmin University of China, Beijing, China