🤖 AI Summary
This study investigates the causal mechanisms through which sociocultural, academic, and economic factors affect undergraduate students' cumulative grade point average (CGPA). Using survey data, the authors construct a causal graph and propose a "causality-guided, interpretable modeling" framework that combines causal inference with explainable artificial intelligence (XAI). They apply Ridge regression (MAE = 0.12, MSE = 0.023) for CGPA prediction and Random Forest (accuracy = 98.68%, F1-score ≈ 1.0) for performance-level classification, with SHAP, LIME, and InterpretML used for feature attribution and model transparency. Key causal drivers identified include family educational resources, time investment in learning, and course workload. A lightweight web application delivers personalized academic assessment and actionable intervention recommendations. The primary contribution is embedding causal structural priors directly into the XAI modeling pipeline, so that predictive performance and interpretability are addressed together, offering a methodological template for education data science.
📝 Abstract
Academic performance depends on a multivariable nexus of socio-academic and financial factors. This study investigates these influences to develop effective strategies for improving students' CGPA. We reviewed the literature to identify key influencing factors and constructed an initial hypothetical causal graph from the findings. We also conducted an online survey in which 1,050 students participated, providing the data for analysis. Data preprocessing, including cleaning and visualization, was used to check data quality before analysis. Causal analysis validated the relationships among variables and clarified their direct and indirect effects on CGPA. Regression models were implemented for CGPA prediction, while classification models categorized students by performance level. Ridge Regression demonstrated strong predictive accuracy, achieving a Mean Absolute Error of 0.12 and a Mean Squared Error of 0.023. Random Forest performed best among the classification models, attaining an F1-score close to 1.0 and an accuracy of 98.68%. Explainable AI techniques such as SHAP, LIME, and InterpretML improved model interpretability, highlighting critical factors such as study hours, scholarships, parental education, and prior academic performance. The study culminated in a web-based application that gives students personalized insights, allowing them to predict academic performance, identify areas for improvement, and make informed decisions to improve their outcomes.
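The abstract reports four evaluation metrics: Mean Absolute Error and Mean Squared Error for the regression task, and accuracy and F1-score for the classification task. As a reading aid, the sketch below shows how these quantities are conventionally computed. It is not the authors' code: the function names, the toy CGPA values, and the "high"/"low" labels are all hypothetical, and the F1 shown is for a single positive class, since the abstract does not state which averaging scheme was used across performance levels.

```python
from typing import Hashable, Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared gap; penalizes large errors more than MAE does."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             positive: Hashable) -> float:
    """Harmonic mean of precision and recall for one chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy values for illustration only, not data from the study.
cgpa_true = [3.2, 3.8, 2.9, 3.5]
cgpa_pred = [3.1, 3.9, 3.1, 3.5]
level_true = ["high", "low", "high", "low"]
level_pred = ["high", "low", "low", "low"]

print("MAE:", mean_absolute_error(cgpa_true, cgpa_pred))
print("MSE:", mean_squared_error(cgpa_true, cgpa_pred))
print("Accuracy:", accuracy(level_true, level_pred))
print("F1 (high):", f1_score(level_true, level_pred, positive="high"))
```

Under these definitions, the reported MAE of 0.12 would mean predictions differ from actual CGPA by about 0.12 grade points on average. MSE is expressed in squared grade points, so the two figures are not directly comparable.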