🤖 AI Summary
Existing depression detection systems require prolonged data collection, which hinders timely early screening. To address this challenge, particularly in resource-constrained settings, we propose a minimalist, rapid depression screening system tailored for students. A smartphone tool retrieves the past 7 days of app usage data in about 1 second. The system combines several feature selection strategies (including a stable approach and Boruta) with machine learning models such as LightGBM, and uses SHAP-based interpretability analysis to identify behavioral markers associated with depression. In a study of 100 students from Bangladesh, the LightGBM model using features from the stable selection approach attains a sensitivity of 82.4%, a precision of 75%, and an F1-score of 78.5%; a parsimonious stacking model using around 5 Boruta-selected features reaches a maximum precision of 77.4% and a balanced accuracy of 77.9%. The approach shortens data collection and keeps infrastructure requirements low, which may support interpretable and accessible early screening for at-risk student populations.
📝 Abstract
Background: Existing robust, pervasive device-based systems developed in recent years to detect depression require data collected over a long period and may not be effective in cases where early detection is crucial.
Objective: Our main objective was to develop a minimalistic system to identify depression using data retrieved in the fastest possible time.
Methods: We developed a fast tool that retrieves the past 7 days' app usage data in 1 second (mean 0.31, SD 1.10 seconds). A total of 100 students from Bangladesh participated in our study, and our tool collected their app usage data. To identify depressed and nondepressed students, we developed a diverse set of machine learning (ML) models. We selected important features using the stable feature selection (FS) approach, along with 3 main types of FS approaches.
Results: Leveraging only the app usage data retrieved in 1 second, our light gradient boosting machine (LightGBM) model used the important features selected by the stable FS approach and correctly identified 82.4% (n=42) of depressed students (precision=75%, F1-score=78.5%). Moreover, after comprehensive exploration, we presented a parsimonious stacking model in which around 5 features selected by the all-relevant FS approach Boruta were used in each iteration of validation; this model showed a maximum precision of 77.4% (balanced accuracy=77.9%). A SHapley Additive exPlanations (SHAP) analysis of our best models presented behavioral markers that were related to depression.
Conclusions: Because our system is fast and minimalistic, it may make a worthwhile contribution to identifying depression in underdeveloped and developing regions. In addition, our detailed discussion of the implications of our findings can facilitate the development of less resource-intensive systems to better understand students who are depressed.
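Illustrative note: the metrics quoted in the Results are related by standard definitions, so the reported figures can be cross-checked. The sketch below is not the authors' code; the function and variable names are hypothetical, and only the precision (75%) and sensitivity (82.4%) values come from the abstract. It recomputes the F1-score as the harmonic mean of precision and sensitivity, and shows the usual definition of balanced accuracy.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (recall is the sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of the true-positive rate and the true-negative rate."""
    return (sensitivity + specificity) / 2


# Values reported in the abstract for the LightGBM model with stable FS.
reported_precision = 0.75
reported_sensitivity = 0.824

f1 = f1_score(reported_precision, reported_sensitivity)
print(f"F1 from reported precision and sensitivity: {f1:.3f}")
```

The recomputed value agrees with the reported F1-score of 78.5% after rounding. The abstract does not report specificity, so `balanced_accuracy` is included only to show the definition behind the 77.9% figure, not to reproduce it.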