Abstract:
Using the European Union Statistics on Income and Living Conditions (EU-SILC) microdata and applying machine learning (ML) algorithms, the following questions are explored: (i) How accurately can the deprivation status of unseen individuals be classified from their observable personal, household, and country-specific factors? (ii) How well do subsets of features, such as sociodemographic, socioeconomic, health, and location factors, perform in identifying the deprived? (iii) What are the key predictors, and what are their partial effects? The empirical analysis shows that sophisticated tree-based ML algorithms achieve a positive and significant relative gain in accuracy over the standard generalized linear model (7.3% with XGBoost and 5.9% with random forest). Socioeconomic factors alone yield a classification accuracy close to that obtained with the full set of features. Feature importance and partial effect analyses based on Shapley values reveal insightful relationships consistent with theoretical and empirical evidence.
Keywords: Material and social deprivation, Machine learning classification, EU-SILC microdata, Socioeconomic determinants, Predictive modeling, Shapley value (SHAP) analysis