Prediction of academic performance applying NNs: A focus on statistical feature-shedding and lifestyle
Abstract: Automation has made it possible to collect and store students’ data, and modern data science actively mines these data to predict performance, to the benefit of both tutors and tutees. Academic excellence results from a complex set of factors originating in psychology, habits and, according to this study, lifestyle and preferences, which makes machine learning well suited to classifying academic performance. In this paper, data on computer science majors were collected with consent through a survey at Ahsanullah University, situated in Bangladesh. Visually aided exploratory analysis revealed interesting tendencies that were taken as features, and their significance was further substantiated by inferential Chi-squared (χ²) independence tests for categorical variables and independent-samples t-tests for continuous variables, on median/mode-imputed data. An initially relaxed p-value threshold retained all features from the exploratory analysis, and progressively stricter thresholds exposed the most powerful features, which were fitted with neural networks of decreasing complexity, i.e., having 24, 20 and finally 12 hidden neurons. Statistical inference helped shed weak features prior to training, saving the time and the typically large computational power needed to train expensive predictive models. The k-fold cross-validated, hyperparameter-tuned, robust models achieved average accuracies ranging from 90% to 96%, with an average F1-score of 89.21% on the optimal model, and the incremental improvement across models was confirmed by ANOVA.
Keywords: Educational Data Mining (EDM); Exploratory Data Analysis (EDA); median and mode imputation; inferential statistics; t-test; Chi-squared independence test; ANOVA test
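The statistical feature-shedding step summarized in the abstract can be illustrated with a minimal sketch. It is not the study's actual pipeline: the data are synthetic, the feature names (`cat_informative`, `cat_noise`, `num_informative`, `num_noise`) and the helper functions are hypothetical, and the three significance thresholds are assumed values chosen only to show a relaxed-to-strict progression. It relies on SciPy's `chi2_contingency` and `ttest_ind`, and assumes a binary performance label.

```python
import numpy as np
from scipy.stats import chi2_contingency, ttest_ind


def contingency_table(feature, label):
    """Cross-tabulate two categorical arrays into a matrix of counts."""
    _, f_idx = np.unique(np.asarray(feature), return_inverse=True)
    _, l_idx = np.unique(np.asarray(label), return_inverse=True)
    table = np.zeros((f_idx.max() + 1, l_idx.max() + 1), dtype=int)
    np.add.at(table, (f_idx, l_idx), 1)  # count each (feature, label) pair
    return table


def feature_p_values(categorical, continuous, label):
    """Return {feature name: p-value} against a binary label.

    Categorical features use the Chi-squared independence test;
    continuous features use the independent-samples t-test.
    """
    label = np.asarray(label)
    classes = np.unique(label)
    p_values = {}
    for name, values in categorical.items():
        # Index 1 of the result is the p-value.
        p_values[name] = chi2_contingency(contingency_table(values, label))[1]
    for name, values in continuous.items():
        values = np.asarray(values)
        group_a = values[label == classes[0]]
        group_b = values[label == classes[1]]
        p_values[name] = ttest_ind(group_a, group_b).pvalue
    return p_values


def shed_features(p_values, alpha):
    """Keep only the features whose p-value is below the threshold alpha."""
    return sorted(name for name, p in p_values.items() if p < alpha)


# Synthetic stand-in for the survey data (hypothetical, not the real dataset).
rng = np.random.default_rng(0)
n = 400
label = rng.integers(0, 2, n)
categorical = {
    # Agrees with the label about 80% of the time.
    "cat_informative": np.where(rng.random(n) < 0.8, label, 1 - label),
    "cat_noise": rng.integers(0, 3, n),
}
continuous = {
    # Class means differ by one standard deviation.
    "num_informative": rng.normal(loc=label.astype(float), scale=1.0),
    "num_noise": rng.normal(size=n),
}

p_values = feature_p_values(categorical, continuous, label)

# Assumed thresholds, from relaxed to strict; the paper's values may differ.
for alpha in (0.10, 0.05, 0.01):
    print(alpha, shed_features(p_values, alpha))
```

Each stricter threshold can only keep the same features or fewer, so the retained set shrinks monotonically. This is what would allow progressively smaller networks to be fitted on the surviving features.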