Scopus Indexed Publications

Paper Details


Title
Optimizing coastal groundwater quality predictions: A novel data mining framework with cross-validation, bootstrapping, and entropy analysis
Author
Abu Reza Md Towfiqul Islam,
Email
Abstract

Investigating the potential of novel data mining algorithms (DMAs) for modeling groundwater quality in coastal areas is an important requirement for groundwater resource management, especially in the coastal region of Bangladesh, where groundwater is highly contaminated. In this work, the applicability of DMAs, including Gaussian Process Regression (GPR), Bayesian Ridge Regression (BRR) and Artificial Neural Network (ANN), for predicting groundwater quality in coastal areas was investigated. Optuna-based hyperparameter optimization is proposed to improve the accuracy of the models, with Optuna-GPR and Optuna-BRR as benchmark models. Combined cross-validation (CV) and bootstrapping (B) methods were used to build six predictive models. The entropy-based coastal groundwater quality index (ECWQI) was converted into a normalized index (ECWQIn), which was divided into five classes from very poor to excellent. The self-organizing map (SOM), spatial autocorrelation and a fuzzy logic model were used to identify spatial groundwater quality patterns based on 12 physicochemical variables collected from 67 groundwater wells. The SOM analysis identified four distinct spatial patterns: EC-TDS-Cl, Mg-pH, Ca²⁺-K⁺-NO₃, and HCO₃-SO₄²⁻-Na⁺-F. The results showed that both the ANN (CV) and ANN (B) models performed better than the other Optuna-based models during the test phase (RMSE = 0.041, MAE = 0.026, R² = 0.971, RAE = 0.15 = 21 and CC = 0.986) and (RMSE = 0.041, MAE = 0.025, R² = 0.969, RAE = 0.119 and CC = 0.975), respectively. SO₄²⁻, Cl⁻ and F⁻ played an important role in the prediction accuracy. F⁻ and SO₄²⁻ showed higher spatial autocorrelation, which affected groundwater quality degradation. In addition, the ANN (CV) and ANN (B) models showed a Gaussian distribution of model errors (small standard error, <1 %), indicating the stability of the models.
These results indicate the efficiency of the ANN model in predicting groundwater quality in coastal areas, which would help regional water managers in real-time monitoring and management of sustainable groundwater resources.
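
For readers unfamiliar with the five scores quoted above, the following is a minimal sketch of how RMSE, MAE, R², RAE and CC are commonly computed from paired observed and predicted values. The function name `regression_metrics` and the exact definitions used here (R² as one minus the ratio of residual to total sum of squares, CC as the Pearson correlation coefficient) are assumptions for illustration; the abstract does not state which formulas the study used.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, R2, RAE and CC for paired observed/predicted values.

    Hypothetical helper for illustration; the definitions are the common
    textbook ones and are not confirmed by the abstract.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    errors = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)                      # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)      # total sum of squares
    abs_err = sum(abs(e) for e in errors)                    # total absolute error
    abs_dev = sum(abs(o - mean_obs) for o in observed)       # absolute deviation from mean
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)

    # Note: constant observed or predicted values make some denominators zero.
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": abs_err / n,
        "R2": 1.0 - ss_res / ss_tot,
        "RAE": abs_err / abs_dev,
        "CC": cov / math.sqrt(ss_tot * ss_pred),
    }
```

Lower RMSE, MAE and RAE and higher R² and CC indicate a closer fit, which is the sense in which the ANN models are reported to outperform the others.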

Keywords
Journal or Conference Name
Journal of Contaminant Hydrology
Publication Year
2025
Indexing
Scopus