Research Article

WATER QUALITY AND POTABILITY PREDICTION WITH MACHINE LEARNING ALGORITHMS

Year 2023, Volume: 6 Issue: 2, 65 - 80, 30.12.2023

Abstract

Drinking water is one of the most basic human needs and is vital for survival. Because it directly affects human health, understanding its quality and potability is important. Water quality can be estimated through conventional laboratory and statistical analyses, but this approach is generally expensive and time-consuming. Machine learning methods, which have developed rapidly in recent years and benefit many areas of life, allow the usability of water to be analyzed quickly and efficiently. In this study, models were developed with 15 different machine learning algorithms to predict water quality and potability, and their results were compared. In the model evaluations, the LGBMClassifier and SVC algorithms provided the best prediction performance. Hyperparameter optimization was then performed for these two models using the GridSearchCV object. After optimization, the LGBMClassifier model achieved the most successful prediction result, with an accuracy of 0.92 (92%). With its analysis and visualization of the factors affecting water quality and potability and its high prediction performance, the study is expected to guide future work.

Details

Primary Language: Turkish
Subjects: Information Modelling, Management and Ontologies
Section: Research Article
Authors

Tülay Turan 0000-0002-0888-0343

Early View Date: December 29, 2023
Publication Date: December 30, 2023
Submission Date: November 2, 2023
Acceptance Date: November 14, 2023
Published in Issue: Year 2023, Volume: 6 Issue: 2

Cite

APA Turan, T. (2023). MAKİNE ÖĞRENMESİ ALGORİTMALARI İLE SU KALİTESİ VE İÇİLEBİLİRLİK TAHMİNİ. Uluborlu Mesleki Bilimler Dergisi, 6(2), 65-80.
Creative Commons License
Isparta University of Applied Sciences Uluborlu Mesleki Bilimler Dergisi is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.