Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets

Vahid Sinap

doi:10.31127/tuje.1386127

Research Article

Year 2024, Volume: 8 Issue: 2, 196 - 208, 30.04.2024

Abstract

This study evaluates machine learning algorithms for credit card fraud detection and compares them across several performance metrics. Seven supervised classification algorithms were used: Logistic Regression, Decision Trees, Random Forest, XGBoost, Naive Bayes, K-Nearest Neighbors, and Support Vector Machine. Their performance was measured with Accuracy, Precision, Recall, F-Score, AUC, and AUPRC, and further examined with ROC curves and confusion matrices. Data preparation is critical in this study because fraudulent and non-fraudulent transactions are unequally distributed, and this class imbalance must be addressed for model training to succeed and for the results to be reliable. Several techniques, namely Scaling and Distribution, Random Under-Sampling, Dimensionality Reduction, and Clustering, are employed to ensure an accurate evaluation of model performance and of its ability to generalize. The Random Forest and K-Nearest Neighbors algorithms achieve the highest performance in this research, each with 97% accuracy. The study contributes to the ongoing fight against financial fraud and provides guidance for future research.
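As a rough illustration of why the abstract treats class imbalance and metrics beyond accuracy as central, the sketch below applies plain random under-sampling to toy data and shows how a trivial classifier can score high accuracy while detecting no fraud. It is not the author's code: the toy data, the function names `random_under_sample` and `precision_recall_f1`, and the 990-to-10 class ratio are all assumptions made for the example.

```python
import random
from collections import Counter


def random_under_sample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class has as
    many rows as the rarest class (plain random under-sampling)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(group) for group in by_class.values())
    sampled = []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            sampled.append((row, label))
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [y for _, y in sampled]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 990 legitimate (0) and 10 fraudulent (1) rows.
labels = [0] * 990 + [1] * 10
rows = [[float(i)] for i in range(len(labels))]

# A "classifier" that always predicts legitimate looks accurate
# on imbalanced data but never finds a fraudulent transaction.
always_legit = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_legit)) / len(labels)
precision, recall, f1 = precision_recall_f1(labels, always_legit)

# Under-sampling keeps all fraud rows and an equal number of legitimate rows.
balanced_rows, balanced_labels = random_under_sample(rows, labels, seed=42)
class_counts = Counter(balanced_labels)
```

Under-sampling discards most majority-class rows, so in practice it trades training data for balance; the study's actual preprocessing and model settings are described in the full text, not here.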

Keywords

Credit card fraud, Fraud detection, Data mining, Machine learning, Imbalanced datasets


There are 41 references in total.

Details

Primary Language	English
Subjects	Communications Engineering (Other)
Section	Articles
Authors	Vahid Sinap 0000-0002-8734-9509
Early Pub Date	April 7, 2024
Publication Date	April 30, 2024
Submission Date	November 4, 2023
Acceptance Date	December 3, 2023
Published in Issue	Year 2024 Volume: 8 Issue: 2

Cite

APA	Sinap, V. (2024). Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets. Turkish Journal of Engineering, 8(2), 196-208. https://doi.org/10.31127/tuje.1386127
AMA	Sinap V. Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets. TUJE. April 2024;8(2):196-208. doi:10.31127/tuje.1386127
Chicago	Sinap, Vahid. “Comparative Analysis of Machine Learning Techniques for Credit Card Fraud Detection: Dealing With Imbalanced Datasets”. Turkish Journal of Engineering 8, no. 2 (April 2024): 196-208. https://doi.org/10.31127/tuje.1386127.
EndNote	Sinap V (01 April 2024) Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets. Turkish Journal of Engineering 8 2 196–208.
IEEE	V. Sinap, “Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets”, TUJE, vol. 8, no. 2, pp. 196–208, 2024, doi: 10.31127/tuje.1386127.
ISNAD	Sinap, Vahid. “Comparative Analysis of Machine Learning Techniques for Credit Card Fraud Detection: Dealing With Imbalanced Datasets”. Turkish Journal of Engineering 8/2 (April 2024), 196-208. https://doi.org/10.31127/tuje.1386127.
JAMA	Sinap V. Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets. TUJE. 2024;8:196–208.
MLA	Sinap, Vahid. “Comparative Analysis of Machine Learning Techniques for Credit Card Fraud Detection: Dealing With Imbalanced Datasets”. Turkish Journal of Engineering, vol. 8, no. 2, 2024, pp. 196-208, doi:10.31127/tuje.1386127.
Vancouver	Sinap V. Comparative analysis of machine learning techniques for credit card fraud detection: Dealing with imbalanced datasets. TUJE. 2024;8(2):196-208.
