TY - JOUR
T1 - Comparative Analysis of Machine Learning Models for Credit Card Fraud Detection Using SMOTE for Class Imbalance
AU - Andrade-Arenas, Laberiano
AU - Yactayo-Arias, Cesar
N1 - Publisher Copyright: © 2025 The authors.
PY - 2025/5
Y1 - 2025/5
N2 - Credit card fraud poses significant financial and security challenges, with negative consequences for consumers and financial institutions. An efficient, accurate detection system is essential. This study aims to determine which machine learning (ML) method performs best for detecting fraudulent credit card transactions by evaluating models such as Naive Bayes, Logistic Regression, k-NN, Decision Trees, as well as Random Forests, XGBoost, and AdaBoost. The models were evaluated using an open-access dataset from Kaggle, which includes actual payment activities conducted with credit cards by European cardholders in 2013. Due to data imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to enhance performance. Results indicate that Random Forest and XGBoost outperformed other models in terms of accuracy, F1 score, and the areas under the ROC (AUC) and precision-recall (AUPRC) curves. Specifically, Random Forest achieved an accuracy of 0.999, F1 score of 0.872, AUC of 0.978, and AUPRC of 0.871, while XGBoost reached an accuracy of 0.999, F1 score of 0.837, AUC of 0.983, and AUPRC of 0.867. In conclusion, Random Forest and XGBoost demonstrated superior performance, offering promising tools for effective credit card fraud detection. However, the use of 2013 data may limit the generalizability of results to more recent fraud patterns.
AB - Credit card fraud poses significant financial and security challenges, with negative consequences for consumers and financial institutions. An efficient, accurate detection system is essential. This study aims to determine which machine learning (ML) method performs best for detecting fraudulent credit card transactions by evaluating models such as Naive Bayes, Logistic Regression, k-NN, Decision Trees, as well as Random Forests, XGBoost, and AdaBoost. The models were evaluated using an open-access dataset from Kaggle, which includes actual payment activities conducted with credit cards by European cardholders in 2013. Due to data imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied to enhance performance. Results indicate that Random Forest and XGBoost outperformed other models in terms of accuracy, F1 score, and the areas under the ROC (AUC) and precision-recall (AUPRC) curves. Specifically, Random Forest achieved an accuracy of 0.999, F1 score of 0.872, AUC of 0.978, and AUPRC of 0.871, while XGBoost reached an accuracy of 0.999, F1 score of 0.837, AUC of 0.983, and AUPRC of 0.867. In conclusion, Random Forest and XGBoost demonstrated superior performance, offering promising tools for effective credit card fraud detection. However, the use of 2013 data may limit the generalizability of results to more recent fraud patterns.
KW - SMOTE
KW - XGBoost
KW - class imbalance
KW - credit card fraud
KW - financial institution
KW - machine learning
KW - random forest
UR - https://www.scopus.com/pages/publications/105010140789
U2 - 10.18280/ijsse.150504
DO - 10.18280/ijsse.150504
M3 - Original Article
AN - SCOPUS:105010140789
SN - 2041-9031
VL - 15
SP - 893
EP - 901
JO - International Journal of Safety and Security Engineering
JF - International Journal of Safety and Security Engineering
IS - 5
ER -