Master's Thesis, 2022
69 pages, Grade: 3
1 Introduction
2 Objective
3 Literature Review
3.1 Related Work
3.2 Machine learning approaches
3.2.1 Logistic Regression
3.2.2 Decision Tree
3.2.3 Working of Decision Tree
3.2.4 Random Forest
3.2.5 Support Vector Machines (SVM)
3.2.6 K-Nearest Neighbours (KNN)
3.2.7 Gradient Boosted Trees
3.2.8 Research Method Data Challenges
3.3 Recent Fraud Cases
3.4 CRISP-DM Model
3.4.1 Business Understanding
3.4.2 Data Understanding
3.4.3 Data Preparation
3.4.4 Data Modelling
3.4.5 Model evaluation
3.4.6 Model Deployment
4 Methodology and Case Study
4.1 Banking Theory
4.2 Data Description
4.3 Data Preparation
4.3.1 Scaling the data
4.3.2 Missing values handling
4.3.3 Dropping NA
4.3.4 Data encoding
4.4 Data Visualisation
4.4.1 Univariate Analysis
4.4.2 Histograms
4.4.3 Boxplot
4.4.4 Bivariate analysis
4.4.5 Correlation
4.4.6 Summary from EDA
4.5 Feature Selection
4.5.1 ANOVA
4.6 Model Comparison and Results
4.6.1 Logistic Regression
4.6.2 Decision Tree
4.6.3 Random Forest
4.6.4 XGBoost
4.6.5 GradientBoosting
4.6.6 LGBMclassifier
4.7 Classification Evaluation Metrics
4.7.1 Confusion Matrix
4.7.2 Precision
4.7.3 Recall
4.7.4 F1 score
4.7.5 AUC-ROC
4.7.6 Receiver operating characteristic (ROC) Curve
4.7.7 Accuracy
4.7.8 Imbalanced Data
4.8 Possible next steps
5 Summary and Conclusion
This thesis aims to develop a robust machine learning-based system to detect fraudulent credit card transactions by analyzing historical payment data. The study addresses challenges such as dataset imbalance and provides a systematic comparison of various classification models to optimize fraud detection performance.
Benefits of machine learning in fraud detection
Many current analytics technologies and systems rely heavily on human analysts to examine data and identify suspicious transactions and fraud. This reliance makes them prone to problems such as slow turnaround and human error. Machine learning can solve some of these problems. Its main advantages for banks seeking to avoid losses by detecting fraud are the following:
Speed: As the speed and volume of e-commerce grow, fast detection becomes increasingly critical. Machine learning algorithms can analyse large amounts of data in a short time, and once deployed, a model can collect and analyse new data in real time.
Efficiency: Algorithms can evaluate a huge number of payments each second, far more than a team of human analysts could review in the same time. This reduces both the cost and the time required to evaluate transactions, making the process more efficient. Machine learning algorithms can also automate repetitive operations and detect small changes in patterns across massive volumes of data, which is crucial for detecting fraud faster than humans can.
1 Introduction: Provides an overview of the rising problem of fraud in the banking sector and highlights the necessity for advanced predictive models.
2 Objective: Outlines the research goals, including the identification of the best-performing fraud detection models and the handling of imbalanced data.
3 Literature Review: Discusses existing machine learning approaches and the CRISP-DM methodology used for fraud detection.
4 Methodology and Case Study: Describes the dataset, data preparation, feature selection, and the comparative results of classification algorithms.
5 Summary and Conclusion: Summarizes the key findings and provides recommendations for integrating these models into bank risk management systems.
Machine Learning, Fraud Detection, Credit Card Default, CRISP-DM, Logistic Regression, Decision Tree, Random Forest, XGBoost, LGBMclassifier, Data Mining, Fraud Prevention, Risk Management, Classification, AUC-ROC, Imbalanced Data
The research focuses on utilizing machine learning algorithms to detect fraudulent banking and credit card transactions to minimize financial losses.
The core areas include fraud detection theory, the application of various supervised machine learning models, exploratory data analysis of payment behavior, and strategies for managing imbalanced datasets.
The objective is to find a high-performing classification model that efficiently identifies potential fraudulent transactions and default payments in complex, real-world datasets.
The study employs the CRISP-DM (Cross-Industry Standard Process for Data Mining) model to structure the research, alongside algorithms like Logistic Regression, Random Forest, SVM, XGBoost, and LGBMclassifier.
The main body details the literature review, data description, data cleaning and encoding, exploratory data analysis (EDA), feature selection techniques such as ANOVA, and a comparative performance analysis of various ML models.
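The summary names ANOVA as a feature selection technique without spelling out the procedure. A common way to apply it, and the statistic that scikit-learn's `f_classif` computes, is a one-way ANOVA F-test of each numeric feature against the class label: features whose class means differ strongly relative to their within-class spread receive a high F-score. The pure-Python sketch below assumes that reading; the feature names `pay_delay` and `bill_amt` and the toy values are hypothetical and are not taken from the thesis dataset.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single numeric feature across class groups."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k = len(groups)  # number of classes
    n = len(values)  # total number of samples
    grand_mean = mean(values)
    # Between-group sum of squares: how far each class mean lies from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: spread of samples around their own class mean
    # (assumed non-zero here; a constant feature would cause a division by zero)
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(columns, labels, top_k):
    """Return the names of the top_k features with the largest F-statistic."""
    scores = {name: anova_f(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: two features and a binary label (1 = default/fraud)
labels = [0, 0, 0, 1, 1, 1]
features = {
    "pay_delay": [0, 1, 0, 2, 3, 2],
    "bill_amt": [100, 200, 150, 120, 180, 160],
}
print(rank_features(features, labels, top_k=1))  # → ['pay_delay']
```

Because the F-test compares only class means, it can overlook features whose relationship with the label is non-linear. It works as a fast filter before modelling, not as a replacement for model-based evaluation.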
The work is characterized by terms such as Machine Learning, Fraud Detection, Credit Card Default, Data Mining, Classification, and Risk Management.
The research addresses imbalanced data using techniques like SMOTE (Synthetic Minority Oversampling Technique), undersampling, and oversampling to enhance the detection of the minority fraudulent class.
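The resampling techniques named above can be illustrated with a minimal pure-Python sketch. Random undersampling discards majority-class rows, random oversampling duplicates minority rows, and SMOTE creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. The `smote_like` function is a simplified version written for illustration, not the thesis's code and not the `imbalanced-learn` implementation; all function names and the toy data are hypothetical.

```python
import random


def _split(X, y, minority):
    """Separate (row, label) pairs into minority and majority lists."""
    minority_rows = [(x, t) for x, t in zip(X, y) if t == minority]
    majority_rows = [(x, t) for x, t in zip(X, y) if t != minority]
    return minority_rows, majority_rows


def random_undersample(X, y, minority=1, seed=0):
    """Keep a random subset of majority rows equal in size to the minority class."""
    rng = random.Random(seed)
    minority_rows, majority_rows = _split(X, y, minority)
    rows = minority_rows + rng.sample(majority_rows, len(minority_rows))
    return [x for x, _ in rows], [t for _, t in rows]


def random_oversample(X, y, minority=1, seed=0):
    """Duplicate minority rows (sampling with replacement) until classes are balanced."""
    rng = random.Random(seed)
    minority_rows, majority_rows = _split(X, y, minority)
    n_extra = len(majority_rows) - len(minority_rows)
    extra = [rng.choice(minority_rows) for _ in range(n_extra)]
    rows = majority_rows + minority_rows + extra
    return [x for x, _ in rows], [t for _, t in rows]


def smote_like(X, y, minority=1, k=2, seed=0):
    """Simplified SMOTE: place synthetic minority points on the segment between a
    random minority sample and one of its k nearest minority neighbours.
    Assumes at least two minority samples."""
    rng = random.Random(seed)
    minority_X = [x for x, t in zip(X, y) if t == minority]
    n_needed = sum(1 for t in y if t != minority) - len(minority_X)
    synthetic = []
    for _ in range(n_needed):
        base = rng.choice(minority_X)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (x for x in minority_X if x is not base),
            key=lambda x: sum((a - b) ** 2 for a, b in zip(x, base)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nb)])
    return X + synthetic, y + [minority] * len(synthetic)


# Hypothetical toy data: four legitimate rows (0) and two fraudulent rows (1)
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.9, 1.0], [1.0, 0.9]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = smote_like(X, y)
print(y_bal.count(0), y_bal.count(1))  # → 4 4
```

Whichever technique is chosen, resampling should be applied to the training split only, so that the test set still reflects the real class distribution the deployed model will face.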
The LGBMclassifier achieved the best results in terms of both accuracy and AUC scores when compared to other supervised learning models like Decision Trees and Random Forests.
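On imbalanced fraud data, accuracy alone can be misleading, which is why AUC is reported alongside it. The pure-Python sketch below computes the metrics listed in Section 4.7 (confusion matrix counts, precision, recall, F1, accuracy, and AUC-ROC). AUC uses the rank identity: it equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The function names, labels, and scores are hypothetical and are not results from the thesis.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_report(y_true, y_pred):
    """Precision, recall, F1 and accuracy (0.0 where a ratio is undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: 9 legitimate transactions and 1 fraudulent one
y_true = [0] * 9 + [1]
# A trivial model that never flags fraud still reaches 90% accuracy...
print(classification_report(y_true, [0] * 10)["accuracy"])  # → 0.9
# ...but its recall on the fraud class is zero.
print(classification_report(y_true, [0] * 10)["recall"])  # → 0.0

# Hypothetical model scores: the fraud case outranks 8 of the 9 legitimate ones
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.1, 0.35]
print(round(roc_auc(y_true, scores), 3))  # → 0.889
```

The pairwise comparison is quadratic in the number of samples, so it only suits small illustrations; for real datasets a sort-based computation or a library routine is the practical choice.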

