Bachelor's thesis, 2017
93 pages, grade: 1.3
1 Introduction
1.1 Motivation and problem statement
1.2 Research Methodology
1.3 Goal and structure of the thesis
2 White-Collar Crime
2.1 Fraud Management
2.1.1 Fraud Prevention
2.1.2 Fraud Detection
2.1.3 Fraud Investigation
2.2 Fraud Triangle
2.2.1 Opportunity
2.2.2 Incentive/Pressure
2.2.3 Rationalization/Attitude
3 Types of White-Collar-Crimes
3.1 Fraud
3.2 Credit Card Fraud
3.3 Healthcare Fraud
3.4 Embezzlement
3.5 Criminal Insolvency Offences
3.6 Corruption
4 Data Mining, Text Mining and Big Data
4.1 Introduction into Big Data
4.1.1 The 3 V’s of Big Data
4.1.2 Data Forms
4.2 Data Mining
4.2.1 Types of Machine Learning
4.2.2 Classification of Data Mining Applications
4.3 Text Mining
4.3.1 Practise areas of Text Mining
4.3.2 Example of feature extraction from unstructured data
4.4 Context of Data Mining and Text Mining in White-Collar Crime
5 A case study on Credit Card Fraud Detection
5.1 Overview
5.2 Data Exploration
5.3 Confusion Matrix Terminology
5.4 Algorithms and Techniques
5.4.1 Literature review on Data Mining Techniques
5.4.2 Selection of Data Mining Techniques
5.5 Sampling techniques
5.6 Train and Test Set
5.7 Imbalanced Data
5.7.1 Results on imbalanced Data
5.8 Undersampled Data
5.8.1 Results on undersampled Data
5.9 Oversampled Data
5.9.1 Results on oversampled Data
5.10 Oversampled Data with SMOTE
5.10.1 Results on Oversampled Data with SMOTE
5.11 Undersampled Data with Hyperparameters Optimization
5.11.1 Model Parameter and Hyperparameter
5.11.2 Hyperparameter optimization algorithms
5.11.3 Explanation of selected Hyperparameters
5.11.4 Cross-Validation
5.11.5 Selection of Hyperparameter Optimization Algorithm and k-fold CV
5.11.6 Results on Undersampled Data with Hyperparameter Optimization
5.12 Review of the case study: Credit Card Fraud Detection
6 Conclusion
The primary research objective of this thesis is to determine which data mining techniques are effective for detecting fraudulent activities in both structured and unstructured datasets, specifically within the context of white-collar crime. The study bridges the gap between theoretical crime analysis and practical technical implementation.
4.3.1 Practise areas of Text Mining
Information Retrieval (IR)
The main task of IR is not to analyse the data, but to index, search and retrieve documents from large text databases with keyword queries (Miner et al., 2012: 36). Today, IR systems are built into many everyday applications. Internet search engines such as Google rely on this technology, and other applications, such as e-mail clients and text editors, also use IR to let users find content through keyword queries. In summary, the goal of IR “…is to connect the right information with the right users at the right time…” (Aggarwal and Zhai, 2012: 2).
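The indexing and keyword search described above can be illustrated with a minimal inverted index. This is only a sketch under simple assumptions: the toy corpus, the function names and the AND semantics of the query are hypothetical and are not taken from the thesis.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query keyword."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical toy corpus, for illustration only.
documents = {
    1: "Invoice approved without supporting receipt",
    2: "Quarterly report on credit card transactions",
    3: "Suspicious credit card invoice flagged by auditor",
}
index = build_index(documents)
print(sorted(search(index, "credit card")))   # [2, 3]
print(sorted(search(index, "invoice card")))  # [3]
```

The index is built once and each query is then answered by intersecting small sets, which is why keyword lookup stays fast even when the document collection grows.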
Information Extraction (IE)
IE is one of the more mature fields in text mining and aims to construct structured data from unstructured text (Miner et al., 2012: 37). With this technique, meaningful information can be extracted from large amounts of text (Talib et al., 2016: 415). However, this takes considerable effort: extracting data from large amounts of text is not easy and requires special algorithms and software (Miner et al., 2012: 37). “IE systems are used to extract specific attributes and entities from the document and establish their relationship. The extracted corpus is stored in the database for further processing.” (Talib et al., 2016: 415).
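As a rough illustration of turning unstructured text into a structured record, the following sketch extracts a few attributes with regular expressions. The patterns, field names and sample message are assumptions made for this example; production IE systems typically rely on much richer models, and the step of establishing relationships between entities is not covered here.

```python
import re

# Hypothetical patterns for three attribute types.
AMOUNT = re.compile(r"(?:EUR|USD)\s?\d+(?:,\d{3})*(?:\.\d{2})?")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def extract_record(text):
    """Return a structured record with the attributes found in the text."""
    return {
        "amounts": AMOUNT.findall(text),
        "dates": DATE.findall(text),
        "emails": EMAIL.findall(text),
    }


# Hypothetical sample message.
message = (
    "Please transfer EUR 12,500.00 to the supplier before 2017-03-15 "
    "and confirm to j.doe@example.com."
)
record = extract_record(message)
print(record["amounts"])  # ['EUR 12,500.00']
print(record["dates"])    # ['2017-03-15']
print(record["emails"])   # ['j.doe@example.com']
```

A record of this shape could then be stored in a database table, which corresponds to the further processing step mentioned in the quotation.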
Document Clustering
According to Miner et al. (2012: 959), clustering, or cluster analysis, is the oldest text mining technology and was used by the military in document recovery systems during World War II. Today, document clustering applies data mining (DM) algorithms to group similar documents into clusters (ibid.: 36). The goal is to sort text documents into groups by applying different clustering algorithms (Talib et al., 2016: 416). Clustering is a method of unsupervised learning: unlike supervised learning, it requires no training on labelled examples. Unsupervised learning is not as powerful as supervised learning, but it is more versatile.
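To make the idea of grouping similar documents without labels concrete, here is a minimal sketch of a single-pass, threshold-based grouping that uses Jaccard similarity on word sets. This simple procedure is chosen for brevity and is not the algorithm used in the thesis; the threshold value and the sample documents are assumptions.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Share of tokens that two token sets have in common."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(documents, threshold=0.3):
    """Assign each document to the first cluster whose first member is
    similar enough; otherwise open a new cluster. No labels are needed."""
    clusters = []  # each entry: {"leader": token set, "members": [indices]}
    for i, text in enumerate(documents):
        doc_tokens = tokens(text)
        for c in clusters:
            if jaccard(doc_tokens, c["leader"]) >= threshold:
                c["members"].append(i)
                break
        else:
            clusters.append({"leader": doc_tokens, "members": [i]})
    return [c["members"] for c in clusters]


# Hypothetical sample documents.
docs = [
    "credit card fraud detected",
    "bribery of public official",
    "credit card fraud reported",
    "public official accepted bribery",
]
print(cluster(docs))  # [[0, 2], [1, 3]]
```

The result depends on the order of the documents and on the threshold, which is one reason why more robust algorithms such as k-means or hierarchical clustering are usually preferred in practice.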
1 Introduction: Discusses the growing societal problem of white-collar crime and establishes the research question regarding appropriate data mining techniques for fraud detection.
2 White-Collar Crime: Provides an overview of white-collar crime definitions, management strategies, and introduces the Fraud Triangle by Cressey.
3 Types of White-Collar-Crimes: Examines specific categories such as credit card fraud, embezzlement, insolvency offences, and corruption to illustrate the financial impact of such crimes.
4 Data Mining, Text Mining and Big Data: Explains technical foundations, data formats, and how text mining transforms unstructured data into forms suitable for predictive machine learning models.
5 A case study on Credit Card Fraud Detection: The practical core of the thesis, detailing the application of various data mining algorithms to a real-world credit card dataset using the CRISP-DM model.
6 Conclusion: Summarizes key findings, answers the research question based on the empirical results, and provides recommendations for future research.
White-collar crime, Fraud detection, Data mining, Text mining, Big data, Machine learning, CRISP-DM, Credit card fraud, Classification, Unstructured data, Supervised learning, Hyperparameter optimization, Logistic regression, Support vector machine, Neural networks.
The thesis focuses on using intelligent IT approaches, specifically Data Mining and Text Mining, to identify and mitigate white-collar crimes within large datasets.
The study examines financial fraud, credit card fraud, healthcare fraud, embezzlement, criminal insolvency, and corruption.
The main question asks which data mining techniques are most appropriate for detecting white-collar crimes in structured and unstructured data.
The thesis utilizes a literature analysis based on Webster & Watson for the theory, and the CRISP-DM (Cross-Industry Standard Process for Data Mining) reference model for the empirical case study.
The main body covers the theoretical background of fraud, an introduction to big data and text mining techniques, and a detailed case study on credit card fraud detection.
Key concepts include supervised and unsupervised learning, sampling techniques for imbalanced datasets (e.g., SMOTE), and hyperparameter optimization to enhance predictive model accuracy.
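One of the sampling ideas named above, random undersampling of the majority class, can be sketched in a few lines. The function name, the toy data and the fixed seed are assumptions for illustration; SMOTE, which creates synthetic minority samples instead of discarding majority samples, is not shown.

```python
import random


def random_undersample(rows, labels, minority_label=1, seed=0):
    """Keep all minority rows plus an equally sized random sample of the rest."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    kept_majority = rng.sample(majority, min(len(minority), len(majority)))
    keep = sorted(minority + kept_majority)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: 2 fraudulent (1) and 8 legitimate (0) transactions.
rows = [[float(i)] for i in range(10)]
labels = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
X_bal, y_bal = random_undersample(rows, labels)
print(sum(y_bal), len(y_bal) - sum(y_bal))  # 2 2
```

The balanced set is much smaller than the original one, so information from the discarded legitimate transactions is lost; oversampling approaches trade this loss for a risk of overfitting to repeated or synthetic minority samples.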
The CRISP-DM model provides a structured, iterative framework (Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment) for conducting the credit card fraud detection project.
Text mining is essential for converting unstructured data, such as emails or documents, into a structured format (e.g., using the Vector Space Model or TF-IDF) so that predictive machine learning algorithms can process it.
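The conversion of text into numeric vectors can be sketched with one common TF-IDF variant, where the term frequency is the share of a term within a document and the inverse document frequency is log(N / df). The exact weighting used in the thesis is not stated here, so the formula, the function names and the sample texts are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Return the sorted vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in documents]
    vocabulary = sorted({t for doc in tokenized for t in doc})
    n_docs = len(documents)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([
            (counts[t] / len(doc)) * math.log(n_docs / doc_freq[t]) if doc else 0.0
            for t in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical sample texts.
texts = ["payment approved", "payment delayed", "urgent payment request"]
vocabulary, vectors = tfidf(texts)
print(vocabulary)  # ['approved', 'delayed', 'payment', 'request', 'urgent']
```

In this variant a term that occurs in every document, such as "payment" in the sample, receives a weight of zero, while terms that are specific to one document receive the highest weights. Each row of the result is a fixed-length vector that a classifier can take as input.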

