Fraud Detection in White-Collar Crime

Bachelor Thesis, 2017
93 pages, Grade: 1.3

Computer Science - Business Informatics

Reading Sample

1 Introduction

1.1 Motivation and problem statement

1.2 Research Methodology

1.3 Goal and structure of the thesis

2 White-Collar Crime

2.1 Fraud Management

2.1.1 Fraud Prevention

2.1.2 Fraud Detection

2.1.3 Fraud Investigation

2.2 Fraud Triangle

2.2.1 Opportunity

2.2.2 Incentive/Pressure

2.2.3 Rationalization/Attitude

3 Types of White-Collar Crimes

3.1 Fraud

3.2 Credit Card Fraud

3.3 Healthcare Fraud

3.4 Embezzlement

3.5 Criminal Insolvency Offences

3.6 Corruption

4 Data Mining, Text Mining and Big Data

4.1 Introduction to Big Data

4.1.1 The 3 V’s of Big Data

4.1.2 Data Forms

4.2 Data Mining

4.2.1 Types of Machine Learning

4.2.2 Classification of Data Mining Applications

4.3 Text Mining

4.3.1 Practice areas of Text Mining

4.3.2 Example of feature extraction from unstructured data

4.4 Context of Data Mining and Text Mining in White-Collar Crime

5 A case study on Credit Card Fraud Detection

5.1 Overview

5.2 Data Exploration

5.3 Confusion Matrix Terminology

5.4 Algorithms and Techniques

5.4.1 Literature review on Data Mining Techniques

5.4.2 Selection of Data Mining Techniques

5.5 Sampling techniques

5.6 Train and Test Set

5.7 Imbalanced Data

5.7.1 Results on imbalanced Data

5.8 Undersampled Data

5.8.1 Results on undersampled Data

5.9 Oversampled Data

5.9.1 Results on oversampled Data

5.10 Oversampled Data with SMOTE

5.10.1 Results on Oversampled Data with SMOTE

5.11 Undersampled Data with Hyperparameter Optimization

5.11.1 Model Parameter and Hyperparameter

5.11.2 Hyperparameter optimization algorithms

5.11.3 Explanation of selected Hyperparameters

5.11.4 Cross-Validation

5.11.5 Selection of Hyperparameter Optimization Algorithm and k-fold CV

5.11.6 Results on Undersampled Data with Hyperparameter Optimization

5.12 Review of the case study: Credit Card Fraud Detection

6 Conclusion

Research Goal and Focus Areas

The primary research objective of this thesis is to determine which data mining techniques are effective for detecting fraudulent activities in both structured and unstructured datasets, specifically within the context of white-collar crime. The study bridges the gap between theoretical crime analysis and practical technical implementation.

White-collar crime dynamics and the fraud triangle theory.
Data mining and text mining applications for fraud detection.
Big data management and unstructured data transformation.
Evaluation of machine learning algorithms through a credit card fraud case study.

Excerpt from the Book

4.3.1 Practice areas of Text Mining

Information Retrieval (IR)

The main task of Information Retrieval (IR) is not to analyse the data but to index, search and retrieve documents from large text databases using keyword queries (Miner et al., 2012: 36). Today, IR systems are found in almost every application that offers a search function. For example, the Internet search engine Google relies on this technology, but other applications, e.g. e-mail clients and text editors, also use IR by letting users retrieve results through keyword queries. In summary, the goal of IR “…is to connect the right information with the right users at the right time…” (Aggarwal and Zhai, 2012: 2).
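To illustrate the index, search and retrieve cycle described above, the following is a minimal sketch of an inverted index with Boolean AND keyword queries in plain Python. The function names `tokenize`, `build_index` and `search` and the sample documents are hypothetical; real search engines add ranking, stemming and index compression on top of this basic structure.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each term maps to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents that contain every keyword of the query (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from a copy so that the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical example documents.
docs = {
    "d1": "Invoice approved by CFO",
    "d2": "Wire transfer to offshore account",
    "d3": "Offshore invoice flagged for review",
}
index = build_index(docs)
print(search(index, "offshore invoice"))  # {'d3'}
```

The index is built once and then answers each query by intersecting small sets of document IDs, which is why lookups stay fast even when the document collection grows.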

Information Extraction

Information Extraction (IE) is one of the more mature fields of text mining; its aim is to construct structured data from unstructured text (Miner et al., 2012: 37). With this technique, meaningful information can be extracted from large amounts of text (Talib et al., 2016: 415). However, this requires considerable effort: extracting data from large amounts of text is not easy and requires special algorithms and software (Miner et al., 2012: 37). “IE systems are used to extract specific attributes and entities from the document and establish their relationship. The extracted corpus is stored in the database for further processing.” (Talib et al., 2016: 415).
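As a toy illustration of turning unstructured text into structured records, the sketch below uses hand-written regular expressions to pull three attributes out of a sentence and store them together as one record, which links the extracted entities to each other. The patterns, the field names (`amount`, `date`, `approved_by`) and the example e-mails are assumptions made for illustration; the IE systems cited above rely on far more robust methods, such as trained entity recognizers.

```python
import re

# Hypothetical extraction patterns for three attributes of a payment record.
AMOUNT = re.compile(r"(?:EUR|USD)\s?(\d+(?:,\d{3})*(?:\.\d{2})?)")
DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
APPROVER = re.compile(r"\bapproved by ([A-Z][a-z]+ [A-Z][a-z]+)")


def extract_record(text: str) -> dict:
    """Extract amount, date and approver from free text into one structured record."""
    amount = AMOUNT.search(text)
    date = DATE.search(text)
    approver = APPROVER.search(text)
    return {
        "amount": float(amount.group(1).replace(",", "")) if amount else None,
        "date": date.group(1) if date else None,
        "approved_by": approver.group(1) if approver else None,
    }


emails = [
    "Payment of EUR 12,500.00 approved by John Smith on 2016-03-14.",
    "Please resend the quarterly report.",
]
# The extracted records form a small table that could be stored in a database.
records = [extract_record(e) for e in emails]
print(records[0])  # {'amount': 12500.0, 'date': '2016-03-14', 'approved_by': 'John Smith'}
```

Missing attributes are recorded as `None` rather than dropped, so every record has the same columns and can be loaded directly into a table.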

Document Clustering

According to Miner et al. (2012: 959), clustering, or cluster analysis, is the oldest text mining technology and was used by the military for document recovery systems during World War II. Today, document clustering refers to data mining (DM) algorithms that group similar documents into clusters (ibid.: 36). The goal of clustering is to divide text documents into groups by applying different clustering algorithms (Talib et al., 2016: 416). Clustering is a method of unsupervised learning: unlike supervised learning, it requires no labelled training data. Unsupervised learning is generally less powerful than supervised learning, but more versatile.
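The excerpt does not name a specific clustering algorithm, so the sketch below uses k-means, one widely used choice, on length-normalized term-frequency vectors. It is a minimal pure-Python version under simplifying assumptions (the first k documents serve as initial centroids, and a fixed number of iterations is run); the helper names `vectorize`, `squared_distance` and `kmeans` and the sample documents are hypothetical. Note that no labels are supplied anywhere, which is what makes the method unsupervised.

```python
import math
import re
from collections import Counter


def vectorize(docs: list[str]) -> list[list[float]]:
    """Represent each document as an L2-normalized term-frequency vector over a shared vocabulary."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        v = [float(counts[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Plain k-means; the first k vectors are used as initial centroids (a simplification)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each document to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c])) for v in vectors]
        # Update step: move each centroid to the mean of its assigned documents.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "invoice payment invoice vendor",
    "email meeting schedule email",
    "vendor invoice payment overdue",
    "meeting email agenda schedule",
]
print(kmeans(vectorize(docs), k=2))  # [0, 1, 0, 1]
```

The payment-related documents end up in one cluster and the meeting-related documents in the other, even though the algorithm was never told what either group means.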

Summary of Chapters

1 Introduction: Discusses the growing societal problem of white-collar crime and establishes the research question regarding appropriate data mining techniques for fraud detection.

2 White-Collar Crime: Provides an overview of white-collar crime definitions, management strategies, and introduces the Fraud Triangle by Cressey.

3 Types of White-Collar Crimes: Examines specific categories such as credit card fraud, embezzlement, insolvency offences, and corruption to illustrate the financial impact of such crimes.

4 Data Mining, Text Mining and Big Data: Explains technical foundations, data formats, and how text mining transforms unstructured data into forms suitable for predictive machine learning models.

5 A case study on Credit Card Fraud Detection: The practical core of the thesis, detailing the application of various data mining algorithms to a real-world credit card dataset using the CRISP-DM model.

6 Conclusion: Summarizes key findings, answers the research question based on the empirical results, and provides recommendations for future research.

Keywords

White-collar crime, Fraud detection, Data mining, Text mining, Big data, Machine learning, CRISP-DM, Credit card fraud, Classification, Unstructured data, Supervised learning, Hyperparameter optimization, Logistic regression, Support vector machine, Neural networks.

Frequently Asked Questions

What is the core focus of this bachelor thesis?

The thesis focuses on using intelligent IT approaches, specifically Data Mining and Text Mining, to identify and mitigate white-collar crimes within large datasets.

Which specific crime types are addressed in the study?

The study examines financial fraud, credit card fraud, healthcare fraud, embezzlement, criminal insolvency, and corruption.

What is the primary research question?

The main question asks which data mining techniques are most appropriate for detecting white-collar crimes in structured and unstructured data.

Which scientific methodology is applied?

The thesis utilizes a literature analysis based on Webster & Watson for the theory, and the CRISP-DM (Cross-Industry Standard Process for Data Mining) reference model for the empirical case study.

What is covered in the main body of the work?

The main body covers the theoretical background of fraud, an introduction to big data and text mining techniques, and a detailed case study on credit card fraud detection.

What are the key technical concepts described in the work?

Key concepts include supervised and unsupervised learning, sampling techniques for imbalanced datasets (e.g., SMOTE), and hyperparameter optimization to enhance predictive model accuracy.
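The sampling techniques mentioned above can be sketched in a few lines. The following hedged example shows random undersampling of the majority class and the core idea of SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified illustration, not the thesis's implementation: the function names, the choice of `k=3` and the toy data are assumptions, and in practice a library such as imbalanced-learn would typically be used.

```python
import math
import random


def random_undersample(majority, minority, seed=0):
    """Keep all minority samples and an equally large random subset of the majority class."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)) + list(minority)


def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: interpolate between minority samples and their nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # The k nearest minority neighbours of the chosen sample (Euclidean distance).
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: 2-D feature vectors, 8 legitimate vs. 4 fraudulent transactions.
legit = [(float(x), float(y)) for x in range(4, 8) for y in range(5, 7)]
fraud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

balanced_under = random_undersample(legit, fraud)
balanced_over = legit + fraud + smote(fraud, n_new=len(legit) - len(fraud))
print(len(balanced_under), len(balanced_over))  # 8 16
```

Undersampling discards majority data to reach balance, whereas SMOTE keeps all data and adds new minority points that lie between existing ones rather than duplicating them.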

Why is the CRISP-DM model relevant to this research?

It provides a structured, iterative framework (Business Understanding, Data Understanding, Data Preparation, Modelling, Evaluation, and Deployment) for conducting the credit card fraud detection project.

How does text mining contribute to the fraud detection process?

Text mining is essential for converting unstructured data, such as emails or documents, into a structured format (e.g., using the Vector Space Model or TF-IDF weighting) so that predictive machine learning algorithms can process them.
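To make the TF-IDF weighting mentioned above concrete, the sketch below builds a small document-term matrix in the Vector Space Model, weighting each term by its relative frequency in a document times log(N / df), where N is the number of documents and df the number of documents containing the term. This is one common TF-IDF variant chosen for illustration (libraries often add smoothing), and the function name `tfidf` and the sample e-mail texts are hypothetical.

```python
import math
import re
from collections import Counter


def tfidf(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Build a TF-IDF document-term matrix: tf = count / document length, idf = log(N / df)."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(df)
    matrix = []
    for toks in tokenized:
        tf = Counter(toks)
        length = len(toks) or 1  # avoid division by zero for empty documents
        matrix.append([tf[t] / length * math.log(n_docs / df[t]) for t in vocab])
    return vocab, matrix


emails = ["urgent wire transfer", "quarterly report attached", "urgent report"]
vocab, matrix = tfidf(emails)
print(vocab)  # ['attached', 'quarterly', 'report', 'transfer', 'urgent', 'wire']
```

Each row of `matrix` is a fixed-length numeric vector, so it can be fed to any classifier. Terms that occur in every document receive weight zero, while rarer terms such as "wire" are weighted more heavily than common ones such as "urgent".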

End of the reading sample from 93 pages

Details

Title: Fraud Detection in White-Collar Crime
University: Hochschule Heilbronn Technik Wirtschaft Informatik
Grade: 1.3
Author: Rohan Ahmed
Year of publication: 2017
Pages: 93
Catalog number: V426831
ISBN (eBook): 9783668738348
ISBN (Book): 9783668738355
File size: 3304 KB
Language: English
Keywords: fraud detection white-collar crime
Product safety: GRIN Publishing GmbH
Price (eBook): US$ 34.99
Price (Book): US$ 46.99

Cite this work: Rohan Ahmed (Author), 2017, Fraud Detection in White-Collar Crime, Munich, GRIN Verlag OHG, https://www.diplomarbeiten24.de/document/426831