Master's Thesis, 2014
98 pages, Grade: 1.0
1. INTRODUCTION AND PROBLEM DESCRIPTION
1.1 INTENTION OF THIS THESIS
1.2 PROCEEDING
2. INTRODUCTION TO KEY FIGURE ANALYSIS
2.1 THE PRINCIPLE OF KEY FIGURES
2.2 THE CLASSICAL KEY FIGURE ANALYSIS APPROACH
2.3 MODERN KEY FIGURE ANALYSIS APPROACHES
2.4 LIMITATIONS OF ANNUAL REPORT ANALYSIS
3. THE AVAILABLE DATASET
3.1 DESCRIPTION OF THE DATASET
3.2 DATA CLEAN-UP
4. KEY FIGURE SELECTION
4.1 SIGNIFICANT KEY FIGURE REQUIREMENTS
4.2 THE SELECTED KEY FIGURES OF THIS ANALYSIS
4.2.1 Selected class variable
4.2.2 Selected qualitative key figures
4.2.3 Selected absolute key figures
4.2.4 Selected relative key figures
4.3 CLASS ANALYSIS
5. CLASSIFICATION TREES AND FORESTS
5.1 PRECONSIDERATIONS
5.2 CLASSIFICATION TREES
5.2.1 A simple example
5.2.2 Generation of classification trees
5.2.3 Pruning an existing tree
5.2.4 Relevant properties of CART trees
5.3 RANDOM FOREST
5.3.1 Classification process of a random forest
5.3.2 Generation of random forest
5.3.3 Relevant properties of random forests
6. CLASSIFICATION RESULTS
6.1 CLASSIFICATION TREE RESULTS
6.1.1 Examination of the most precise tree
6.1.2 Key indicator importance ranking
6.1.3 Transfer to data from 2011
6.2 CLASSIFICATION FOREST RESULTS
6.2.1 Transfer to data from 2011
6.2.2 Key indicator importance ranking
7. CONCLUSION
7.1 CRITICAL ASSESSMENT
7.2 OUTLOOK
The primary objective of this thesis is to evaluate whether stakeholders can use classification trees and random forests to predict, at the beginning of a calendar year, which German firms will grow exceptionally, based on annual report key figures from previous years. The research addresses the challenge of analyzing large, complex datasets by implementing a data mining approach based on the CRISP-DM reference model.
2.4 Limitations of annual report analysis
To conclude this chapter, some general limitations of analysing annual statement data must be pointed out, because they directly influence the quality of the resulting model.
First of all, annual reports are not designed to serve as a foundation for predicting growth. They look at the past, stating how wealthy the company is and why its assets have changed. Using them for prediction therefore diverts the annual report from its intended use (Franken 2007, 3).
Another problem, especially in the context of small and medium-sized companies, is that their success strongly depends on the company's manager. Unfortunately, most of the datasets in use do not contain any information such as the age, gender and education of this person (Anders and Szczesny 1999, 1-2).
Furthermore, there is often no information about the enterprise's strategic goals, its capacity for innovation, the professionalism of the manager and staff, or its customer focus. All these aspects influence whether a company is going to be successful, but they cannot be used because they are either not available at all or very hard to operationalize and would therefore require controversial generalisations (Moro and Schäfer 2004; Fritz 1993, 1; Feldo 2011, 8).
1. INTRODUCTION AND PROBLEM DESCRIPTION: Introduces the research context, the importance of predictive models for finance, and the application of the CRISP-DM methodology for data mining.
2. INTRODUCTION TO KEY FIGURE ANALYSIS: Examines the principles, advantages, and shortcomings of traditional key figure analysis, contrasting them with modern data mining approaches.
3. THE AVAILABLE DATASET: Details the structure and content of the "Amadeus" database used for the analysis, including necessary steps for data cleaning and preparation.
4. KEY FIGURE SELECTION: Discusses the criteria for selecting meaningful financial indicators and defines the class variables used for identifying lucrative firms.
5. CLASSIFICATION TREES AND FORESTS: Provides a detailed explanation of the CART algorithm, the generation of classification trees, random forests, and methods for pruning and variable importance estimation.
6. CLASSIFICATION RESULTS: Presents the findings of the classification tasks, comparing the performance of different models on 2010 data and their predictive capability when transferred to 2011.
7. CONCLUSION: Summarizes the key insights, evaluates the effectiveness of the chosen data mining approach, and provides an outlook on potential future improvements.
Data Mining, Classification Trees, Random Forest, Financial Statements, Annual Reports, Predictive Modeling, Lucrativeness, Key Figures, CRISP-DM, Corporate Growth, German Firms, Big Data, R, CART, Model Performance
The research focuses on utilizing data mining techniques—specifically classification trees and random forests—to predict which companies will achieve exceptional growth based on historical financial statement data.
The central themes include financial key figure analysis, classification algorithms, data processing of large financial datasets, and the practical application of the CRISP-DM methodology.
The main question is whether a stakeholder can effectively use classification trees or random forests to predict the lucrativeness of German firms at the start of a year, using data from previous years.
The study employs a supervised learning data mining approach, specifically using the CART algorithm and Random Forests, structured within the CRISP-DM lifecycle.
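The core step of growing a CART-style tree, choosing a split threshold for one key figure, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the Gini index as the impurity measure (a common CART choice; the excerpt does not state which criterion was used), and the key figure name and the toy values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold t for the rule `value <= t` with the lowest
    size-weighted Gini impurity of the two resulting child nodes.

    Returns (threshold, weighted_impurity); threshold is None if no split
    improves on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold can separate identical values
        threshold = (low + high) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: one relative key figure and a binary class variable
# (1 = exceptionally growing firm, 0 = otherwise).
equity_ratio = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55]
lucrative = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(equity_ratio, lucrative)
print(threshold, impurity)
```

A full tree would repeat this search over all key figures at every node and recurse into the two child nodes until a stopping rule applies.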
The main section covers the selection and justification of financial key figures, the technical generation and pruning of classification trees, the creation of random forests, and a rigorous performance evaluation of these models on real-world datasets.
Key terms center around predictive analytics in a corporate finance context, focusing on quantitative metrics, model accuracy, and the ability to interpret model results for decision-makers.
Random Forests are used because they often provide better classification performance and variance reduction compared to individual "weak learner" trees, though they offer less transparency.
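Why a vote among many weak trees can outperform a single tree can be illustrated with a short calculation. The sketch below rests on a strong simplifying assumption that is not taken from the thesis: every tree is correct with the same probability and the trees err independently of each other. Real trees in a forest are correlated, so the actual gain is smaller; the random selection of observations and variables in a random forest aims to reduce that correlation.

```python
from math import comb


def majority_vote_accuracy(n_trees, p_correct):
    """Probability that a strict majority of n_trees voters is correct,
    assuming each voter is independently correct with probability p_correct.
    Use an odd n_trees so that ties cannot occur.
    """
    needed = n_trees // 2 + 1
    return sum(
        comb(n_trees, k) * p_correct**k * (1 - p_correct) ** (n_trees - k)
        for k in range(needed, n_trees + 1)
    )


# Hypothetical weak learner that is right 60% of the time.
for n in (1, 3, 25, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

Under these assumptions the accuracy of the vote rises with the number of trees, while the individual decision paths are no longer visible in the final prediction, which is the loss of transparency mentioned above.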
The research leverages the built-in capabilities of classification trees for handling missing data and discusses specific imputation methods for random forests to ensure the large "Amadeus" dataset remains usable.
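One simple way to make records with missing key figures usable for a random forest is to replace each gap with the median of the observed values in that column. The sketch below shows only this basic idea; the excerpt does not say which imputation methods the thesis discusses, and the column name and values are hypothetical.

```python
from statistics import median


def impute_median(column):
    """Return a copy of `column` in which every None is replaced by the
    median of the observed (non-None) values. A column without any
    observed value is returned unchanged.
    """
    observed = [value for value in column if value is not None]
    if not observed:
        return list(column)
    fill = median(observed)
    return [fill if value is None else value for value in column]


# Hypothetical key figure with two missing entries.
return_on_assets = [0.04, None, 0.10, 0.07, None]
filled = impute_median(return_on_assets)
print(filled)
```

Median imputation is robust to outliers but ignores relationships between key figures, so it is best read as a baseline rather than a recommendation.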

