Doctoral Thesis / Dissertation, 2020
134 pages
Chapter 1. Introduction to Data Mining and Decision Support
1.1 Introduction
1.2 The KDD Process
1.2.1 Developing an understanding of the application domain
1.2.2 Selecting and creating a data set
1.2.3 Pre-processing and cleansing
1.2.4 Data transformation
1.2.5 Choosing the appropriate Data Mining task
1.2.6 Choosing the Data Mining algorithm
1.2.7 Employing the Data Mining algorithm
1.2.8 Evaluation
1.2.9 Using the discovered knowledge
1.3 Data Mining, a Step of the KDD Process
1.3.1 Database, Data Warehouse, or Other Information Repositories
1.3.2 Database or Data Warehouse Server
1.3.3 Knowledge Base
1.3.4 Data Mining Engine
1.3.5 Pattern Evaluation Module
1.3.6 Graphical User Interface
1.4 Data Mining Functionalities
1.4.1 Concept / Class Description: Characterization and Discrimination
1.4.2 Association Rule Mining
1.4.3 Classification and Prediction
1.4.4 Clustering
1.4.5 Outlier Analysis
1.4.6 Evolution Analysis
1.5 Common Uses of Data Mining
1.6 Decision Support
1.6.1 Basic Discipline
1.6.2 Decision Making
1.6.3 Classification of decision problems
1.6.4 Decision Support System
1.7 Contributions of This Thesis
Chapter 2. A Survey of Existing Work and Problem Definition
2.1 A Survey of existing work
2.1.1 Problem with integration of Data Mining and Decision Support
2.1.2 Evolution of Decision Support System (DSS)
2.1.3 A Survey of existing decision tree algorithm (Traditional)
2.1.4 A Survey of existing decision tree algorithm (Advanced)
2.2 Problems yet to be solved
2.3 Problem Definition
2.4 Research hypothesis, aims and objectives
2.5 Conceptual Research Framework
2.6 Conclusion
Chapter 3. Analysis of Data Mining Methods
3.1 Data Mining Methods
3.2 Discovery Method
3.3 Flat versus hierarchical classification
3.4 Basic Methods
3.5 Hierarchical classification
3.5.1 Why to choose hierarchies
3.5.2 Advantages of hierarchies
3.6 Machine Learning and Classification
3.6.1 Classification
3.6.1.1 Evaluation of classification methods
3.7 Classification Based on decision tree
3.8 Classification rules
3.9 The Pruning of Decision Tree
3.9.1 Types of Pruning Technique
3.9.1.1 Pre-Pruning
3.9.1.2 Post-Pruning
3.9.2 Fuzzy Decision Trees
3.10 Conclusion
Chapter 4. Decision tree Techniques and their formulation
4.1 Formulation of decision trees
4.2 Characteristics of Classification Trees
4.2.1 Tree Size
4.2.2 The hierarchical nature of decision trees
4.3 Basic concept and algorithm of Decision Tree
4.3.1 ID3
4.3.1.1 Attribute Selection
4.3.1.2 Information Gain
4.3.2 C4.5
4.3.3 CART
4.3.4 CHAID
4.3.5 QUEST
4.4 Advantages and Disadvantages of Decision Trees
4.5 Decision Tree Extensions
4.5.1 Oblivious Decision Trees
4.5.2 Fuzzy Decision Trees
4.6 Decision Trees Inducers for Large Datasets
4.7 Incremental Induction
4.8 Evaluation of Decision Tree Techniques
4.8.1 Generalization Error
4.8.1.1 Theoretical Estimation of Generalization Error
4.8.1.2 Empirical Estimation of Generalization Error
4.8.2 Confusion Matrix
4.8.3 Computational Complexity
4.8.4 Comprehensibility
4.9 Scalability to Large Datasets
4.9.1 Robustness
4.10 Conclusion
Chapter 5. The Development of New Algorithm for Decision Tree Learning
5.1 Proposed Improved ID3 Algorithm
5.2 Steps of Improved ID3 Algorithm
5.3 Pseudocode of Proposed Improved Algorithm
5.4 Experimental Example
5.5 Experiments on Datasets
5.6 Investigation and analysis based on performance parameters
5.6.1 Accuracy
5.6.2 Model Build Time
5.6.3 Predictor Error Measures
5.7 Empirical comparison and Investigation results
5.8 Conclusion
Chapter 6. Decision Support Framework and Related Work
6.1 Introduction
6.2 Proposed Decision Support Framework
6.3 Real world applications
6.3.1 Predicting Usage of Library Books
6.3.2 Intrusion Detection
6.3.3 Machine Learning
6.3.4 Diagnosis
6.3.5 Banking Sector
6.3.6 Credit Risk Analysis
6.4 Decision Tree Construction Using Weka
6.5 Weka Screen Shot
6.6 Conclusion
Chapter 7. Conclusions and Future Work
7.1 Summary and Contributions
7.2 Limitations and Future Work
7.3 Future Work
This thesis aims to address the limitations of the traditional ID3 decision tree algorithm by developing an improved version that utilizes impact factors and classified impact factors to resolve conflicts when attributes have equal values but belong to different classes. The primary research goal is to propose an algorithm that enhances classification accuracy while maintaining computational efficiency for real-world decision support systems.
1.2.6 Choosing the Data Mining algorithm
With the strategy in hand, we now decide on the tactics. This stage involves selecting the specific method to be used for searching for patterns (possibly including multiple inducers). For instance, in weighing precision against understandability, the former is better served by neural networks, while the latter is better served by decision trees. For each strategy of meta-learning there are several possible ways to accomplish it. Meta-learning focuses on explaining what causes a Data Mining algorithm to succeed or fail on a particular problem. Thus, this approach attempts to identify the conditions under which a Data Mining algorithm is most appropriate. Each algorithm has parameters and tactics of learning (such as ten-fold cross-validation or another division of the data for training and testing).
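The ten-fold cross-validation mentioned here can be illustrated with a minimal sketch. It is not taken from the thesis; the function name `k_fold_indices` and the fixed shuffling seed are assumptions made for illustration. It only shows how a data set of `n_samples` records might be divided into ten disjoint test folds, each paired with the remaining records as training data.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper for illustration; not part of the thesis.
    """
    indices = list(range(n_samples))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at offset i,
    # so fold sizes differ by at most one record.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training set: all records that are not in the current test fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Each record appears in exactly one test fold, so averaging a performance measure over the ten folds uses every record once for testing.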
Chapter 1. Introduction to Data Mining and Decision Support: Provides an overview of the data mining process, the KDD workflow, and the integration of these techniques within decision support systems.
Chapter 2. A Survey of Existing Work and Problem Definition: Reviews literature on data mining and decision support, identifies research gaps, and formulates the core research hypothesis and objectives.
Chapter 3. Analysis of Data Mining Methods: Analyzes various data mining methods, focusing on the taxonomy, hierarchical vs. flat classification, and the importance of tree-based methods.
Chapter 4. Decision tree Techniques and their formulation: Details various decision tree algorithms, their mathematical formulations, and evaluation strategies for classification tasks.
Chapter 5. The Development of New Algorithm for Decision Tree Learning: Presents an improved ID3 algorithm that incorporates impact factors to handle attribute conflicts and validates its performance against traditional methods.
Chapter 6. Decision Support Framework and Related Work: Formulates a practical decision support framework and demonstrates its utility through real-world applications and Weka-based implementations.
Chapter 7. Conclusions and Future Work: Summarizes the thesis contributions, acknowledges limitations, and suggests potential directions for future research in hierarchical decision modeling.
Data Mining, KDD Process, Decision Tree, ID3, C4.5, CART, Decision Support System, Classification, Information Gain, Impact Factor, Hierarchical Classification, Attribute Selection, Weka, Machine Learning, Pruning.
The research focuses on enhancing decision tree classification by introducing an improved ID3 algorithm that utilizes impact factors and classified impact factors to resolve conflicts in attribute data.
The work covers themes such as the KDD process, the development of hierarchical multi-attribute decision models, decision tree pruning, and the implementation of decision support frameworks for real-world scenarios.
The primary goal is to improve the accuracy of decision tree learning and to provide a robust framework for effective decision support in scenarios where traditional algorithms fail to handle conflicting class attributes.
The research uses a mix of theoretical analysis of data mining methods and empirical evaluation, comparing the proposed improved ID3 algorithm against existing algorithms like C4.5 and CART using multiple real-world datasets.
The main chapters discuss the foundational concepts of data mining, survey existing algorithms, analyze classification methods, detail the development of the improved ID3 algorithm, and formulate a new decision support framework.
Keywords include Data Mining, Decision Tree, ID3, Decision Support System, Classification, Information Gain, and Hierarchical Classification.
The proposed algorithm resolves the conflict of class selection when attributes have equal values by introducing an "impact factor" and "classified impact factor," allowing the algorithm to decide more effectively which class to adopt for maximum accuracy.
Impact factors are introduced to balance classification decisions based on the importance of attributes rather than relying solely on information gain, which helps the algorithm handle ambiguous data scenarios more accurately.
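For reference, the information gain criterion that standard ID3 relies on can be sketched as follows. This is the textbook entropy-based definition, not the improved algorithm of the thesis: the impact factor and classified impact factor are not defined in this excerpt, so they are deliberately left out. The function names and the dictionary-per-record data layout are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting on `attribute`.

    `rows` is a list of dicts mapping attribute names to values
    (an assumed layout, chosen only for this sketch).
    """
    total = len(labels)
    # Group the class labels by the value the attribute takes in each record.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the partitions after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder
```

Standard ID3 splits on the attribute with the highest gain; a tie between attributes is one situation in which an additional criterion, such as the one the thesis proposes, could be consulted.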
The performance is validated using 10-fold cross-validation on six different real-world datasets, measuring parameters like accuracy, model build time, and mean absolute error in comparison with existing ID3, C4.5, and CART models.
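Two of the measures named above, accuracy and mean absolute error, can be sketched as below. This is a simplified illustration that assumes class labels are encoded as numbers (for example 0 and 1); the exact way the thesis or its tooling computes these measures is not given in this excerpt, and the function names are hypothetical.

```python
def accuracy(actual, predicted):
    """Fraction of records whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between numeric actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

In a cross-validation run, such measures would typically be computed on each test fold and then averaged across the folds.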

