Master's Thesis, 2012
133 pages
Chapter 1: Machine Learning - An Introduction
1.1 Introduction -- 1
1.2 Data Mining and Machine Learning -- 1
1.3 What is Machine Learning? -- 2
1.3.1 What is Learning? -- 3
1.4 Learning Strategies -- 3
1.5 History of Machine Learning -- 4
1.5.1 Early Enthusiasm (1955-1965) -- 4
1.5.2 Dark Ages (1962-1976) -- 4
1.5.3 Renaissance (1976-1988) -- 4
1.5.4 Maturity (1988-Present) -- 5
1.6 Applications of Machine Learning -- 5
1.7 Typical Taxonomy of Machine Learning -- 5
1.7.1 Supervised Learning -- 5
1.7.2 Unsupervised Learning -- 7
Chapter 2: Semi-Supervised Learning
2.1 Introduction to Semi-Supervised Learning -- 10
2.2 Semi-Supervised Learning -- 10
2.2.1 Semi-Supervised Classification -- 11
2.2.2 Semi-Supervised Clustering -- 13
2.2.3 Semi-Supervised Feature Selection -- 14
2.3 Generative Models -- 14
2.3.1 Identifiability -- 15
2.3.2 Model Correctness -- 17
2.3.3 EM Local Maxima -- 18
2.3.4 Cluster-and-Label -- 18
2.3.5 Fisher Kernel for Discriminative Learning -- 18
2.4 Self-Training -- 18
2.5 Co-Training and Multiview Learning -- 19
2.5.1 Co-Training -- 19
2.5.2 Multiview Learning -- 22
2.6 Semi-Supervised Learning Techniques -- 22
2.6.1 Transduction -- 22
2.6.2 Induction -- 23
2.7 Avoiding Changes in Dense Regions -- 24
2.7.1 Transductive SVMs (S3VMs) -- 24
2.7.2 Gaussian Processes -- 27
2.7.3 Information Regularization -- 28
2.7.4 Entropy Minimization -- 29
2.7.5 A Connection to Graph-based Methods? -- 29
2.8 Graph-Based Methods -- 29
2.8.1 Regularization by Graph -- 30
2.8.2 Graph Construction -- 36
2.8.3 Fast Computation -- 37
2.9 Theoretical Observations -- 39
Chapter 3: SITNNC
3.1 System Study -- 44
3.2 Traditionally Existing Methods -- 44
3.2.1 Graph Mincut -- 44
3.2.2 Spectral Graph Partitioning -- 46
3.2.3 Standard Algorithm - 3NNC -- 47
3.2.4 ID3 Decision Tree -- 54
3.3 What is Classification? -- 59
3.3.1 Model Construction -- 60
3.3.2 Model Usage -- 61
3.4 Proposed Method -- 62
3.4.1 Algorithm - SI-TNNC -- 63
3.4.2 Algorithm - Improved SI-TNNC -- 64
3.5 Experimental Results for SI-TNNC -- 66
3.6 Using Leaders in SI-TNNC -- 67
3.6.1 Algorithm - Modified-Leaders -- 68
3.6.2 Algorithm - K-means -- 69
Chapter 4: Design and Results of Enhanced SITNNC
4.1 Do Humans do Semi-Supervised Learning? -- 70
4.1.1 Visual Object Recognition with Temporal Association -- 72
4.1.2 Infant Word-Meaning Mapping -- 73
4.1.3 Human Categorization Experiments -- 73
4.2 Properties of the Data Sets Used -- 74
4.2.1 UCI Repository -- 74
4.2.2 Types of Metadata for Machine Learning -- 75
4.2.3 Dynamic or Usage Metadata -- 78
4.2.4 Using Metadata in ML Research -- 79
4.2.5 Organizational and Structural Support for Metadata Acquisition and Maintenance -- 81
4.3 Used Datasets -- 83
4.4 Results -- 100
Chapter 5: Conclusion
This work aims to address the limitations of existing classification methods in scenarios with limited labeled training data by proposing a more scalable and efficient transductive classification approach. The central research objective is to develop the "Selective Incremental Approach for Transductive Nearest Neighbour Classifier" (SITNNC) and to improve its computational performance using the Leaders Algorithm.
3.4 Proposed Method
This thesis presents the proposed transductive nearest neighbor classifier. First, the motivation (or intuition) behind the transductive labeling scheme is explained informally, followed by a detailed formal description. The nearest neighbor classifier (NNC) classifies a given test pattern according to its nearest neighbor in the training set. Equivalently, the nearest neighbor within each class of training patterns can be found separately, and the overall nearest neighbor determined from these. Let x be the test pattern, let the distance to its nearest neighbor in L+ be d+(x), and in L− be d−(x). Then the class label assigned to x is y¹ = +1 if d+(x) < d−(x), and y¹ = −1 otherwise. One way to measure the goodness (γ) of this assignment is γ(x, y¹) = y¹ · (d−(x) − d+(x)).
This goodness measure γ(x, y¹) is called the margin of x with respect to L, in analogy with the functional margin of a hyperplane used in learning an SVM [52]. Intuitively, if the margin is large, the confidence in that label assignment is high. If the test set U has to be labeled collectively, suppose we assign a labeling to all patterns in U to obtain U¹ = {(x_{l+1}, y¹_{l+1}), . . . , (x_n, y¹_n)}; then the goodness of this labeling is the sum of the individual margins, γ(U¹) = Σ_{i=l+1}^{n} γ(x_i, y¹_i).
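The per-pattern labeling rule above can be sketched in a few lines of Python. This is a minimal sketch, not the thesis's implementation: it assumes Euclidean distance, and the function and variable names are illustrative.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two patterns (sequences of floats)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def label_with_margin(x, L_pos, L_neg):
    """Assign a class label to test pattern x using the labeled sets
    L+ (positive) and L- (negative), and return the margin of that
    assignment: gamma(x, y) = y * (d_minus(x) - d_plus(x))."""
    d_plus = min(euclidean(x, p) for p in L_pos)   # nearest neighbor in L+
    d_minus = min(euclidean(x, p) for p in L_neg)  # nearest neighbor in L-
    y = 1 if d_plus < d_minus else -1
    margin = y * (d_minus - d_plus)
    return y, margin

# Example: positives cluster near (0, 0), negatives near (4, 4)
L_pos = [(0.0, 0.0), (1.0, 0.0)]
L_neg = [(4.0, 4.0), (5.0, 4.0)]
y, g = label_with_margin((0.5, 0.5), L_pos, L_neg)
# y is +1, and the margin g is large because the test pattern lies
# much closer to the positive set than to the negative set
```

A pattern deep inside one class's region gets a large margin (high-confidence label), while a pattern near the boundary between L+ and L− gets a margin close to zero.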
Chapter 1: Machine Learning - An Introduction: Provides a comprehensive overview of machine learning, its history, fundamental strategies, and taxonomy, contextualizing the need for automated knowledge discovery.
Chapter 2: Semi-Supervised Learning: Examines the theoretical underpinnings of semi-supervised learning, contrasting it with supervised and unsupervised methods while detailing various techniques such as self-training and graph-based approaches.
Chapter 3: SITNNC: Introduces the proposed Selective Incremental Approach for Transductive Nearest Neighbour Classifier, detailing its methodology, system study, and integration with the Leaders Algorithm for optimization.
Chapter 4: Design and Results of Enhanced SITNNC: Discusses the experimental setup, the properties of the used datasets from the UCI repository, and presents the comparative performance results of the proposed method.
Chapter 5: Conclusion: Summarizes the contributions of the thesis, highlighting the effectiveness of the SI-TNNC method in reducing computational complexity and suggesting directions for future research.
Semi-supervised learning, Transductive inference, Nearest Neighbour Classifier, SITNNC, Graph Mincut, Leaders Algorithm, Machine learning, Classification, Clustering, Feature selection, UCI repository, Computational complexity, Decision Trees, Inductive learning, Pattern recognition.
The research focuses on semi-supervised learning and specifically proposes an enhanced transductive nearest neighbor classifier to handle scenarios where labeled data is scarce.
The work covers machine learning taxonomy, semi-supervised learning techniques, graph-based methods, and the practical application of classification algorithms in real-world data mining scenarios.
The goal is to provide a more scalable and computationally efficient alternative to existing transductive classification methods by utilizing an incremental selective approach.
The author uses a heuristic-based transductive classification approach, further optimized by the "Leaders Algorithm" to reduce time complexity from O(n³) to O(n²).
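The Leaders Algorithm mentioned here is, in its standard form, a single-scan clustering heuristic: each pattern is compared against the current set of leaders and becomes a new leader only if it is farther than a distance threshold from all of them. A minimal sketch of that generic heuristic (the names and the threshold value are illustrative, not taken from the thesis):

```python
import math

def euclidean(a, b):
    # Euclidean distance between two patterns
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def leaders(patterns, threshold):
    """Single-pass leaders clustering: the first pattern becomes a
    leader; each later pattern is absorbed by an existing leader if
    one lies within the threshold, otherwise it becomes a new leader."""
    reps = []
    for p in patterns:
        if all(euclidean(p, q) > threshold for q in reps):
            reps.append(p)
    return reps

# Three well-separated groups collapse to three leaders
data = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
reps = leaders(data, threshold=1.0)
# reps == [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
```

Since each pattern is examined once against a typically small leader set, the scan is cheap; running the transductive classifier on the leaders rather than on all n patterns is what underlies the complexity reduction the thesis claims.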
The main body details the theoretical background of semi-supervised learning, reviews existing classification algorithms, introduces the SITNNC method, and validates it through empirical testing on standard datasets.
Key terms include semi-supervised learning, transductive inference, SITNNC, Leaders algorithm, nearest neighbor classification, and time complexity optimization.
SITNNC focuses on a selective incremental approach to label unlabeled patterns, specifically aiming to reduce the O(n³) time complexity associated with traditional transductive methods like Graph Mincut.
The UCI repository serves as the source for the five standard datasets used to validate and benchmark the performance of the proposed SI-TNNC method against existing classifiers.
GRIN Verlag has specialized in the publication of academic eBooks and books since 1998, making it the first company to stand for user-generated quality content. The publishing sites GRIN.com, Hausarbeiten.de, and Diplomarbeiten24 offer university lecturers, graduates, and students the ideal platform for presenting academic texts such as term papers, presentations, bachelor's theses, master's theses, diploma theses, dissertations, and scholarly essays to a broad audience.

