Bachelor's thesis, 2020
74 pages, grade: 1.1
The main objective of this thesis is to develop a method for automatically deriving hierarchical structures from data, specifically focusing on property graph models used in graph databases. The research explores various clustering algorithms and feature extraction techniques to achieve this goal, considering the challenges posed by different data representations within graph structures.
1 Introduction: This chapter introduces the problem of automatically deriving hierarchical structures from data in the context of property graph databases, which lack built-in tools for representing such hierarchies. It motivates the research and outlines the approach taken in the thesis.
2 Background: This chapter provides necessary background information on key concepts relevant to the thesis. It covers the property graph model, various approaches to cluster analysis (including hierarchical, partition-based, and density-based methods), taxonomy as a structured representation of hierarchies, and relevant concepts from probability theory, all of which are foundational to the proposed methods.
3 Algorithms: This chapter delves into a detailed exploration of several hierarchical, partition-based, density-based, and model-based clustering algorithms. It provides a comprehensive overview of their functionalities, strengths, and limitations, laying the theoretical foundation for the algorithm selection and adaptation processes employed in later chapters. Particular focus is given to Cobweb, a conceptual clustering algorithm, and its extensions, highlighting their potential for the problem at hand.
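As an illustration of the criterion that drives Cobweb, the following minimal sketch computes category utility for a partition of instances with nominal attributes. It uses the standard textbook definition of category utility; the function name `category_utility`, the dict-based instance format, and the toy data are assumptions made for this example and are not taken from the thesis.

```python
from collections import Counter


def category_utility(clusters):
    """Category utility of a partition of nominal-attribute instances.

    clusters: list of non-empty clusters, each a list of dicts that map
    an attribute name to a nominal value (hypothetical input format).
    """
    instances = [inst for cluster in clusters for inst in cluster]
    n_total = len(instances)

    def expected_correct_guesses(items):
        # Sum over all attribute/value pairs of P(A_i = V_ij)^2 within `items`.
        counts = Counter(
            (attr, val) for inst in items for attr, val in inst.items()
        )
        return sum((c / len(items)) ** 2 for c in counts.values())

    baseline = expected_correct_guesses(instances)
    gain = sum(
        (len(cluster) / n_total)
        * (expected_correct_guesses(cluster) - baseline)
        for cluster in clusters
    )
    # Dividing by the number of clusters penalises overly fine partitions.
    return gain / len(clusters)


red, blue = {"color": "red"}, {"color": "blue"}
pure = [[red, red], [blue, blue]]    # clusters separate the values
mixed = [[red, blue], [red, blue]]   # clusters carry no information
print(category_utility(pure), category_utility(mixed))
```

A partition whose clusters make attribute values more predictable than the data set as a whole scores higher, which is the property Cobweb exploits when deciding where to place a new instance in its concept hierarchy.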
4 Label Inference: This chapter presents the proposed solution for automatically inferring hierarchical structures from data represented as property graphs. It details the pre-processing steps involved, including encoding sets of tags as vectors and extending the feature vectors to capture graph structure. The core of this chapter focuses on the clustering process and the subsequent post-processing steps for extracting a taxonomy from the results.
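One plausible way to encode sets of tags as vectors is a binary (multi-hot) encoding over the vocabulary of all observed tags. The sketch below assumes this encoding; the preview does not state which encoding the thesis actually uses, and the function name `encode_tag_sets` and the sample tags are hypothetical.

```python
def encode_tag_sets(tag_sets):
    """Encode each tag set as a binary vector over a shared vocabulary.

    Assumption: a multi-hot encoding, with one dimension per distinct tag.
    """
    vocabulary = sorted(set().union(*tag_sets))
    index = {tag: i for i, tag in enumerate(vocabulary)}
    vectors = []
    for tags in tag_sets:
        vec = [0] * len(vocabulary)
        for tag in tags:
            vec[index[tag]] = 1  # mark the presence of this tag
        vectors.append(vec)
    return vocabulary, vectors


tag_sets = [{"Person", "Student"}, {"Person"}, {"City"}]
vocabulary, vectors = encode_tag_sets(tag_sets)
print(vocabulary)
print(vectors)
```

Extending the feature vectors to capture graph structure could then amount to appending further dimensions, for example for the tags of neighbouring nodes, though that step is not shown here.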
5 Evaluation: This chapter evaluates the proposed method in two setups: tag-based clustering and graph-aware clustering of nodes. For each setup, it discusses the data sets, implementation details, and results, focusing on how the selected algorithm performed and what the results reveal about the effectiveness of the approach for different data characteristics and structures.
Property graph, hierarchical clustering, feature extraction, graph databases, cardinality estimation, Cobweb, taxonomy, data representation, label inference, adaptive methodology.
This document provides a comprehensive preview of a thesis focused on automatically deriving hierarchical structures from data, specifically within property graph databases. It details the methodology, algorithms used, evaluation process, and key findings.
The main objective is to develop a method for automatically creating hierarchical structures from data represented in property graph models. This involves exploring various clustering algorithms and feature extraction techniques to handle different data representations within graph structures, ultimately aiming for application in cardinality estimation within property graph databases.
Key themes include automatic hierarchy derivation from data, hierarchical clustering algorithms for graph data, feature extraction for graph-based clustering, an adaptive methodology for different data representations, and the application to cardinality estimation in property graph databases.
The thesis explores several clustering algorithms, categorized as hierarchical (including hierarchical agglomerative clustering and robust single linkage), partition-based (K-Means and TTSAS), density-based (DBSCAN, OPTICS, and HDBSCAN), and model-based/conceptual clustering (Cobweb and its extensions).
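To make the hierarchical family concrete, here is a minimal sketch of agglomerative clustering with single linkage over tag sets. The Jaccard distance, the naive cubic-time loop, and the names `jaccard_distance` and `single_linkage` are choices made for this example only; the preview does not say which distance or implementation the thesis relies on.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage(items):
    """Naive agglomerative clustering with single linkage.

    Returns a list of merges (left_id, right_id, distance, new_id).
    Ids 0..n-1 are the input items; each merge creates a new id.
    """
    clusters = {i: [i] for i in range(len(items))}
    merges = []
    next_id = len(items)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                # Single linkage: distance of the closest pair of members.
                d = min(
                    jaccard_distance(items[p], items[q])
                    for p in clusters[ids[x]]
                    for q in clusters[ids[y]]
                )
                if best is None or d < best[0]:
                    best = (d, ids[x], ids[y])
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges


items = [{"Person", "Student"}, {"Person", "Professor"}, {"City"}]
merges = single_linkage(items)
for merge in merges:
    print(merge)
```

The resulting merge list is a dendrogram in flat form: reading it from the last merge backwards yields a tree from which a taxonomy could be cut.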
Feature extraction plays a crucial role in preparing the data for clustering. The thesis investigates techniques such as characteristic sets and recursive feature extraction to effectively represent the data's structure and properties for optimal clustering performance.
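The following sketch shows one way characteristic sets might be computed for a property graph. It assumes that a node's characteristic set consists of its property keys plus the types of its outgoing edges, adapting the idea as it is commonly described for RDF data; the thesis may define it differently. The function name, the input format, and the `->` prefix for edge types are hypothetical.

```python
from collections import defaultdict


def characteristic_sets(nodes, edges):
    """Group nodes by their characteristic set.

    nodes: dict node_id -> dict of properties (hypothetical format)
    edges: list of (source_id, edge_type, target_id) triples
    Assumption: a characteristic set is the set of a node's property keys
    plus the types of its outgoing edges.
    """
    features = {node_id: set(props) for node_id, props in nodes.items()}
    for source, edge_type, _target in edges:
        features.setdefault(source, set()).add("->" + edge_type)

    groups = defaultdict(list)
    for node_id, feats in features.items():
        groups[frozenset(feats)].append(node_id)
    return dict(groups)


nodes = {
    1: {"name": "A", "age": 3},
    2: {"name": "B", "age": 5},
    3: {"name": "C"},
}
edges = [(1, "LIVES_IN", 3), (2, "LIVES_IN", 3)]
groups = characteristic_sets(nodes, edges)
for feats, members in groups.items():
    print(sorted(feats), members)
```

Nodes that share a characteristic set are structurally indistinguishable under this feature definition, so each group can serve as a single input point for clustering.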
The methodology is designed to be adaptive to different data representations within graph structures. Pre-processing steps, such as encoding sets of tags as vectors and extending feature vectors, are used to ensure the chosen clustering algorithms can effectively handle various data formats within the property graph model.
The proposed solution involves a three-stage process: pre-processing (encoding tags as vectors and extending feature vectors), clustering using the selected algorithm (based on data characteristics), and post-processing to extract a taxonomy from the clustering results.
The evaluation is conducted using two setups: tag-based clustering and graph-aware clustering of nodes. Each setup involves specific datasets, implementation details, and results analysis, focusing on algorithm performance and insights into the effectiveness of the proposed approach for different data characteristics and structures.
While the choice of algorithm depends on the characteristics of the data, the thesis examines several algorithms in depth, with particular attention to Cobweb and its extensions for conceptual clustering. The final choice of algorithm is made and justified in the evaluation chapter.
The evaluation chapter provides a detailed analysis of the performance of the chosen clustering algorithm(s) under different conditions (tag-based vs. graph-aware clustering). This analysis highlights the strengths and limitations of the proposed method and provides insights into its applicability to various types of graph data and their properties.
Key words include property graph, hierarchical clustering, feature extraction, graph databases, cardinality estimation, Cobweb, taxonomy, data representation, label inference, and adaptive methodology.