Masterarbeit, 2021
103 Seiten, Note: 1.7
1. Introduction
1.1 Motivation
1.2 Problem Description
1.3 Contribution
1.4 Framework and Structure
2. Background
2.1 Use Case Description
2.2 Data Parallel Processing Workloads
2.2.1 Cloud Computing Scheduler
2.2.2 Task Dependencies
2.2.3 Recurrent Jobs
2.3 Graph Theory
2.3.1 Directed Acyclic Graphs
2.3.2 Definitions
2.4 Graph Embedding
2.4.1 GNN
2.5 Data Clustering and Classification
2.5.1 Rule-Based Classification
2.5.2 K-Means
2.5.3 DBSCAN
2.5.4 OPTICS
2.5.5 Clustering Evaluation
2.6 Linear Regression
2.7 Evaluation Metrics in Linear Models
3. Approach
3.1 Overview
3.2 Data Cleaning and Integration
3.2.1 Data Cleaning
3.2.2 DAGs Extraction
3.3 Graph Embedding and Clustering
3.3.1 Graph Embedding
3.3.2 Clustering
3.3.3 Runtime Prediction
4. Implementation
4.1 DAGs Extraction
4.1.1 Data Preprocessing
4.1.2 DAGs Objects
4.1.3 Target Variables Definition
4.2 Graph Embedding
4.2.1 Model Structure
4.2.2 Model Training
4.3 Data Clustering Techniques
4.3.1 Preprocessing
4.3.2 DBSCAN
4.3.3 OPTICS
4.4 Runtime Prediction
5. Evaluation and Discussion
5.1 Dataset
5.1.1 Exceptions Handling
5.1.2 Data Characteristics
5.2 Rule-Based Classification
5.2.1 Recurring Jobs Detection
5.2.2 Rule-Based Classification
5.2.3 Runtime Prediction
5.2.4 Evaluation
5.3 GNN Evaluation
5.4 OPTICS Clustering
5.5 K-means Clustering
5.5.1 Model Setup
5.5.2 Evaluation
5.6 DBSCAN Clustering
5.6.1 Runtime Prediction
5.6.2 Comparison with the Baseline
5.7 Limitations
5.8 Discussions
6. State of the Art
6.1 Graph Representation Learning
6.2 Runtime and Resource Usage Prediction
6.3 Reoccurring Jobs
6.4 Alibaba Cluster Dataset
7. Conclusion
7.1 Provision of the Code
7.2 Provision of the Datasets
This thesis aims to develop a framework for predicting the runtime of incoming data-parallel jobs in cloud environments. By leveraging the structural similarities of reoccurring jobs through Graph Neural Networks (GNN) and clustering techniques, the research seeks to improve prediction accuracy and enable better resource allocation.
3.3 Graph Embedding and Clustering
The generated DAGs enable further analysis of structural dependencies. The current thesis aims to cluster reoccurring jobs based on their similarity in structural properties of DAGs. In other words, we aim to form a cluster of reoccurring jobs. To this end, the graph should be represented in a fixed-size data structure. This step is mandatory, as many clustering machine learning models do not accept the discrete data type of DAGs as input. The embeddings should hold three main conditions: (1) they have to represent the structural features of DAGs and the information regarding node- and graph-level attributes. (2) The representations must be definable in Euclidean space to make the comparison for clustering models possible, and (3) the representation learning phase should be categorized in the inductive models since the graph embedding phase in real-world scenarios should be repeatable to enable generating of comparable embeddings.
1. Introduction: This chapter provides an overview of the motivation behind runtime prediction in cloud environments, defines the problem, and outlines the thesis structure.
2. Background: This chapter establishes the necessary domain knowledge, covering parallelism, graph theory, graph embedding (GNN), and clustering/classification techniques.
3. Approach: This chapter introduces the methodology for data cleaning, DAG extraction, GNN-based embedding, and the clustering-based runtime prediction framework.
4. Implementation: This chapter details the technical execution of the proposed approach, including data preprocessing, feature selection, GNN architecture, and clustering setup.
5. Evaluation and Discussion: This chapter presents the statistical analysis of the dataset, evaluates the performance of the proposed methods against baselines, and discusses the results and limitations.
6. State of the Art: This chapter reviews existing literature related to graph representation learning, cloud resource prediction, and the analysis of the Alibaba cluster dataset.
7. Conclusion: This chapter summarizes the research findings, contributions, and potential areas for future work regarding job dependency embeddings.
Cloud Computing, Data-Parallel Jobs, Runtime Prediction, Directed Acyclic Graphs (DAGs), Graph Neural Networks (GNN), Graph Embedding, Data Clustering, DBSCAN, OPTICS, Resource Allocation, Alibaba Cluster Trace, Job Dependencies, Machine Learning, Runtime Estimation, Feature Selection
The thesis focuses on predicting the runtime of data-parallel jobs in cloud environments by identifying and clustering reoccurring job patterns based on their structural dependencies.
The central themes are cloud workload management, graph representation learning, unsupervised clustering techniques, and empirical evaluation using large-scale production traces.
The primary goal is to improve the accuracy of job runtime predictions by analyzing the Directed Acyclic Graph (DAG) structure of tasks, thereby assisting cloud providers in efficient resource scheduling.
The study employs Graph Neural Networks (GNN) for embedding DAGs, clustering algorithms such as DBSCAN and OPTICS to identify job groups, and linear regression models to predict runtime per cluster.
The main body covers the extraction of DAGs from raw trace data, the design of a GNN model to create fixed-size vector representations, and a comparative study of different clustering and prediction strategies.
Key terms include Cloud Computing, Graph Neural Networks, Runtime Prediction, DAGs, Data Clustering, and Resource Utilization.
Recurring jobs are identified as tasks sharing isomorphic structural dependencies (DAGs), allowing the framework to group similar operations together even if the input datasets vary.
The proposed framework demonstrates a significant improvement in accuracy, specifically reducing the mean absolute error of baseline predictions by 27%.
The dataset is used as a real-world benchmark with over 200,000 jobs to evaluate the robustness and reliability of the proposed dependency-based prediction approach.
Der GRIN Verlag hat sich seit 1998 auf die Veröffentlichung akademischer eBooks und Bücher spezialisiert. Der GRIN Verlag steht damit als erstes Unternehmen für User Generated Quality Content. Die Verlagsseiten GRIN.com, Hausarbeiten.de und Diplomarbeiten24 bieten für Hochschullehrer, Absolventen und Studenten die ideale Plattform, wissenschaftliche Texte wie Hausarbeiten, Referate, Bachelorarbeiten, Masterarbeiten, Diplomarbeiten, Dissertationen und wissenschaftliche Aufsätze einem breiten Publikum zu präsentieren.
Kostenfreie Veröffentlichung: Hausarbeit, Bachelorarbeit, Diplomarbeit, Dissertation, Masterarbeit, Interpretation oder Referat jetzt veröffentlichen!

