Doktorarbeit / Dissertation, 2014
201 Seiten, Note: summa cum laude
1. Introduction
1.1. Motivation
1.2. Contribution
1.2.1. Graph-Aware Join Enumeration Algorithms
1.2.2. Hypergraph-Aware Join Enumeration Algorithms
1.2.3. Branch-and-Bound Pruning
1.2.4. Conclusion and Appendix
2. Graph-Aware Join Enumeration Algorithms
2.1. Preliminaries
2.1.1. Graphs
2.1.2. Query Graphs, Plan Classes and Costs
2.1.3. Problem Specification
2.2. Basic Memoization
2.2.1. Generic Top-Down Join Enumeration
2.2.2. Naive Partitioning
2.2.3. Exemplified Execution of TDBASIC
2.2.4. Analysis of TDBASIC
2.3. Advanced Generate and Test
2.3.1. TDMINCUTAGAT
2.3.2. An Improved Connection Test
2.3.3. Correctness of Advanced Generate and Test
2.4. Conservative Graph Partitioning
2.4.1. Correctness Constraints
2.4.2. The Algorithm in Detail
2.4.3. Exploring Connected Components
2.5. Branch Partitioning
2.5.1. Branch Partitioning - An Overview
2.5.2. The Algorithm in Detail
2.5.3. Two Optimization Techniques
2.5.4. Exploring Restricted Neighbors
2.5.5. Two Examples
2.5.6. Complexity of Branch Partitioning
2.6. Evaluation
2.6.1. Experimental Setup
2.6.2. Organizational Overview
2.6.3. Experiments
3. Hypergraph-Aware Join Enumeration Algorithms
3.1. Motivation
3.2. Preliminaries
3.2.1. Hypergraphs
3.2.2. Graphs vs. Hypergraphs
3.2.3. Neighborhood
3.3. Basic Memoization for Hypergraphs
3.3.1. Generic Top-Down Join Enumeration for Hypergraphs
3.3.2. Naive Partitioning of Hypergraphs
3.3.3. Test for Connectedness of Hypergraphs
3.4. Conservative Partitioning for Hypergraphs
3.4.1. Overview of MINCUTCONSERVATIVEHYP
3.4.2. The Algorithm in Detail
3.4.3. Avoiding Duplicates
3.4.4. GETCONNECTEDCOMPONENTS
3.4.5. MAINTAINCSET
3.4.6. An Example
3.5. Generic Top-Down Join Enumeration for Hypergraphs
3.5.1. High-Level Overview
3.5.2. Structure of the Generic Partitioning Framework
3.5.3. Generating the Adjacency Information
3.5.4. Composing Compound Vertices
3.5.5. Economizing on Connection Tests
3.5.6. Efficient Subgraph Handling
3.5.7. Implementation Details
3.6. Evaluation
3.6.1. Implementation
3.6.2. Workload
3.6.3. Organizational Overview
3.6.4. Evaluation of Random Acyclic Query Graphs
3.6.5. Evaluation of Random Cyclic Query Graphs
3.6.6. Overhead Detection
3.6.7. Performance Evaluation with Different Benchmarks
4. Branch-and-Bound Pruning
4.1. Motivation
4.2. Accumulated-Cost Bounding and Predicted-Cost Bounding
4.2.1. Building a Join Tree
4.2.2. Accumulated-Cost Bounding
4.2.3. Predicted-Cost Bounding
4.2.4. Combining the Methods
4.2.5. An Example for Accumulated-Predicted-Cost Bounding
4.3. Technical Advances
4.4. Evaluation
4.4.1. Implementation
4.4.2. Workload
4.4.3. Performance Evaluation with Simple Query Graphs
4.4.4. Performance Evaluation with Complex Query Graphs
4.4.5. Performance Evaluation with Different Benchmarks
5. Conclusion
5.1. Graph-Aware Join Enumeration Algorithms
5.2. Hypergraph-Aware Join Enumeration Algorithms
5.3. Branch and Bound Pruning
5.4. Graceful Degradation
5.5. Summary
This thesis focuses on the process of top-down join enumeration in database management systems. Its primary objective is to develop efficient partitioning algorithms and advanced branch-and-bound pruning strategies that close the performance gap between top-down join enumeration and existing bottom-up approaches, while also addressing real-world query requirements like complex predicates and non-inner joins.
1.1. Motivation
Queries against DBMSs are often formulated in declarative languages. Prominent examples are SQL, OQL, XPath and XQuery. Writing such a declarative query has two advantages: (1) The querist does not need to decide upon the actual algorithms and execution order to access and combine the data, which in turn (2) leaves the DBMS with several degrees of freedom to choose the best evaluation and execution strategy in order to answer the query. This is a shift of complexity: from formulating the query towards how to answer it in a most efficient way. We refer to the process of transforming the declarative query in an imperative execution plan as plan generation, and we call the component in the DBMS which deals with the complexity of choosing a suitable plan from all alternatives the plan generator.
Today’s plan generators are cost-based. This means that they rely on a cost model and statistics over the data in order to select from all equivalent plans the one with the lowest costs. Essential for the costs of a plan is the execution order of join operations in its operator tree, since the runtime of plans with different join orders can vary by several orders of magnitude. An exhaustive search for an optimal solution over all possible operator trees is computationally infeasible. To decrease complexity, the search space must be restricted. For the optimization problem discussed in this thesis, the well-accepted connectivity heuristic is applied: We consider all possible bushy join trees, but exclude cross products from the search, presuming that all considered queries span a connected query graph. Thereby, a query graph is an undirected graph where join predicates span the edges between the relations referenced in the SQL query, i.e., a graph edge between R1 and R2 is introduced if there exists a join predicate involving attributes of R1 and R2.
1. Introduction: Discusses the motivation for efficient join enumeration in DBMSs and outlines the contribution of the thesis, including graph-aware and hypergraph-aware algorithms.
2. Graph-Aware Join Enumeration Algorithms: Introduces basic memoization and develops novel, efficient graph-aware partitioning strategies, concluding with performance comparisons.
3. Hypergraph-Aware Join Enumeration Algorithms: Extends the framework to handle hypergraphs, necessary for complex predicates and non-inner joins, proposing a generic partitioning framework.
4. Branch-and-Bound Pruning: Details advanced pruning techniques to significantly reduce plan generation time by curtailing the search space without sacrificing optimality.
5. Conclusion: Summarizes the thesis, highlighting the achievement of bridging the performance gap between top-down and bottom-up join enumeration through robust, efficient algorithms.
Join Enumeration, Query Optimization, Memoization, Hypergraphs, Branch-and-Bound Pruning, Graph Partitioning, DBMS, SQL, Cost-based Optimization, Query Execution Plans, Performance Evaluation, TPC-H, TPC-DS, SQLite.
The work addresses the computational complexity of finding the optimal execution order for join operations in database queries, particularly for complex queries that involve many relations and joins.
The research lies at the intersection of database system internal design, query optimization, algorithmic graph theory, and performance engineering.
The goal is to enable top-down join enumeration to perform as efficiently as bottom-up approaches, while leveraging the inherent advantages of top-down processing for branch-and-bound pruning.
The thesis utilizes graph partitioning, memoization techniques, and cost-based query optimization strategies, alongside extensive empirical performance testing using standardized benchmarks.
The main body focuses on developing new partitioning algorithms for simple graphs and hypergraphs, creating a generic framework for partitioning, and introducing seven specific advancements to branch-and-bound pruning.
The work is defined by the concepts of graph-aware and hypergraph-aware join enumeration, and the use of branch-and-bound pruning to drastically reduce compile time.
The generic framework is significant because it allows existing, highly optimized graph-partitioning algorithms to be adapted for complex hypergraphs, which are essential for modern query optimizers dealing with outer joins and complex predicates.
By drastically reducing the "compile time" of queries—especially for complex, large-scale workloads—the proposed algorithms allow database systems to generate optimal or near-optimal plans faster, leading to lower latency and higher throughput in real-world scenarios.
Der GRIN Verlag hat sich seit 1998 auf die Veröffentlichung akademischer eBooks und Bücher spezialisiert. Der GRIN Verlag steht damit als erstes Unternehmen für User Generated Quality Content. Die Verlagsseiten GRIN.com, Hausarbeiten.de und Diplomarbeiten24 bieten für Hochschullehrer, Absolventen und Studenten die ideale Plattform, wissenschaftliche Texte wie Hausarbeiten, Referate, Bachelorarbeiten, Masterarbeiten, Diplomarbeiten, Dissertationen und wissenschaftliche Aufsätze einem breiten Publikum zu präsentieren.
Kostenfreie Veröffentlichung: Hausarbeit, Bachelorarbeit, Diplomarbeit, Dissertation, Masterarbeit, Interpretation oder Referat jetzt veröffentlichen!

