Masterarbeit, 2021
54 Seiten, Note: 2,0
1 Introduction
1.1 Problem Statement
1.2 Thesis Structure
2 Background
2.1 Code Analysis
2.1.1 Dynamic Code Analysis
2.1.2 Static Code Analysis
2.2 Code Property Graphs
2.2.1 Components and Structure
2.2.2 Generation Process
2.3 Incremental Parsing
3 Related Work
3.1 Graph-Based Code Analysis
3.1.1 Joern
3.1.2 ProgQuery
3.1.3 Codyze: Code Property Graphs for Security Assessments
3.2 Incremental Parsing
4 Improving the Construction Performance for Incremental Changes
4.1 Approach Overview
4.2 Concurrent Source Code Parsing
4.3 Change Classification
4.4 Performing In-Place Updates
4.4.1 FieldDeclaration Changes
4.4.2 FunctionDeclaration Changes
4.5 Graph Equality Checking Algorithm
5 Evaluation
5.1 Concurrent Source Code Parsing
5.1.1 Experiment Setup
5.1.2 Results
5.2 Incremental Graph Updates
5.2.1 Experiment Setup
5.2.2 Results
6 Conclusion
7 Future Work
7.1 Using a Listener System for Reference Resolution
7.2 Incremental AST Parsing
7.3 Parallelizing the Whole CPG Generation Process
This thesis addresses the performance limitations of generating Code Property Graphs (CPG) for large software projects. The primary research goal is to develop an incremental CPG generation process that reuses existing graph structures for unchanged code, thereby significantly reducing the computational overhead and feedback cycles required for security analysis in continuous integration pipelines.
4.4 Performing In-Place Updates
While Algorithm 4.1 takes care of identifying the relationship between nodes in the original and changed graph, the crucial point how this can be used to speed up CPG construction lies in the update handlers that have been mentioned several times now. Each of them is able to process a specific node type and determines which actions need to be taken in case a node has been added, deleted or replaced. If updates cannot be performed with sufficient confidence, a full pass execution is triggered.
How this change process looks in practice will be the demonstrated in this section, using FieldDeclaration and FunctionDeclaration handlers as examples. Of course we could implement additional handlers, but some of the node types require a disproportionate amount of computational overhead for in-place updates, limiting their usefulness: When a change affects RecordDeclarations, for example, this can have widespread consequences due to changes in the program’s type hierarchy. In such a case it makes much more sense to just execute all passes again rather than trying to selectively resolve imprecisions in the graph that resulted from the incremental change.
1 Introduction: Introduces the necessity of automated code analysis and sets the objective to optimize CPG generation for incremental software changes.
2 Background: Provides foundational knowledge on code analysis, CPG structures, and the concepts of incremental parsing.
3 Related Work: Reviews existing graph-based analysis tools and current approaches to incremental parsing in the context of static analysis.
4 Improving the Construction Performance for Incremental Changes: Details the core implementation of incremental updates, change classification, and the graph equality checking algorithm.
5 Evaluation: Presents the empirical results of benchmarking the proposed concurrent parsing and incremental update strategies against existing baseline implementations.
6 Conclusion: Summarizes the thesis findings, confirming the feasibility and performance benefits of the incremental CPG construction approach.
7 Future Work: Suggests potential future directions, including the transition to listener-based systems and integrating incremental AST parsers like tree-sitter.
Code Property Graph, CPG, Incremental Parsing, Static Code Analysis, Software Security, Abstract Syntax Tree, AST, Control Flow Graph, CFG, Data Flow Graph, DFG, Performance Optimization, Continuous Integration, Graph Isomorphism, In-place Updates
The thesis focuses on improving the efficiency of Code Property Graph (CPG) generation by enabling incremental updates rather than full re-generation for every code change.
It utilizes graph-based static code analysis to model syntactic and semantic relationships within software projects, specifically targeting security vulnerability detection.
The goal is to reduce the runtime of CPG generation from several minutes to seconds for incremental changes, facilitating faster feedback loops in development environments.
The research employs graph algorithm development, including a specialized node mapping and traversal algorithm (Algorithm 4.1), and conducts performance benchmarking on real-world Java source code repositories.
The main body covers the theoretical approach for incremental updates, methods for concurrent AST parsing, change classification, and the design of an automated testing framework for graph equality.
Key terms include CPG, Incremental Parsing, Static Code Analysis, Performance Optimization, and Graph Isomorphism.
FieldDeclaration changes are handled via update handlers that allow for re-routing incoming references to updated nodes without necessitating a full re-parse of the entire project.
It serves as a rigorous testing framework to ensure that incrementally produced graphs are identical to those generated from scratch, which is crucial for maintaining analysis accuracy.
Der GRIN Verlag hat sich seit 1998 auf die Veröffentlichung akademischer eBooks und Bücher spezialisiert. Der GRIN Verlag steht damit als erstes Unternehmen für User Generated Quality Content. Die Verlagsseiten GRIN.com, Hausarbeiten.de und Diplomarbeiten24 bieten für Hochschullehrer, Absolventen und Studenten die ideale Plattform, wissenschaftliche Texte wie Hausarbeiten, Referate, Bachelorarbeiten, Masterarbeiten, Diplomarbeiten, Dissertationen und wissenschaftliche Aufsätze einem breiten Publikum zu präsentieren.
Kostenfreie Veröffentlichung: Hausarbeit, Bachelorarbeit, Diplomarbeit, Dissertation, Masterarbeit, Interpretation oder Referat jetzt veröffentlichen!

