Bachelor's thesis, 2022
162 pages, grade: 1.0
1 Introduction
1.1 Initial situation
1.2 Problem description
2 Scope of the thesis
3 Theory
3.1 Malware
3.1.1 Definition
3.1.2 Malware Evolution
3.1.3 Types of malware
3.2 Program architecture in Microsoft Windows (MW)
3.2.1 The Portable Executable file format
3.2.2 Relevant insights for malware analysis
3.3 Malware Detection
3.3.1 Methodologies
3.3.2 Evading detection by Obfuscation
3.4 Machine Learning (ML)
3.4.1 Definition
3.4.2 Features
3.4.3 ML-Workflow
3.4.4 ML-Paradigms
3.4.5 ML-Algorithms
3.4.6 Model accuracy and metrics
4 Literature Review: ML Approaches in Research
4.1 Review outline
4.1.1 Structure
4.1.2 Literature overview
4.1.3 Evaluation criteria
4.2 Malware Feature Taxonomy
4.3 Static ML approaches
4.3.1 Quantitative evaluation
4.3.2 Qualitative evaluation
4.4 Dynamic ML approaches
4.4.1 Quantitative evaluation
4.4.2 Qualitative evaluation
4.5 Hybrid ML approaches
4.5.1 Quantitative evaluation
4.5.2 Qualitative evaluation
4.6 Conclusive learning from literature
5 Practical Review: Implementing a Static ML-Based Malware Detector
5.1 Safety measures and disclaimer
5.2 Requirements and resources
5.2.1 Test-environment: Guest OS and host OS
5.2.2 PE file repository: VirusShare and EMBER
5.2.3 Feature extraction: Python PEpper
5.2.4 Model training and validation: WEKA
5.3 Implementation
5.3.1 Phase 1: Data gathering
5.3.2 Phase 2: Data preparation
5.3.3 Phase 3: Model training
5.3.4 Phase 4: Model validation
5.4 Conclusive learning from practical implementation
6 Conclusion and Outlook
6.1 Conclusion
6.2 Outlook
The main objective of this thesis is to evaluate and compare static, dynamic, and hybrid machine learning approaches for detecting malware on Windows systems. The work bridges the gap in current research where these methods are rarely compared comprehensively, ultimately demonstrating the practical implementation of a static malware detector using common machine learning workflows.
3.3.2 Evading detection by Obfuscation
The de-obfuscation of malware scripts, i.e. the reverse engineering and reconstruction of the code's actual intent, is humorously described by Barker as "Putting the toothpaste back in the tube" (cf. Barker 2021, p. 293). The image illustrates that malware developers have several methods at their disposal to modify their code (or even let it modify itself) so that the original state is difficult to trace and the actual purpose of the code is difficult to reveal.
The following are some examples of common approaches to obfuscation:
Encryption: Malware sometimes employs encryption to hide malicious code blocks throughout its code (see figure 12). As a result, the malicious code contained in that malware may be undetectable by the host (cf. Wardle 2022, p. 285; cf. Aslan & Samet 2020, p. 6251). Methods such as measuring the "entropy" of a potentially dangerous file have therefore been created, and are expected to be applied in many forthcoming studies, to at least verify the existence of encrypted or compressed chunks of code inside executable files (cf. Gibert & Mateu & Planes 2020, p. 8).
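The entropy measurement mentioned above can be illustrated with a minimal sketch. It assumes Shannon entropy over byte frequencies, which is a common choice but is not confirmed by the excerpt as the exact measure used in the cited studies. The function names and the chunk size of 256 bytes are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def chunk_entropies(data: bytes, chunk_size: int = 256) -> list:
    """Entropy of each consecutive chunk (chunk_size is an illustrative assumption)."""
    return [
        shannon_entropy(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]
```

Values close to 8 bits per byte suggest data that looks random, as encrypted or compressed content typically does, while plain code and text usually score lower. A high value only hints at encryption or packing; it does not prove malicious intent.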
1 Introduction: This chapter introduces the ongoing conflict between security experts and malware developers, highlighting the limitations of traditional signature-based detection and the growing significance of machine learning.
2 Scope of the thesis: This section defines the primary goal of evaluating static, dynamic, and hybrid approaches and outlines the three core research questions guiding the analysis and practical implementation.
3 Theory: This chapter establishes the fundamental terminology regarding malware evolution, Windows file structures, detection methodologies (static, dynamic, hybrid), and foundational machine learning concepts.
4 Literature Review: ML Approaches in Research: This section provides a comprehensive, criteria-based evaluation of 35 diverse research papers, classifying them by methodology and analyzing their performance through quantitative and qualitative metrics.
5 Practical Review: Implementing a Static ML-Based Malware Detector: This chapter details the hands-on process of building a static detector, covering environment setup, feature extraction with Python, and model training/validation using WEKA.
6 Conclusion and Outlook: This final chapter synthesizes the main findings from the literature review and the practical implementation, providing insights into future developments for malware detection.
Malware Detection, Machine Learning, Static Analysis, Dynamic Analysis, Hybrid Analysis, Windows Executables (PE), Feature Engineering, Malware Evolution, Obfuscation, Model Accuracy, WEKA, Classification, Cybersecurity, Neural Networks, Endpoint Protection
This thesis examines the effectiveness and common characteristics of different machine learning-based malware detection methods on the Windows platform, specifically focusing on static, dynamic, and hybrid approaches.
The research is categorized into static analysis (analyzing files without execution), dynamic analysis (monitoring behavior at runtime), and hybrid analysis (combining both methods).
The work addresses how these three methodological domains differ in terms of quantitative and qualitative evaluation criteria and how a static detection model can be practically implemented using established ML workflows.
The thesis uses the literature review framework developed by Brocke et al., which involves defined selection cycles and key performance indicators to ensure a representative and rigorous overview of existing research.
The practical section describes the end-to-end development of a static malware detector, including data gathering from repositories like VirusShare and EMBER, feature extraction using Python, and model validation with the WEKA tool.
The most important terms include Malware Detection, Machine Learning, Static Analysis, Dynamic Analysis, Hybrid Analysis, Malware Evolution, and Feature Engineering.
Hybrid approaches are complex because they require the integration of feature vectors from two distinct sources (static and dynamic), necessitating expert knowledge in feature fusion and additional effort during the feature extraction phase.
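As a rough illustration of what integrating two feature sources can mean, the sketch below shows the simplest form of fusion: merging named static and dynamic features into one record. The summary does not state which fusion technique the surveyed papers use, so this is an assumption, and the function name, prefixes, and feature names are hypothetical.

```python
def fuse_features(static: dict, dynamic: dict) -> dict:
    """Merge static and dynamic features into one record (simple early fusion).

    Keys are prefixed by their source so that features with the same name
    in both sources do not overwrite each other.
    """
    fused = {f"static_{name}": value for name, value in static.items()}
    fused.update({f"dynamic_{name}": value for name, value in dynamic.items()})
    return fused


# Hypothetical feature values for a single sample.
sample = fuse_features(
    {"entropy": 7.2, "section_count": 5},
    {"api_calls": 14, "files_written": 2},
)
```

Even in this simple form, both extraction pipelines have to run for every sample before a model can be trained, which reflects the additional effort described above.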
Yes, the implementation confirmed that static models generally struggle to correctly classify obfuscated or encrypted malware samples, a limitation that was repeatedly noted in the surveyed research literature.
The custom feature graph, developed using the tool Gephi, visualizes the relationships between different methodology classes and applied features, providing a clearer understanding of how these elements relate to each other across different studies.

