Using Subsequence Mining to Identify Business Processes in Data Networks

Master's Thesis, 2016
63 pages

Computer Science - Business Information Systems

Excerpt

1. Introduction

2. Motivation

2.1. Background

2.2. Related Work

2.3. Challenges

2.4. Requirements

3. Network Service Dependency Discovery

3.1. Technical Description

3.1.1. Network Model

3.1.2. Network Dependency Analysis

3.1.3. Business Processes

3.2. Implementation

3.2.1. Flow Extraction

3.2.2. Subnetwork Identifier

3.2.3. Potential Indirect Dependency Generator

3.2.4. Indirect Dependency Calculator

3.2.5. Stream-based MONA

3.2.6. Representation of Tasks

4. Evaluation

4.1. Ground Truth

4.1.1. Threshold Estimation

4.1.2. Validation of Threshold

4.2. Comparative Evaluation

4.2.1. Orion

4.2.2. Sherlock

4.2.3. NSDMiner

4.2.4. Sensitivity Analysis

4.2.5. MONA versus Orion

5. Conclusion

5.1. Summary

5.2. Future Work

A. Appendix

Objectives and Thematic Focus

The main objective of this work is to develop and implement a novel automated approach, called MONA (Mission Oriented Network Analysis), to identify business processes within corporate data networks by analyzing network traffic patterns, thereby overcoming the limitations of traditional process mining techniques that rely on event logs.

Automated identification of business processes in network environments.
Analysis of low-level communication patterns (header information) rather than application event logs.
Utilizing normalized cross-correlation for identifying indirect service dependencies.
Graphical representation of discovered business processes using BPMN (Business Process Modelling Notation).
Evaluation of system precision and scalability through simulated network scenarios.

Excerpt from the Book

3.1.2. Network Dependency Analysis

This section describes our approach to network dependency analysis, which identifies business processes (BPs) by analyzing communication patterns between network services. Identifying a BP requires identifying its related tasks, which are the building blocks of BPs. Each task consists of several dependent network services. In general, networks exhibit two categories of dependencies: direct and indirect. Knowing both categories gives a more complete picture of the tasks in a network. We therefore consider both categories and define them using the network model described in Section 3.1.1.

Direct dependencies between network services constitute our first category and are straightforward to identify. A network service s_i^j hosted by device d_j sends data packets to another network service s_k^l hosted by device d_l. We define this end-to-end communication as a direct dependency. The set of direct dependencies between network services is denoted SDEP = {(s_i^j, s_k^l) | s_i^j sends a packet to s_k^l in the considered period}. We write δ(s_i^j, s_k^l) to denote (s_i^j, s_k^l) ∈ SDEP. Data networks contain many direct dependencies: whenever a network service on one device requests information from a network service on another device, there is a direct dependency between the two services.
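The definition of SDEP can be illustrated with a minimal sketch. The `Packet` record, its field names, and the helper functions are hypothetical and chosen only for illustration; the excerpt does not show how the thesis represents packets or flows.

```python
from typing import Iterable, NamedTuple, Set, Tuple

# A network service is identified here by (device, service name).
# This representation is an assumption made for the sketch.
Service = Tuple[str, str]


class Packet(NamedTuple):
    """Hypothetical packet record built from header information only."""
    timestamp: float
    src_device: str
    src_service: str
    dst_device: str
    dst_service: str


def direct_dependencies(
    packets: Iterable[Packet], start: float, end: float
) -> Set[Tuple[Service, Service]]:
    """Collect SDEP: all (sender, receiver) service pairs that exchanged
    at least one packet in the considered period [start, end)."""
    sdep: Set[Tuple[Service, Service]] = set()
    for p in packets:
        if start <= p.timestamp < end:
            sender = (p.src_device, p.src_service)
            receiver = (p.dst_device, p.dst_service)
            sdep.add((sender, receiver))
    return sdep


def delta(sdep: Set[Tuple[Service, Service]], a: Service, b: Service) -> bool:
    """The predicate delta(a, b): true iff (a, b) is in SDEP."""
    return (a, b) in sdep
```

Because SDEP is a set, repeated packets between the same two services yield a single direct dependency; how often and when they communicate is kept separately, for example in a communication vector.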

Indirect dependencies are the second category. They are more complex than direct dependencies and harder to identify in network traffic. Indirect dependencies could be estimated by brute force using SDEP+. However, this technique would overestimate indirect dependencies and would not lead to a deeper semantic understanding of complex dependencies in networks. We are interested in a model that estimates indirect dependencies with a low false-positive rate. Therefore, we identify indirect dependencies between network services by searching for similar patterns in the communication vectors of direct dependencies.
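The idea of comparing communication vectors can be sketched as follows. The excerpt does not define how communication vectors are built or which exact form of normalized cross-correlation is used, so this sketch rests on assumptions: a communication vector is taken to be the packet count per fixed-width time bin, the correlation is the mean-centered (Pearson-style) form, and a small range of lags is searched. All function names and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def communication_vector(
    timestamps: Sequence[float], start: float, end: float, bin_width: float
) -> List[int]:
    """Count packets of one direct dependency per time bin (assumed layout)."""
    n_bins = math.ceil((end - start) / bin_width)
    vec = [0] * n_bins
    for t in timestamps:
        if start <= t < end:
            idx = min(int((t - start) // bin_width), n_bins - 1)
            vec[idx] += 1
    return vec


def normalized_cross_correlation(
    x: Sequence[float], y: Sequence[float], lag: int = 0
) -> float:
    """Mean-centered normalized cross-correlation of x[i] with y[i + lag].

    Returns a value in [-1, 1], or 0.0 if the overlap is too short or
    one of the overlapping segments is constant.
    """
    pairs = [(x[i], y[i + lag]) for i in range(len(x)) if 0 <= i + lag < len(y)]
    if len(pairs) < 2:
        return 0.0
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    den = math.sqrt(
        sum((a - mean_x) ** 2 for a in xs) * sum((b - mean_y) ** 2 for b in ys)
    )
    return num / den if den > 0 else 0.0


def best_correlation(x: Sequence[float], y: Sequence[float], max_lag: int) -> float:
    """Highest correlation over lags 0..max_lag (y is allowed to trail x)."""
    return max(normalized_cross_correlation(x, y, lag) for lag in range(max_lag + 1))
```

Under these assumptions, two direct dependencies whose traffic follows the same pattern with a short delay score close to 1, whereas unrelated traffic scores much lower. This is what distinguishes the approach from brute-force enumeration, which cannot tell the two cases apart.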

Summary of Chapters

1. Introduction: Presents the motivation for automated business process management and outlines the limitations of existing log-based mining techniques.

2. Motivation: Discusses background, related research in network dependency discovery, challenges in the field, and defines core requirements for the proposed approach.

3. Network Service Dependency Discovery: Details the technical framework, network modeling, identification of dependencies, implementation of core modules, and the representation of tasks using BPMN.

4. Evaluation: Explains the validation of the model through simulations and provides a comparative analysis against existing tools like Orion, Sherlock, and NSDMiner.

5. Conclusion: Summarizes the key contributions of the work and proposes potential future developments, particularly regarding Intrusion Detection Systems.

Keywords

Business Process Management, Process Mining, Network Service Dependency, MONA, Data Networks, Communication Patterns, Normalized Cross Correlation, BPMN, Indirect Dependency, Traffic Analysis, Network Modeling, Subnetwork Identification, Scalability, Automated Discovery, Task Dependency.

Frequently Asked Questions

What is the core purpose of this Master's thesis?

The thesis aims to facilitate the identification of business processes in large companies by analyzing network traffic instead of relying on application-level event logs, which are often unavailable or insufficient.

What are the primary thematic fields covered?

The work integrates knowledge from IT, management sciences, and network engineering, focusing on network service dependency discovery and process mining.

What is the main research question or goal?

The primary goal is to show that automated identification of business processes is tractable using polynomial-time algorithms that analyze communication patterns in network traffic.

Which scientific methodology is employed?

The author designs a model called MONA (Mission Oriented Network Analysis) that utilizes normalized cross-correlation to identify direct and indirect dependencies between network services based on communication patterns.

What are the key contents of the main part?

The main part covers the formal definition of network models and tasks, the technical implementation of four modules (flow extraction, subnetwork identifier, potential indirect dependency generator, and indirect dependency calculator), and an evaluation of the model against other approaches.

Which keywords best characterize the research?

Key terms include Business Process Management, Process Mining, Network Dependency Discovery, MONA, Traffic Analysis, and Normalized Cross Correlation.

How does the author handle potential false positives in dependency discovery?

The author uses an "Indirect Dependency Calculator" module that employs normalized cross-correlation against a robust threshold (0.85) to filter out incorrectly identified indirect dependencies.
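The filtering step can be pictured with a short sketch. It assumes that a correlation score has already been computed for each candidate pair; the function name, the dictionary layout, and the use of "greater than or equal" at the threshold are assumptions, and only the threshold value 0.85 comes from the text above.

```python
from typing import Dict, Set, Tuple

# Threshold stated in the text; everything else in this sketch is assumed.
THRESHOLD = 0.85

# A candidate indirect dependency, written as a pair of service labels.
Candidate = Tuple[str, str]


def filter_indirect_dependencies(
    scores: Dict[Candidate, float], threshold: float = THRESHOLD
) -> Set[Candidate]:
    """Keep only candidates whose correlation score reaches the threshold."""
    return {pair for pair, score in scores.items() if score >= threshold}
```

For example, given the scores `{("web", "db"): 0.93, ("web", "backup"): 0.41}`, only the pair `("web", "db")` would be kept.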

Why does the author prioritize network traffic over application logs?

Many modern applications lack standardized log files, or log files are too technical/cumbersome for business analysts to interpret; network traffic, however, is universally available and contains actionable communication patterns.

How does MONA visualize the discovered business processes?

The system maps identified dependencies to the Business Process Modelling Notation (BPMN) standard, providing a high-level, intuitive view for business analysts.

End of excerpt from 63 pages

Details

Title: Using Subsequence Mining to Identify Business Processes in Data Networks
University: Technische Universität Hamburg-Harburg (TUHH; Universität zu Lübeck)
Author: Felix Kuhr (Author)
Year of publication: 2016
Pages: 63
Catalog number: V351074
ISBN (eBook): 9783668379640
File size: 744 KB
Language: English
Keywords: using subsequence mining identify business processes data networks
Product safety: GRIN Publishing GmbH
Price (eBook): US$ 34.99

Cite this work: Felix Kuhr (Author), 2016, Using Subsequence Mining to Identify Business Processes in Data Networks, Munich, GRIN Verlag, https://www.diplomarbeiten24.de/document/351074