Master's Thesis, 2017
73 pages, Grade: 4.6/5
This thesis aims to develop a general pipeline architecture for extracting one-on-one dialogues from various IRC channels, building upon existing work using the Ubuntu IRC channel. The thesis also explores the application of Wikipedia-Based Explicit Semantic Analysis (ESA) on the extracted dialogues. This work contributes to the advancement of data-driven dialogue systems, particularly in the area of best response selection.
Chapter 1: Introduction outlines the importance of the topic, provides an overview of the thesis, and details the problem addressed and the contribution made. It also presents the structure of the thesis.
Chapter 2: Background discusses the concept of Dialogue Systems, with a particular focus on Data-Driven approaches. It introduces the McGill Ubuntu Dialogue Corpus, a significant dataset used for training dialogue systems.
Chapter 3: Methods and Techniques delves into Natural Language Processing (NLP), focusing on Wikipedia-Based Explicit Semantic Analysis (ESA) as a technique for improving dialogue interpretation. The chapter also explores Deep Learning, emphasizing its potential in the field of dialogue systems and introducing the concepts of Deep Neural Networks, RNNs, and LSTMs.
The central focus of this thesis lies on Dialogue Systems, Data-Driven Approaches, IRC Channel Dialogue Extraction, Natural Language Processing (NLP), Wikipedia-Based Explicit Semantic Analysis (ESA), Deep Learning, and best response selection in unstructured dialogue systems.
The goal of the thesis is to automate the extraction of one-on-one dialogues from various IRC channels, producing structured data for training data-driven dialogue models.
ESA is a technique that uses Wikipedia's knowledge base to improve the interpretation of language, specifically addressing problems like polysemy and synonymy.
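As a rough illustration of the idea, the following sketch maps texts onto a space of concepts and compares them there rather than word by word. It is not the thesis's implementation: the three entries in `CONCEPTS` are hypothetical stand-ins for Wikipedia articles (real ESA indexes a full Wikipedia dump), and the helper names `build_index`, `esa_vector`, and `cosine` are chosen here for illustration only.

```python
import math
import re
from collections import Counter

# Assumption: toy stand-ins for Wikipedia articles, one string per concept.
CONCEPTS = {
    "Operating system": "kernel linux ubuntu software boot install package driver",
    "Computer network": "network router wifi ethernet connection driver packet",
    "Cooking": "recipe oven bake flour sugar kitchen",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_index(concepts):
    """Build an inverted index: term -> {concept name: TF-IDF weight}."""
    term_counts = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    n = len(concepts)
    index = {}
    for name, counts in term_counts.items():
        for term, tf in counts.items():
            idf = math.log(n / doc_freq[term])
            index.setdefault(term, {})[name] = tf * idf
    return index

def esa_vector(text, index):
    """Represent a text as a weighted vector over concepts."""
    vec = Counter()
    for term in tokenize(text):
        for concept, weight in index.get(term, {}).items():
            vec[concept] += weight
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

index = build_index(CONCEPTS)
wifi = esa_vector("my wifi connection drops", index)
router = esa_vector("the router loses a packet", index)
baking = esa_vector("bake it with flour", index)
```

The first two sentences share no words, yet both activate the "Computer network" concept, so their concept vectors are similar. The baking sentence activates a different concept and has no overlap with either.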
Deep learning, particularly RNNs and LSTMs, allows systems to learn complex patterns in conversational data and perform tasks like selecting the best response.
The McGill Ubuntu Dialogue Corpus is a large dataset of technical support conversations from the Ubuntu IRC channel, widely used in research on data-driven dialogue systems.
The pipeline post-processes the raw, unstructured chat logs to identify dialogue turns and participants, transforming them into a structured format suitable for machine processing.
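A minimal sketch of such a post-processing step might look as follows. It is not the pipeline developed in the thesis: the log line format `[HH:MM] <nick> message`, the rule that a message opening with `nick:` or `nick,` is addressed to that user, and the function names `parse_log` and `extract_dialogues` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed log format: "[HH:MM] <nick> message". Real IRC logs vary by client.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2})\] <([^>]+)> (.*)$")
# Assumed addressing convention: "nick: message" or "nick, message".
ADDRESS_RE = re.compile(r"^([^\s:,]+)[:,]\s+(.*)$")

def parse_log(lines):
    """Yield (time, sender, text) for lines matching the assumed format."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield m.group(1), m.group(2), m.group(3)

def extract_dialogues(lines):
    """Group explicitly addressed messages into one-on-one dialogues."""
    messages = list(parse_log(lines))
    nicks = {sender for _, sender, _ in messages}
    dialogues = defaultdict(list)
    for time, sender, text in messages:
        m = ADDRESS_RE.match(text)
        # Keep only messages addressed to another user seen in the log.
        if m and m.group(1) in nicks and m.group(1) != sender:
            pair = frozenset((sender, m.group(1)))
            dialogues[pair].append({"time": time, "speaker": sender, "text": m.group(2)})
    # Keep only pairs in which both participants actually spoke.
    return {
        pair: turns
        for pair, turns in dialogues.items()
        if {t["speaker"] for t in turns} == set(pair)
    }

LOG = [
    "[12:01] <alice> bob: my wifi stopped working after the upgrade",
    "[12:02] <carol> anyone know a good text editor?",
    "[12:03] <bob> alice: which driver are you using?",
    "[12:04] <alice> bob: the default one",
    "=== dave has joined #ubuntu",
]

dialogues = extract_dialogues(LOG)
```

Here the join notice and the unaddressed question from `carol` are dropped, and the three messages exchanged between `alice` and `bob` form a single dialogue with ordered turns. A fuller pipeline would also need to handle unaddressed replies, time gaps, and nick changes, which this sketch ignores.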

