73 pages, grade: 4.6/5
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Motivation: Why is the Topic so Important?
1.2 Thesis Overview
1.3 The Problem and Contribution
1.4 Outline of The Thesis
Chapter 2: Background
2.1 Dialogue Systems
2.1.2 Data-Driven vs Other Design Approaches
2.2 McGill Ubuntu Dialogue Corpus
Chapter 3: Methods and Techniques
3.1 Natural Language Processing (NLP)
3.1.2 Wikipedia-Based Explicit Semantic Analysis (ESA)
3.2 Deep Learning
3.2.1 Why Deep Learning?
3.2.2 Deep Neural Networks: Definitions and Basics
3.2.3 RNN and LSTM Networks
Chapter 4: Data Collection: Six IRC Channels
Chapter 5: General Pipeline for Dialogue Extraction: IRC-VPP
5.1 Pipeline Architecture
5.2 Components and Configurations
5.2.1 IRC Channel Crawler
5.2.2 Raw IRC Cleaner
5.2.3 Dialogue Extraction
5.3 Post-Processing Algorithms
5.3.1 Message Extraction
5.3.2 Recipient Identification
5.3.3 Dialogue Extraction and Hole-Filling
5.3.4 Relevant Messages Concatenation
5.4 Annotating IRC-VPP Dialogues Datasets
Chapter 6: Experiments and Evaluation
6.1 Pre-Training Datasets Statistics
6.2 IRC-VPP Software vs McGill Software
6.3 RNN/LSTM/ESA Results
Chapter 7: Conclusion and Future Work
Appendix A: Internet Relay Chat (IRC)
Appendix B: The Learning Process in Artificial Neural Networks
The power of an intelligent program to perform its task well depends primarily on the quantity and quality of the knowledge it has about that task. Advanced techniques and applications in Artificial Intelligence depend heavily on data, which are growing rapidly and are widely available on the web. However, for a computer to be able to manipulate information, the information must be in a form that is easy for a computer to manipulate. That is, the many available unstructured data need to be collected and post-processed in order to create structured information from them. Recent advances in Data-Driven Dialogue Systems made use of the published Ubuntu IRC channel conversations to extract one-on-one dialogues for use with Deep Learning methods. A best-response task performed by a Dialogue System can make use of a model trained on such dialogues. In addition, Natural Language Processing techniques such as Semantic Analysis have made remarkable progress; Wikipedia-Based Explicit Semantic Analysis (ESA) is an example, in which text interpretation was improved for both polysemy and synonymy.
This thesis proposes a general pipeline architecture for extracting one-on-one dialogues from many different IRC channels, extending the state-of-the-art work on the Ubuntu IRC channel. Furthermore, the thesis takes advantage of the results from the pipeline and evaluates ESA on the different extracted dialogues.
To my parents, for the endless love and support they gave me over the years. And to my best friend Shimaa, who lived for others and for her country; may she rest in peace.
I would like to express my gratitude to my supervisors: Professor András Lõrincz, who gave me the opportunity to participate in this research direction and supported me over the year with advice and wisdom, and Balázs Pintér, who guided and taught me many things and was always there for my questions and concerns. Without both of you, I could not have finished this work as it is. Thank you for all your dedicated time, continuous advice, and patience. I would also like to thank all the members of the group for their welcome, their coordination, and the moments we worked together.
2.1 Spoken Dialogue System Pipeline Architecture
2.2 Part of a Finite State-Based Dialogue System State Diagram
2.3 Slot Filling in Frame-Based Dialogue Systems
3.1 Process of Interpretation in Wikipedia-Based ESA
3.2 Example of a Simple Neural Network Architecture
3.3 Example of a Deep Neural Network Architecture
3.4 Recurrent Neural Network Handling Sequences In Time
4.1 Ubuntu IRC Logging Website HTML Structure
4.2 Lisp IRC Logging Website HTML Structure
4.3 Perl6 IRC Logging Website HTML Structure
4.4 Perl6 IRC Log Format and Text Structure
4.5 Koha IRC Logging Website HTML Structure
4.6 Koha IRC Log Format and Text Structure
4.7 ScummVM IRC Logging Website HTML Structure
4.8 MediaWiki IRC Logging Website HTML Structure
5.1 IRC-VPP Software Pipeline Architecture
5.2 IRC-VPP Software UML Component Diagram
2.1 Example of Ubuntu IRC Channel Chat Room Conversation
4.1 Ubuntu IRC Log Format and Text Structure
4.2 Lisp IRC Log Format and Text Structure
4.3 ScummVM IRC Log Format and Text Structure
4.4 MediaWiki IRC Log Format and Text Structure
5.1 IRC Channel Crawler Component Configuration Parameters
5.2 Raw IRC Cleaner Component Configuration Parameters
5.3 Dialogue Extraction Component Configuration Parameters
5.4 Example of Final Training Samples
6.1 IRC Channels Generated Corpora Statistical Comparison
6.2 IRC-VPP vs McGill
6.3 IRC-VPP vs McGill (For Only Matching Dialogues)
6.4 Best Response Accuracy Preliminary Results (1 in 10 Recall@1)
6.5 Best Response Accuracy Preliminary Results (1 in 10 Recall@2)
6.6 Best Response Accuracy Preliminary Results (1 in 10 Recall@5)
The power of an intelligent program to perform its task well depends primarily on the quantity and quality of the knowledge it has about that task. Deep Learning methods and advances in Natural Language Processing have inspired researchers in Dialogue Systems to build more natural and intelligent systems. However, these methods depend heavily on data, which need to be easily obtained, post-processed, and evaluated. That phase can be a challenge, since it requires additional time and work before the chosen methods can be applied. The full details of these methods are not tackled here. Instead, a general pipeline architecture was designed and implemented in these studies to generalize and automate the process of data collection, post-processing, and evaluation, in order to maximize the benefit from the domain-specific human-human conversations available in different places.
Deep Learning is a branch of Machine Learning and has become one of the most effective techniques for solving computable problems. It has proved accurate when applied to many tasks; a difficult Natural Language Processing (NLP) problem can often be solved better with Deep Learning than with other techniques. The work in this thesis focuses on an NLP problem in Data-Driven Dialogue Systems, where state-of-the-art Deep Learning and NLP methods are applied. The study focuses on the automation of topic-based data collection, and a general pipeline architecture is introduced to extract human-human conversational data from different resources with a single piece of software.
The Internet Relay Chat (Appendix A) channels that exist on the web are a rich source of time-series human-human conversations. These data can be used to feed a Deep Neural Network used in Data-Driven and Unstructured Multi-Turn Dialogue Systems.
An IRC channel is domain-specific: the conversations are limited to a certain topic, mostly technical problems that less experienced users ask about and get responses to from more experienced users. That makes IRC an interesting source for goal-oriented situations. Many IRC chat logs are available, but it is still hard to download and process these data in a convenient time and with little manpower. The first problem with IRC conversations is that they are not yet ready to be manipulated by a computer and used in a Neural Network: the textual data need to be cleaned and transformed, and structured dialogues have to be extracted from the whole unstructured flow of conversation. Only then does it become possible to generate a proper training set to feed the Deep Neural Network. Chapter 3 demonstrates the definitions and basics of Deep Neural Networks. A typical Neural Network-Based Language Model requires a large number of training examples (10 − 10 ), which is not always available to researchers. Some IRC channels for certain domains can have less data than required, and that brings another problem.
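The cleaning and transformation step described above can be sketched in a few lines. The following is a hedged illustration, not the thesis's actual implementation: it assumes a common "[HH:MM] &lt;nick&gt; message" log format, while real channels vary in format, which is precisely what motivates a configurable pipeline.

```python
import re

# Parse one raw IRC log line of the form "[HH:MM] <nick> message"
# into structured fields; non-matching lines (join/part notices,
# server messages) are treated as noise and discarded.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2})\]\s+<([^>]+)>\s+(.*)$")

def parse_line(raw):
    m = LINE_RE.match(raw)
    if m is None:
        return None  # noise: not a user utterance
    time, sender, utterance = m.groups()
    return {"time": time, "sender": sender, "utterance": utterance}

print(parse_line("[12:34] <alice> bob: did you update grub?"))
print(parse_line("*** alice has joined #ubuntu"))  # None
```

A per-channel pipeline would swap in a different regular expression for each log format while keeping the same structured output.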
In this thesis, data from different IRC channels were automatically crawled, and new post-processing algorithms were designed to clean, transform, and extract one-on-one dialogues as pre-training sets that can be used with a Deep Neural Network to solve the best-response task performed by a Dialogue System. Furthermore, the thesis compares two approaches used for the preliminary results. The first approach applies only Deep Learning methods to the extracted dialogues, while the second combines Wikipedia-Based Explicit Semantic Analysis (ESA) with the Deep Learning methods.
This thesis tries to answer the question: how can we maximize the benefit from the available domain-specific online data? For that purpose, it introduces a new versatile IRC channel post-processing software, referred to as IRC-VPP. The IRC-VPP software can adapt to different data formats and produces a conventional output format, which makes it possible to integrate with other software, such as the training model that will make use of the data. To this end, new post-processing algorithms were designed that allow different heuristics for different IRC channels' data. The heuristics are selected through run-time arguments, which dictate how the data post-processing is performed. In this way, the thesis proposes a solution to the data collection and data transformation problem, where different data patterns usually need different data collection techniques. In addition, because the proposed pipeline makes it possible to generate data for different domains, the thesis uses a method developed by my supervisor Balázs Pintér, which joins Wikipedia-Based ESA with Neural Network techniques to improve the results of the Neural Network, and evaluates that method on different data.
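The idea of selecting heuristics through run-time arguments can be illustrated as follows. All flag names here are assumptions for illustration only; they are not the actual command-line interface of the IRC-VPP software.

```python
import argparse

# Hypothetical sketch: run-time arguments select the per-channel
# heuristics, so one program can post-process many IRC channels.
parser = argparse.ArgumentParser(description="IRC dialogue extraction (sketch)")
parser.add_argument("--channel", required=True,
                    help="channel name, e.g. ubuntu, lisp, koha")
parser.add_argument("--log-format", choices=["bracketed", "plain"],
                    default="bracketed", help="how raw log lines are laid out")
parser.add_argument("--recipient-separator", default=":",
                    help="character that follows an addressed nick")

# Simulate an invocation for the Ubuntu channel.
args = parser.parse_args(["--channel", "ubuntu"])
print(args.channel, args.log_format)  # ubuntu bracketed
```

The point is architectural: the extraction code branches on these arguments instead of being rewritten for each channel.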
The thesis is outlined as follows. Chapter 2 gives an overview of some research topics related to this thesis. The chapter covers an introduction to Dialogue Systems and different design approaches. In addition, an overview is given of the state-of-the-art work on processing IRC channels for Unstructured Multi-Turn Dialogue Systems, summarizing the work done by McGill University on the Ubuntu IRC channel.
Chapter 3 illustrates the methods used for the preliminary results. The chapter gives an introduction to Natural Language Processing and Wikipedia-Based Explicit Semantic Analysis (ESA). A background on Deep Learning is also given, and the Neural Network models used in the studies are demonstrated.
Chapter 4 demonstrates the data collection phase and the different IRC channels collected, with a short explanation of each channel's logging website HTML structure, domain, and data format.
Chapter 5 gives a detailed explanation of the IRC-VPP software developed for this thesis as a general pipeline architecture for one-on-one dialogue extraction. The chapter starts with the system's pipeline architecture; then the components are explained in detail, along with the IRC post-processing algorithms that generate the pre-training datasets. In addition, differences from the McGill algorithms are mentioned, and it is shown how the IRC-VPP software works with multiple IRC channels compared to the McGill software. Finally, the chapter explains how the results are integrated with other software to prepare the dialogues for training.
Chapter 6 evaluates all preliminary results. First, the six IRC channels are compared to each other. Second, the proposed IRC-VPP software is evaluated by comparing the Ubuntu results from both the IRC-VPP and the McGill software. Finally, two Neural Network models are evaluated on the collected IRC data and compared with the combination of ESA with each model.
Finally, Chapter 7 concludes the work and discusses what the next steps could be.
This chapter gives background on Dialogue Systems relevant to this thesis. Section 2.1 introduces Dialogue Systems, shows how they have evolved over time, and compares different design approaches. The section also shows the drawbacks of some of these approaches and what makes a data-driven approach give a better user experience. At the end of the chapter, Section 2.2 demonstrates the state-of-the-art work on Data-Driven and Multi-Turn Unstructured Dialogue Systems and how this thesis goes beyond it.
Dialogue Systems, sometimes called conversational systems, are one of the directions of Human-Computer Interaction (HCI) research. Human-Computer Interaction is sometimes called Human-Machine Interaction or Interfacing; in this context, a system, a machine, and a computer refer to the same concept. HCI appeared naturally with the emergence of computers and even earlier machines, because no device or machine can be really useful unless its users can interact with it in a way that meets their expectations. Hence, remarkable progress has been made in HCI over the years in pursuit of a better user experience.
A human interacting with a machine shows three different levels of activity in the process of interaction: physical, cognitive, and affective. The physical activity covers all the mechanics of interaction between human and machine. The cognitive activity is how the user understands the usage of the machine in order to interact with it properly. The affective activity is the most recent aspect of HCI; it focuses on making the user experience pleasurable in a way that encourages the user to continue using the machine.
Intelligent HCI is one of the latest advances in research; it addresses all levels of user activity: physical, cognitive, and affective. Intelligent HCI is a design that incorporates some form of intelligence when interacting with a user. A dialogue system such as a chat bot, which uses natural language to interact with and respond to a user in a human-like manner, is an example of such intelligent HCI.
Dialogue Systems are interfaces that allow communication between humans and machines in a more reliable and natural way, either to achieve certain tasks or simply for entertainment with no specific goal. Dialogue Systems can rely on textual conversation, on both text and speech, or, more simply, on voice commands only. One of the early spoken Dialogue Systems was the in-car voice control system. Another early application was Interactive Voice Response (IVR), used in customer support centers to handle call volumes that human operators might not be able to handle. IVR systems use voice and Dual-Tone Multi-Frequency (DTMF) signaling as the user interface. Such interfaces still do not answer the question of whether a conversation with a machine can be a reality and not only imagination. The typical architecture of a modern Dialogue System has five main components that perform five tasks: (1) Input Decoder, (2) Natural Language Understanding (NLU), (3) Dialogue Management (DM), (4) Natural Language Generation (NLG), and (5) Output Renderer. One of the recent advances in this direction is the Spoken Dialogue System, in which the conversation is made by speech, from the user's side, the machine's side, or both in turns.
In a Spoken Dialogue System, Automatic Speech Recognition and Speech Synthesis or Text-To-Speech (TTS) technologies can serve as the Input Decoder and the Output Renderer, respectively, and the Natural Language Understanding (NLU) component is sometimes called Spoken Language Understanding (SLU). As Figure 2.1 shows, the components of a Spoken Dialogue System are connected in a pipeline architecture. In such systems, the workflow is as follows: first, the user speaks some utterances, which the Automatic Speech Recognition analyzes and converts to text. The Natural Language Understanding (NLU) component then performs semantic analysis to infer what the user intended to say. Next, the Dialogue Manager (DM) determines what action the system should take: making a dialogue turn instead of continuing to listen, maintaining the conversation history, adopting different dialogue strategies, deciding the best response to the user, or retrieving information from a back-end component. When the Dialogue Manager (DM) takes a response action, the Natural Language Generation (NLG) component produces the sentences, which are sent as text to the Text-To-Speech (TTS) component, which in turn converts them to audio signals. A Dialogue System generally relies on a knowledge base as a back-end component in order to perform its functions. For instance, a Dialogue System that provides flight booking services should have a database containing up-to-date information on all flights in order to answer user queries during the dialogue. Another example of a knowledge base is a rule-based knowledge base.
illustration not visible in this excerpt
Figure 2.1: Spoken Dialogue System Pipeline Architecture
The arrival of speech recognition technologies and other advances in the field of Artificial Intelligence brought significant progress in human-like interaction between humans and machines. Examples of such progress are the latest advanced Spoken Dialogue Systems such as Apple’s Siri, Google Now, and Microsoft Cortana; these technologies use natural language to interface with users and have raised the level of interaction to be more human-like. With the significant evolution of Dialogue System techniques over time, distinguishing one system from another has become more complicated. One way to differentiate them is by how the Dialogue Manager is modeled. A Dialogue Manager is a crucial part of any advanced Dialogue System: it manages the interaction between the user and the machine and determines what response the machine should provide, when to take a turn, and many other tasks. A convenient classification of Dialogue Systems is into Finite State-Based, Frame-Based, and Agent-Based models. In more recent research, modeling is done using data-driven methods rather than the hand-crafted and inflexible methods used in Frame-Based and Finite State-Based systems. Accordingly, another classification can be made based on whether a Dialogue Manager follows a hand-crafted approach or a data-driven approach, where the Dialogue Manager may rely on a neural language model trained on data. The next section gives an overview of the different approaches to Dialogue System design and what makes one approach better than another.
One of the simplest implementations of Dialogue Systems is the Finite State-Based design, sometimes called a Graph-Based System. Such a design is easier to implement but provides a less natural way of communicating. It is based on a Finite State Machine model: the system handles the conversation through predetermined steps or stages, which are the states of the model. A sequence of states, with transitions between them, follows a predefined state transition function. Figure 2.2 shows part of a Finite State Machine transition diagram that models a travel agency spoken dialogue system. In that model, the user's answers are grammatically predefined and there is no flexibility in the interaction. For instance, the system will not be able to provide the service if the user answers something that was not already hand-crafted and expected at that step of the conversation. So, if the system asks for a destination and the user replies with both a destination and a date of departure, the system may be confused and ask again, since it strictly expects a destination only; only after it gets the destination can it proceed to request more information.
illustration not visible in this excerpt
Figure 2.2: Part of a Finite State-Based Dialogue System State Diagram
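The rigidity described above can be sketched as a tiny state machine. This is an illustrative toy (state and input names are assumptions), showing how an unexpected input simply leaves the system stuck re-asking the same question.

```python
# Toy finite state-based dialogue: each state accepts exactly one
# input type and has exactly one successor state.
TRANSITIONS = {
    "ask_destination": ("destination", "ask_date"),
    "ask_date": ("date", "confirm"),
}

def step(state, input_type):
    expected, next_state = TRANSITIONS[state]
    if input_type != expected:
        return state  # unexpected input: the system re-asks the same question
    return next_state

# The user later answers destination and date together, but the
# machine can only consume what the current state expects.
state = step("ask_destination", "destination")
print(state)                               # ask_date
print(step(state, "destination_and_date")) # ask_date (system re-asks)
print(step(state, "date"))                 # confirm
```

A frame-based system, discussed next, avoids exactly this ordering constraint.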
On the other hand, Frame-Based Systems are more advanced than Finite State-Based Systems. However, because of their fixed and structured dialogue mechanism, Frame-Based Systems still have limitations. A structured Dialogue System has a fixed behavior, such as always providing the expected information or requests during the conversation. In Frame-Based Dialogue Systems, different slots are created to be filled at run time. The slots act like a template that guides the system as to what was requested and how it should proceed. Thanks to the slot concept, Frame-Based Systems do not strictly predetermine what answers the user should give in a certain state, as Finite State-Based systems do; instead, the system asks the user questions, filling any slots the user specifies. So, if the user answers many questions at once, the system fills the slots and does not ask those questions again. That avoids the strict ordering constraints of the Finite State-Based architecture.
In this way, the conversation proceeds more naturally. Figure 2.3 indicates how a Frame-Based travel agency dialogue system fills its slots according to user inputs.
illustration not visible in this excerpt
Figure 2.3: Slot Filling in Frame-Based Dialogue Systems
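The slot-filling idea can be sketched as follows. This is a minimal illustration with hypothetical slot names and keyword extractors; a real system would use proper NLU instead of string matching.

```python
# Minimal frame-based sketch: a frame of slots is filled from whatever
# the user mentions, and the system only asks about slots still empty.
def fill_slots(frame, user_utterance, extractors):
    for slot, extract in extractors.items():
        if frame[slot] is None:
            value = extract(user_utterance)
            if value is not None:
                frame[slot] = value
    # Return the slots the system still has to ask about.
    return [slot for slot, value in frame.items() if value is None]

frame = {"destination": None, "departure_date": None}
extractors = {
    "destination": lambda u: "Budapest" if "budapest" in u.lower() else None,
    "departure_date": lambda u: "Monday" if "monday" in u.lower() else None,
}

# The user answers two questions at once; both slots fill in one turn,
# which a finite state-based system could not handle.
missing = fill_slots(frame, "I want to fly to Budapest on Monday", extractors)
print(missing)  # []
```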
Agent-Based Systems are Artificial Intelligence-based designs in which data can play a very important role. Such designs allow complex communication with the user in order to solve problems or perform a task. The interaction is viewed as an interaction between two agents, each of which is capable of reasoning about its own actions and beliefs. The dialogue model takes the preceding context into account, and the dialogue can evolve dynamically as a sequence of related steps that build on each other. State-of-the-art Agent-Based Dialogue Systems train an artificial neural network on human-human conversational data instead of following purely hard-coded instructions, which makes the model act more naturally. Finite State-Based and Frame-Based Systems cannot handle complex situations and are less natural than such Agent-Based Systems.
In the data-driven approach, instead of hand-crafting the system's instructions and using fixed rules such as If-Then statements to control the flow of the conversation, whether Finite State-Based or Frame-Based, different neural networks can be trained, or other Machine Learning techniques applied, for different purposes and research focus areas. In general, data-driven models depend on annotated data to learn from real conversations without relying purely on explicitly defined fixed rules. The model is trained for a specific task on real conversations, from which it learns how to reply or take a turn during the conversation. One example is turn-taking, a very important problem to consider, since a system that makes unexpected turns, or fails to make proper turns, during a conversation cannot be considered robust or user-friendly. Another example is the best response that a dialogue manager should give; that issue was introduced, with a promising solution, in the McGill University work on the corpus generated from the Ubuntu IRC channel and the Deep Learning methods they used to address it in a Data-Driven Multi-Turn Dialogue System. In the McGill work, the Dialogue Manager relied on a neural language model to give the best response to a context: training samples were created from real human-human conversations about Ubuntu technical problems, and the model was trained to give the best response to a set of utterances by learning to differentiate between a wrong and a good response. This shows how the data-driven approach extends the ability of hand-crafted dialogue systems: instead of a system that can only handle a conversation about a limited set of user inputs, it can handle different situations within a specific domain, as far as its training data covers that domain.
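The best-response task described above is typically evaluated with the "1 in 10" Recall@k metric that also appears in the result tables of Chapter 6: the model scores ten candidate responses for a context, and the metric checks whether the true response ranks among the top k. A minimal sketch (the scores below are made-up toy values):

```python
# Recall@k for response selection: did the true response land in the
# top-k candidates when ranked by the model's scores?
def recall_at_k(scored_candidates, true_index, k):
    ranked = sorted(range(len(scored_candidates)),
                    key=lambda i: scored_candidates[i], reverse=True)
    return true_index in ranked[:k]

# A toy scorer's output over 10 candidate responses for one context.
scores = [0.9, 0.1, 0.8, 0.3, 0.2, 0.05, 0.4, 0.6, 0.15, 0.25]
print(recall_at_k(scores, true_index=0, k=1))  # True  (0.9 is the top score)
print(recall_at_k(scores, true_index=2, k=1))  # False (0.8 ranks second)
print(recall_at_k(scores, true_index=2, k=2))  # True
```

Averaging this boolean over a test set gives the Recall@1, Recall@2, and Recall@5 figures reported later.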
The work in this thesis extends the McGill University work, generalizing the Deep Learning techniques they used to many IRC channels of different domains.
The Ubuntu Dialogue Corpus, created and published by McGill University, is a dataset that contains more than 1 million multi-turn dialogues, with more than 7 million utterances and 100 million words. Such a number of dialogues and such an amount of textual data make it a unique resource for research on Unstructured Multi-Turn Data-Driven Dialogue Systems. A neural network-based Dialogue Manager component can make use of this amount of data for its neural language model. The dataset was generated by downloading the Ubuntu IRC channel logs and applying some post-processing. The dialogues have multiple turns, and their unstructured nature makes them different from other available structured dialogues, such as the Dialog State Tracking Challenge (DSTC) datasets used in structured dialogue systems. The Ubuntu IRC channel is a place where many people chat about Ubuntu-related technical problems, which makes it suitable for a goal-oriented Dialogue System dedicated to technical issues in Ubuntu. For example, it could be used to train a neural language model to respond to Ubuntu technical questions.
The post-processing phase is a critical part of such work. In general, a domain-specific and clean dataset is needed for (goal-oriented) Data-Driven Dialogue Managers whose design depends on a neural language model. A set of post-processing algorithms was created by McGill to extract the dialogues from the Ubuntu IRC channel. This thesis introduces similar algorithms with some improvements, which are discussed and evaluated in later chapters.
From IRC channel logs, different pieces of information can be extracted: (1) the date and time a message was sent, (2) the sender, (3) the message utterance itself, and (4) the message recipient. The recipient of a message can be empty when a user is just asking a general question, hoping that someone will reply and help. In other cases, most probably when a conversation is established between two persons, the recipient can be identified according to the IRC channel conventions. For example, a user should mention his or her recipient as the first word of the message, followed by a comma or a colon, before typing the message utterance itself. Such heuristics are used in the post-processing algorithms to extract all the information required to construct one-on-one dialogues from the channel. Table 2.1 gives an example of a chat room conversation from the Ubuntu IRC channel.
illustration not visible in this excerpt
Table 2.1: Example of Ubuntu IRC Channel Chat Room Conversation
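The recipient heuristic described above, that is, an addressed nick as the first word followed by a comma or colon, can be sketched as follows. This is a minimal illustration, not the actual post-processing code; checking against the set of known channel users avoids mistaking ordinary words for nicks.

```python
import re

# Identify the recipient of an IRC message: the first token, if it is
# a known user name followed by ',' or ':', is treated as the recipient.
def identify_recipient(utterance, known_users):
    match = re.match(r"^(\S+?)[,:]\s", utterance)
    if match and match.group(1) in known_users:
        return match.group(1)
    return None  # a general message with no explicit recipient

users = {"alice", "bob"}
print(identify_recipient("bob: try rebooting the machine", users))   # bob
print(identify_recipient("anyone know how to mount an iso?", users)) # None
```

Chaining such addressed messages between two users is what turns the unstructured channel flow into one-on-one dialogues.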
The Ubuntu Corpus is an unlabeled dataset that acts as a pre-training dataset; the extracted conversations have to be annotated before the Deep Learning methods are applied. The focus of the project was to train a neural language model to select the best response in a conversation about Ubuntu technical problems. Hence, the annotation process creates training samples, each of which is a 3-element tuple: (1) a set of utterances acting as the context, (2) a response utterance, and (3) a flag classifying whether the response is a good candidate response or not. Good-response labels are assigned to the original extracted dialogues between two persons; wrong responses are generated randomly by mixing in portions of irrelevant dialogues and labeled as wrong, so that the system learns to distinguish between correct and wrong responses. Two pieces of software were used in the McGill work: the first is the post-processing software, with which this thesis compares the IRC-VPP software, and the second is the annotation and training software, with which IRC-VPP interfaces through its generated pre-training dataset containing the extracted one-on-one dialogues without labels. In the McGill work, one traditional classifier (TF-IDF) and two learning architectures were applied: the Recurrent Neural Network (RNN) and the Long Short-Term Memory network (LSTM). Chapter 3 gives an introduction to Neural Networks with an overview of the two learning architectures. Chapter 5 describes the McGill post-processing algorithms and compares them with the algorithms designed in the IRC-VPP software, which generalize the work done by McGill University so that it becomes possible to extract dialogues from different IRC channels, not only Ubuntu.
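The annotation scheme described above, a (context, response, label) triple per sample, with wrong responses drawn from other dialogues, can be sketched as follows. Function and variable names are illustrative; the dialogues are toy examples, not corpus data.

```python
import random

# Build labeled training samples from extracted one-on-one dialogues:
# the true continuation gets label 1, and randomly drawn responses
# from other dialogues get label 0.
def make_samples(dialogues, negatives_per_positive=1, seed=0):
    rng = random.Random(seed)
    samples = []
    for i, dialogue in enumerate(dialogues):
        context, response = dialogue[:-1], dialogue[-1]
        samples.append((context, response, 1))  # good (original) response
        other = [d[-1] for j, d in enumerate(dialogues) if j != i]
        for _ in range(negatives_per_positive):
            samples.append((context, rng.choice(other), 0))  # wrong response
    return samples

dialogues = [
    ["my wifi driver fails", "which kernel version?", "4.4, stock ubuntu"],
    ["how do I resize a partition?", "use gparted from a live usb"],
]
for sample in make_samples(dialogues):
    print(sample)
```

Training on such pairs of positive and negative samples is what lets the model learn to score a good response above a wrong one.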
The use of conventions to describe a class of things following a set of rules or conditions, such as If-Then statements.
A mathematical model of computation used to design both computer programs and sequential logic circuits. It is conceived as an abstract machine that can be in one of a finite number of states.
Short for term frequency-inverse document frequency: a numerical statistic intended to reflect how important a word is to a document in a collection or corpus.