73 pages, grade: 4.6/5
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Motivation: Why is the Topic so Important?
1.2 Thesis Overview
1.3 The Problem and Contribution
1.4 Outline of The Thesis
Chapter 2: Background
2.1 Dialogue Systems
2.1.2 Data-Driven vs Other Design Approaches
2.2 McGill Ubuntu Dialogue Corpus
Chapter 3: Methods and Techniques
3.1 Natural Language Processing (NLP)
3.1.2 Wikipedia-Based Explicit Semantic Analysis (ESA)
3.2 Deep Learning
3.2.1 Why Deep Learning?
3.2.2 Deep Neural Networks: Definitions and Basics
3.2.3 RNN and LSTM Networks
Chapter 4: Data Collection: Six IRC Channels
Chapter 5: General Pipeline for Dialogue Extraction: IRC-VPP
5.1 Pipeline Architecture
5.2 Components and Configurations
5.2.1 IRC Channel Crawler
5.2.2 Raw IRC Cleaner
5.2.3 Dialogue Extraction
5.3 Post-Processing Algorithms
5.3.1 Message Extraction
5.3.2 Recipient Identification
5.3.3 Dialogue Extraction and Hole-Filling
5.3.4 Relevant Messages Concatenation
5.4 Annotating IRC-VPP Dialogues Datasets
Chapter 6: Experiments and Evaluation
6.1 Pre-Training Datasets Statistics
6.2 IRC-VPP Software vs McGill Software
6.3 RNN/LSTM/ESA Results
Chapter 7: Conclusion and Future Work
Appendix A: Internet Relay Chat (IRC)
Appendix B: The Learning Process in Artificial Neural Networks
The power of an intelligent program to perform its task well depends primarily on the quantity and quality of the knowledge it has about that task. Advanced techniques and applications in Artificial Intelligence depend heavily on data, which are growing rapidly and are widely available on the web. However, for a computer to be able to manipulate information, the information must be in a form that is easy for a computer to manipulate. That is, the many available unstructured data need to be collected and post-processed in order to create structured information from them. Recent advances in Data-Driven Dialogue Systems made use of the published Ubuntu IRC channel conversations to extract one-on-one dialogues for use with Deep Learning methods. A best-response task performed by a Dialogue System can make use of a model trained on such dialogues. In addition, Natural Language Processing techniques such as Semantic Analysis have made remarkable progress; Wikipedia-Based Explicit Semantic Analysis (ESA) is an example, in which text interpretation was improved for both polysemy and synonymy.
This thesis proposes a general pipeline architecture for extracting one-on-one dialogues from many different IRC channels, extending the state-of-the-art work on the Ubuntu IRC channel. Furthermore, the thesis takes advantage of the results from the pipeline and evaluates ESA on the different extracted dialogues.
To my parents, for the endless love and support they gave me over the years. And to my best friend Shimaa, who lived for others and for her country; may she rest in peace.
I would like to express my gratitude to my supervisors: Professor András Lõrincz, who gave me the opportunity to participate in this research direction and supported me over the year with advice and wisdom, and Balázs Pintér, who guided and taught me many things and was always there for my questions and concerns. Without both of you, I could not have finished this work as it is. Thank you for all your dedicated time, continuous advice, and patience. I would also like to thank all the members of the group for their welcome, their coordination, and the moments we worked together.
2.1 Spoken Dialogue System Pipeline Architecture
2.2 Part of a Finite State-Based Dialogue System State Diagram
2.3 Slot Filling in Frame-Based Dialogue Systems
3.1 Process of Interpretation in Wikipedia-Based ESA
3.2 Example of a Simple Neural Network Architecture
3.3 Example of a Deep Neural Network Architecture
3.4 Recurrent Neural Network Handling Sequences In Time
4.1 Ubuntu IRC Logging Website HTML Structure
4.2 Lisp IRC Logging Website HTML Structure
4.3 Perl6 IRC Logging Website HTML Structure
4.4 Perl6 IRC Log Format and Text Structure
4.5 Koha IRC Logging Website HTML Structure
4.6 Koha IRC Log Format and Text Structure
4.7 ScummVM IRC Logging Website HTML Structure
4.8 MediaWiki IRC Logging Website HTML Structure
5.1 IRC-VPP Software Pipeline Architecture
5.2 IRC-VPP Software UML Component Diagram
2.1 Example of Ubuntu IRC Channel Chat Room Conversation
4.1 Ubuntu IRC Log Format and Text Structure
4.2 Lisp IRC Log Format and Text Structure
4.3 ScummVM IRC Log Format and Text Structure
4.4 MediaWiki IRC Log Format and Text Structure
5.1 IRC Channel Crawler Component Configuration Parameters
5.2 Raw IRC Cleaner Component Configuration Parameters
5.3 Dialogue Extraction Component Configuration Parameters
5.4 Example of Final Training Samples
6.1 IRC Channels Generated Corpora Statistical Comparison
6.2 IRC-VPP vs McGill
6.3 IRC-VPP vs McGill (For Only Matching Dialogues)
6.4 Best Response Accuracy Preliminary Results (1 in 10 Recall@1)
6.5 Best Response Accuracy Preliminary Results (1 in 10 Recall@2)
6.6 Best Response Accuracy Preliminary Results (1 in 10 Recall@5)
The power of an intelligent program to perform its task well depends primarily on the quantity and quality of the knowledge it has about that task. Deep Learning methods and advances in Natural Language Processing have inspired researchers in Dialogue Systems to build more natural and intelligent systems. However, these methods depend heavily on data, which need to be easily obtained, post-processed, and evaluated. That phase can be a challenge, since it requires additional time and work before the chosen methods can be applied. The full details of these methods are not tackled here. Instead, a general pipeline architecture was designed and implemented in these studies to generalize and automate the process of data collection, post-processing, and evaluation, in order to maximize the benefit from the domain-specific human-human conversations available in different places.
Deep Learning is a branch of Machine Learning and has become one of the most effective techniques for solving computable problems. It has proved accurate when applied to many tasks; a difficult Natural Language Processing (NLP) problem can often be solved better with Deep Learning than with other techniques. The work in this thesis focuses on an NLP problem in Data-Driven Dialogue Systems, where state-of-the-art Deep Learning and NLP methods are applied. The study focuses on the automation of topic-based data collection, and a general pipeline architecture is introduced to extract human-human conversational data from different resources with a single piece of software.
The Internet Relay Chat (Appendix A) channels that exist on the web are a rich source of time-series human-human conversations. These data can be used to feed a Deep Neural Network used in Data-Driven and Unstructured Multi-Turn Dialogue Systems.
An IRC channel is domain-specific: the conversations are limited to a certain topic, mostly technical problems that less experienced users ask about and get responses to from more experienced users. That makes IRC an interesting source for goal-oriented situations. Many IRC chat logs are available, but it is still hard to download and process these data in a convenient time and with little manpower. The first problem with IRC conversations is that they are not yet ready to be manipulated by a computer and used in a Neural Network: the textual data need to be cleaned and transformed, and structured dialogues have to be extracted from the whole unstructured flow of conversation. Only then does it become possible to generate a proper training set to feed the Deep Neural Network. Chapter 3 demonstrates the definitions and basics of Deep Neural Networks. A typical Neural Network-Based Language Model requires a large number of training examples (10 − 10 ), which is not always available to researchers. Some IRC channels for certain domains can have less data than required, and that brings another problem.
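The cleaning and transformation step described above can be sketched in a few lines. The following is a hedged illustration, not the thesis's actual implementation: it assumes a common "[HH:MM] &lt;nick&gt; message" log format, while real channels vary in format, which is precisely what motivates a configurable pipeline.

```python
import re

# Parse one raw IRC log line of the form "[HH:MM] <nick> message"
# into structured fields; non-matching lines (join/part notices,
# server messages) are treated as noise and discarded.
LINE_RE = re.compile(r"^\[(\d{2}:\d{2})\]\s+<([^>]+)>\s+(.*)$")

def parse_line(raw):
    m = LINE_RE.match(raw)
    if m is None:
        return None  # noise: not a user utterance
    time, sender, utterance = m.groups()
    return {"time": time, "sender": sender, "utterance": utterance}

print(parse_line("[12:34] <alice> bob: did you update grub?"))
print(parse_line("*** alice has joined #ubuntu"))  # None
```

A per-channel pipeline would swap in a different regular expression for each log format while keeping the same structured output.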
In this thesis, data from different IRC channels were automatically crawled, and new post-processing algorithms were designed to clean, transform, and extract one-on-one dialogues as pre-training sets that can be used with a Deep Neural Network to solve the best-response task performed by a Dialogue System. Furthermore, the thesis compares two approaches used for the preliminary results. The first approach applies only Deep Learning methods to the extracted dialogues, while the second combines Wikipedia-Based Explicit Semantic Analysis (ESA) with the Deep Learning methods.
This thesis tries to answer the question: how can we maximize the benefit from the available domain-specific online data? For that purpose, it introduces a new versatile IRC channel post-processing software, referred to as IRC-VPP. The IRC-VPP software can adapt to different data formats and produces a conventional output format, which makes it possible to integrate with other software, such as the training model that will make use of the data. To this end, new post-processing algorithms were designed that allow different heuristics for different IRC channels' data. The heuristics are selected through run-time arguments, which dictate how the data post-processing is performed. In this way, the thesis proposes a solution to the data collection and data transformation problem, where different data patterns usually need different data collection techniques. In addition, because the proposed pipeline makes it possible to generate data for different domains, the thesis uses a method developed by my supervisor Balázs Pintér, which joins Wikipedia-Based ESA with Neural Network techniques to improve the results of the Neural Network, and evaluates that method on different data.
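The idea of selecting heuristics through run-time arguments can be illustrated as follows. All flag names here are assumptions for illustration only; they are not the actual command-line interface of the IRC-VPP software.

```python
import argparse

# Hypothetical sketch: run-time arguments select the per-channel
# heuristics, so one program can post-process many IRC channels.
parser = argparse.ArgumentParser(description="IRC dialogue extraction (sketch)")
parser.add_argument("--channel", required=True,
                    help="channel name, e.g. ubuntu, lisp, koha")
parser.add_argument("--log-format", choices=["bracketed", "plain"],
                    default="bracketed", help="how raw log lines are laid out")
parser.add_argument("--recipient-separator", default=":",
                    help="character that follows an addressed nick")

# Simulate an invocation for the Ubuntu channel.
args = parser.parse_args(["--channel", "ubuntu"])
print(args.channel, args.log_format)  # ubuntu bracketed
```

The point is architectural: the extraction code branches on these arguments instead of being rewritten for each channel.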
The thesis is outlined as follows. Chapter 2 gives an overview of some research topics related to this thesis. The chapter covers an introduction to Dialogue Systems and different design approaches. In addition, an overview is given of the state-of-the-art work on processing IRC channels for Unstructured Multi-Turn Dialogue Systems, summarizing the work done by McGill University on the Ubuntu IRC channel.
Chapter 3 illustrates the methods used for the preliminary results. The chapter gives an introduction to Natural Language Processing and Wikipedia-Based Explicit Semantic Analysis (ESA). A background on Deep Learning is also given, and the Neural Network models used in the studies are demonstrated.
Chapter 4 demonstrates the data collection phase and the different IRC channels collected, with a short explanation of each channel's logging website HTML structure, domain, and data format.
Chapter 5 gives a detailed explanation of the IRC-VPP software developed for this thesis as a general pipeline architecture for one-on-one dialogue extraction. The chapter starts with the system's pipeline architecture; then the components are explained in detail, along with the IRC post-processing algorithms that generate the pre-training datasets. In addition, differences from the McGill algorithms are mentioned, and it is shown how the IRC-VPP software works with multiple IRC channels compared to the McGill software. Finally, the chapter explains how the results are integrated with other software to prepare the dialogues for training.
Chapter 6 evaluates all preliminary results. First, the six IRC channels are compared to each other. Second, the proposed IRC-VPP software is evaluated by comparing the Ubuntu results from both the IRC-VPP and the McGill software. Finally, two Neural Network models are evaluated on the collected IRC data and compared with the combination of ESA with each model.
Finally, Chapter 7 concludes the work and discusses what the next steps could be.
This chapter gives background on Dialogue Systems relevant to this thesis. Section 2.1 introduces Dialogue Systems, shows how they have evolved over time, and compares different design approaches. The section also shows the drawbacks of some of these approaches and what makes a data-driven approach give a better user experience. At the end of the chapter, Section 2.2 demonstrates the state-of-the-art work on Data-Driven and Multi-Turn Unstructured Dialogue Systems and how this thesis goes beyond it.
Dialogue Systems, sometimes called conversational systems, are one of the directions of Human-Computer Interaction (HCI) research. Human-Computer Interaction is sometimes called Human-Machine Interaction or Interfacing; in this context, a system, a machine, and a computer refer to the same concept. HCI appeared naturally with the emergence of computers and even earlier machines, because no device or machine can be really useful unless its users can interact with it in a way that meets their expectations. Hence, remarkable progress has been made in HCI over the years in pursuit of a better user experience.
A human interacting with a machine shows three different levels of activity in the process of interaction: physical, cognitive, and affective. The physical activity covers all the mechanics of interaction between human and machine. The cognitive activity is how the user understands the usage of the machine in order to interact with it properly. The affective activity is the most recent aspect of HCI; it focuses on making the user experience pleasurable in a way that encourages the user to continue using the machine.
Intelligent HCI is one of the latest advances in research; it addresses all levels of user activity: physical, cognitive, and affective. Intelligent HCI is a design that incorporates some form of intelligence when interacting with a user. A dialogue system such as a chat bot, which uses natural language to interact with and respond to a user in a human-like manner, is an example of such intelligent HCI.
Dialogue Systems are interfaces that allow communication between humans and machines in a more reliable and natural way, either to achieve certain tasks or simply for entertainment with no specific goal. Dialogue Systems can rely on textual conversation, on both text and speech, or, more simply, on voice commands only. One of the early spoken Dialogue Systems was the in-car voice control system. Another early application was Interactive Voice Response (IVR), used in customer support centers to handle call volumes that human operators might not be able to handle. IVR systems use voice and Dual-Tone Multi-Frequency (DTMF) signaling as the user interface. Such interfaces still do not answer the question of whether a conversation with a machine can be a reality and not only imagination. The typical architecture of a modern Dialogue System has five main components that perform five tasks: (1) Input Decoder, (2) Natural Language Understanding (NLU), (3) Dialogue Management (DM), (4) Natural Language Generation (NLG), and (5) Output Renderer. One of the recent advances in this direction is the Spoken Dialogue System, in which the conversation is made by speech, from the user's side, the machine's side, or both in turns.
In a Spoken Dialogue System, Automatic Speech Recognition and Speech Synthesis or Text-To-Speech (TTS) technologies can serve as the Input Decoder and the Output Renderer, respectively, and the Natural Language Understanding (NLU) component is sometimes called Spoken Language Understanding (SLU). As Figure 2.1 shows, the components of a Spoken Dialogue System are connected in a pipeline architecture. In such systems, the workflow is as follows: first, the user speaks some utterances, which the Automatic Speech Recognition analyzes and converts to text. The Natural Language Understanding (NLU) component then performs semantic analysis to infer what the user intended to say. Next, the Dialogue Manager (DM) determines what action the system should take: making a dialogue turn instead of continuing to listen, maintaining the conversation history, adopting different dialogue strategies, deciding the best response to the user, or retrieving information from a back-end component. When the Dialogue Manager (DM) takes a response action, the Natural Language Generation (NLG) component produces the sentences, which are sent as text to the Text-To-Speech (TTS) component, which in turn converts them to audio signals. A Dialogue System generally relies on a knowledge base as a back-end component in order to perform its functions. For instance, a Dialogue System that provides flight booking services should have a database containing up-to-date information on all flights in order to answer user queries during the dialogue. Another example of a knowledge base is a rule-based knowledge base.
illustration not visible in this excerpt
Figure 2.1: Spoken Dialogue System Pipeline Architecture
The arrival of speech recognition technologies and other advances in the field of Artificial Intelligence brought significant progress in human-like interaction between humans and machines. Examples of such progress are the latest advanced Spoken Dialogue Systems such as Apple’s Siri, Google Now, and Microsoft Cortana; these technologies use natural language to interface with users and have raised the level of interaction to be more human-like. With the significant evolution of Dialogue System techniques over time, distinguishing one system from another has become more complicated. One way to differentiate them is by how the Dialogue Manager is modeled. A Dialogue Manager is a crucial part of any advanced Dialogue System: it manages the interaction between the user and the machine and determines what response the machine should provide, when to take a turn, and many other tasks. A convenient classification of Dialogue Systems is into Finite State-Based, Frame-Based, and Agent-Based models. In more recent research, modeling is done using data-driven methods rather than the hand-crafted and inflexible methods used in Frame-Based and Finite State-Based systems. Accordingly, another classification can be made based on whether a Dialogue Manager follows a hand-crafted approach or a data-driven approach, where the Dialogue Manager may rely on a neural language model trained on data. The next section gives an overview of the different approaches to Dialogue System design and what makes one approach better than another.
One of the simplest implementations of Dialogue Systems is the Finite State-Based design, sometimes called a Graph-Based System. Such a design is easier to implement but provides a less natural way of communicating. It is based on a Finite State Machine model: the system handles the conversation through predetermined steps or stages, which are the states of the model. A sequence of states, with transitions between them, follows a predefined state transition function. Figure 2.2 shows part of a Finite State Machine transition diagram that models a travel agency spoken dialogue system. In that model, the user's answers are grammatically predefined and there is no flexibility in the interaction. For instance, the system will not be able to provide the service if the user answers something that was not already hand-crafted and expected at that step of the conversation. So, if the system asks for a destination and the user replies with both a destination and a date of departure, the system may be confused and ask again, since it strictly expects a destination only; only after it gets the destination can it proceed to request more information.
illustration not visible in this excerpt
Figure 2.2: Part of a Finite State-Based Dialogue System State Diagram
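The rigidity described above can be sketched as a tiny state machine. This is an illustrative toy (state and input names are assumptions), showing how an unexpected input simply leaves the system stuck re-asking the same question.

```python
# Toy finite state-based dialogue: each state accepts exactly one
# input type and has exactly one successor state.
TRANSITIONS = {
    "ask_destination": ("destination", "ask_date"),
    "ask_date": ("date", "confirm"),
}

def step(state, input_type):
    expected, next_state = TRANSITIONS[state]
    if input_type != expected:
        return state  # unexpected input: the system re-asks the same question
    return next_state

# The user later answers destination and date together, but the
# machine can only consume what the current state expects.
state = step("ask_destination", "destination")
print(state)                               # ask_date
print(step(state, "destination_and_date")) # ask_date (system re-asks)
print(step(state, "date"))                 # confirm
```

A frame-based system, discussed next, avoids exactly this ordering constraint.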
On the other hand, Frame-Based Systems are more advanced than Finite State-Based Systems. However, because of their fixed and structured dialogue mechanism, Frame-Based Systems still have limitations. A structured Dialogue System has a fixed behavior, such as always providing the expected information or requests during the conversation. In Frame-Based Dialogue Systems, different slots are created to be filled at run time. The slots act like a template that guides the system as to what was requested and how it should proceed. Thanks to the slot concept, Frame-Based Systems do not strictly predetermine what answers the user should give in a certain state, as Finite State-Based systems do; instead, the system asks the user questions, filling any slots the user specifies. So, if the user answers many questions at once, the system fills the slots and does not ask those questions again. That avoids the strict ordering constraints of the Finite State-Based architecture.
In this way, the conversation proceeds more naturally. Figure 2.3 indicates how a Frame-Based travel agency dialogue system fills its slots according to user inputs.
illustration not visible in this excerpt
Figure 2.3: Slot Filling in Frame-Based Dialogue Systems
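The slot-filling idea can be sketched as follows. This is a minimal illustration with hypothetical slot names and keyword extractors; a real system would use proper NLU instead of string matching.

```python
# Minimal frame-based sketch: a frame of slots is filled from whatever
# the user mentions, and the system only asks about slots still empty.
def fill_slots(frame, user_utterance, extractors):
    for slot, extract in extractors.items():
        if frame[slot] is None:
            value = extract(user_utterance)
            if value is not None:
                frame[slot] = value
    # Return the slots the system still has to ask about.
    return [slot for slot, value in frame.items() if value is None]

frame = {"destination": None, "departure_date": None}
extractors = {
    "destination": lambda u: "Budapest" if "budapest" in u.lower() else None,
    "departure_date": lambda u: "Monday" if "monday" in u.lower() else None,
}

# The user answers two questions at once; both slots fill in one turn,
# which a finite state-based system could not handle.
missing = fill_slots(frame, "I want to fly to Budapest on Monday", extractors)
print(missing)  # []
```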
Agent-Based Systems are Artificial Intelligence-based designs in which data can play a very important role. Such designs allow complex communication with the user in order to solve problems or perform a task. The interaction is viewed as an interaction between two agents, each of which is capable of reasoning about its own actions and beliefs. The dialogue model takes the preceding context into account, and the dialogue can evolve dynamically as a sequence of related steps that build on each other. State-of-the-art Agent-Based Dialogue Systems train an artificial neural network on human-human conversational data instead of following purely hard-coded instructions, which makes the model act more naturally. Finite State-Based and Frame-Based Systems cannot handle complex situations and are less natural than such Agent-Based Systems.
In the data-driven approach, instead of hand-crafting the system's instructions and using fixed rules such as If-Then statements to control the flow of the conversation, whether Finite State-Based or Frame-Based, different neural networks can be trained, or other Machine Learning techniques applied, for different purposes and research focus areas. In general, data-driven models depend on annotated data to learn from real conversations without relying purely on explicitly defined fixed rules. The model is trained for a specific task on real conversations, from which it learns how to reply or take a turn during the conversation. One example is turn-taking, a very important problem to consider, since a system that makes unexpected turns, or fails to make proper turns, during a conversation cannot be considered robust or user-friendly. Another example is the best response that a dialogue manager should give; that issue was introduced, with a promising solution, in the McGill University work on the corpus generated from the Ubuntu IRC channel and the Deep Learning methods they used to address it in a Data-Driven Multi-Turn Dialogue System. In the McGill work, the Dialogue Manager relied on a neural language model to give the best response to a context: training samples were created from real human-human conversations about Ubuntu technical problems, and the model was trained to give the best response to a set of utterances by learning to differentiate between a wrong and a good response. This shows how the data-driven approach extends the ability of hand-crafted dialogue systems: instead of a system that can only handle a conversation about a limited set of user inputs, it can handle different situations within a specific domain, as far as its training data covers that domain.
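The best-response task described above is typically evaluated with the "1 in 10" Recall@k metric that also appears in the result tables of Chapter 6: the model scores ten candidate responses for a context, and the metric checks whether the true response ranks among the top k. A minimal sketch (the scores below are made-up toy values):

```python
# Recall@k for response selection: did the true response land in the
# top-k candidates when ranked by the model's scores?
def recall_at_k(scored_candidates, true_index, k):
    ranked = sorted(range(len(scored_candidates)),
                    key=lambda i: scored_candidates[i], reverse=True)
    return true_index in ranked[:k]

# A toy scorer's output over 10 candidate responses for one context.
scores = [0.9, 0.1, 0.8, 0.3, 0.2, 0.05, 0.4, 0.6, 0.15, 0.25]
print(recall_at_k(scores, true_index=0, k=1))  # True  (0.9 is the top score)
print(recall_at_k(scores, true_index=2, k=1))  # False (0.8 ranks second)
print(recall_at_k(scores, true_index=2, k=2))  # True
```

Averaging this boolean over a test set gives the Recall@1, Recall@2, and Recall@5 figures reported later.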
The work in this thesis extends the McGill University work, generalizing the Deep Learning techniques they used to many IRC channels of different domains.
The Ubuntu Dialogue Corpus, created and published by McGill University, is a dataset that contains more than 1 million multi-turn dialogues, with more than 7 million utterances and 100 million words. Such a number of dialogues and such an amount of textual data make it a unique resource for research on Unstructured Multi-Turn Data-Driven Dialogue Systems. A neural network-based Dialogue Manager component can make use of this amount of data for its neural language model. The dataset was generated by downloading the Ubuntu IRC channel logs and applying some post-processing. The dialogues have multiple turns, and their unstructured nature makes them different from other available structured dialogues, such as the Dialog State Tracking Challenge (DSTC) datasets used in structured dialogue systems. The Ubuntu IRC channel is a place where many people chat about Ubuntu-related technical problems, which makes it suitable for a goal-oriented Dialogue System dedicated to technical issues in Ubuntu. For example, it could be used to train a neural language model to respond to Ubuntu technical questions.
The post-processing phase is a critical part of such work. In general, a domain-specific and clean dataset is needed for (goal-oriented) Data-Driven Dialogue Managers whose design depends on a neural language model. A set of post-processing algorithms was created by McGill to extract the dialogues from the Ubuntu IRC channel. This thesis introduces similar algorithms with some improvements, which are discussed and evaluated in later chapters.
From IRC channel logs, different pieces of information can be extracted: (1) the date and time a message was sent, (2) the sender, (3) the message utterance itself, and (4) the message recipient. The recipient of a message can be empty when a user is just asking a general question, hoping that someone will reply and help. In other cases, most probably when a conversation is established between two persons, the recipient can be identified according to the IRC channel conventions. For example, a user should mention his or her recipient as the first word of the message, followed by a comma or a colon, before typing the message utterance itself. Such heuristics are used in the post-processing algorithms to extract all the information required to construct one-on-one dialogues from the channel. Table 2.1 gives an example of a chat room conversation from the Ubuntu IRC channel.
illustration not visible in this excerpt
Table 2.1: Example of Ubuntu IRC Channel Chat Room Conversation
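The recipient heuristic described above, that is, an addressed nick as the first word followed by a comma or colon, can be sketched as follows. This is a minimal illustration, not the actual post-processing code; checking against the set of known channel users avoids mistaking ordinary words for nicks.

```python
import re

# Identify the recipient of an IRC message: the first token, if it is
# a known user name followed by ',' or ':', is treated as the recipient.
def identify_recipient(utterance, known_users):
    match = re.match(r"^(\S+?)[,:]\s", utterance)
    if match and match.group(1) in known_users:
        return match.group(1)
    return None  # a general message with no explicit recipient

users = {"alice", "bob"}
print(identify_recipient("bob: try rebooting the machine", users))   # bob
print(identify_recipient("anyone know how to mount an iso?", users)) # None
```

Chaining such addressed messages between two users is what turns the unstructured channel flow into one-on-one dialogues.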
The Ubuntu Corpus is an unlabeled dataset that acts as a pre-training dataset; the extracted conversations have to be annotated before the Deep Learning methods are applied. The focus of the project was to train a neural language model to select the best response in a conversation about Ubuntu technical problems. Hence, the annotation process creates training samples, each of which is a 3-element tuple: (1) a set of utterances acting as the context, (2) a response utterance, and (3) a flag classifying whether the response is a good candidate response or not. Good-response labels are assigned to the original extracted dialogues between two persons; wrong responses are generated randomly by mixing in portions of irrelevant dialogues and labeled as wrong, so that the system learns to distinguish between correct and wrong responses. Two pieces of software were used in the McGill work: the first is the post-processing software, with which this thesis compares the IRC-VPP software, and the second is the annotation and training software, with which IRC-VPP interfaces through its generated pre-training dataset containing the extracted one-on-one dialogues without labels. In the McGill work, one traditional classifier (TF-IDF) and two learning architectures were applied: the Recurrent Neural Network (RNN) and the Long Short-Term Memory network (LSTM). Chapter 3 gives an introduction to Neural Networks with an overview of the two learning architectures. Chapter 5 describes the McGill post-processing algorithms and compares them with the algorithms designed in the IRC-VPP software, which generalize the work done by McGill University so that it becomes possible to extract dialogues from different IRC channels, not only Ubuntu.
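The annotation scheme described above, a (context, response, label) triple per sample, with wrong responses drawn from other dialogues, can be sketched as follows. Function and variable names are illustrative; the dialogues are toy examples, not corpus data.

```python
import random

# Build labeled training samples from extracted one-on-one dialogues:
# the true continuation gets label 1, and randomly drawn responses
# from other dialogues get label 0.
def make_samples(dialogues, negatives_per_positive=1, seed=0):
    rng = random.Random(seed)
    samples = []
    for i, dialogue in enumerate(dialogues):
        context, response = dialogue[:-1], dialogue[-1]
        samples.append((context, response, 1))  # good (original) response
        other = [d[-1] for j, d in enumerate(dialogues) if j != i]
        for _ in range(negatives_per_positive):
            samples.append((context, rng.choice(other), 0))  # wrong response
    return samples

dialogues = [
    ["my wifi driver fails", "which kernel version?", "4.4, stock ubuntu"],
    ["how do I resize a partition?", "use gparted from a live usb"],
]
for sample in make_samples(dialogues):
    print(sample)
```

Training on such pairs of positive and negative samples is what lets the model learn to score a good response above a wrong one.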
The use of conventions to describe a class of things following a set of rules or conditions, such as If-Then statements.
A mathematical model of computation used to design both computer programs and sequential logic circuits. It is conceived as an abstract machine that can be in one of a finite number of states.
Short for term frequency-inverse document frequency: a numerical statistic intended to reflect how important a word is to a document in a collection or corpus.