Master's Thesis, 2017
115 pages, Grade: 75
1. INTRODUCTION AND BACKGROUND
1.0 Overview
1.1. Automatic Text Summarization (ATS)
1.2. Motivation
1.3. The Problem Statement and Justification of the Study
1.4. Research Questions
1.5. Objectives of the Study
1.4.1. Specific Objectives
1.6. Significance of the Study
1.7. Research Methodology
1.8. Literature Review
1.9. Data Source Collection and Preparation
1.9.1. Corpus Preparation
1.9.2. Manual Summary Preparation
1.10. Summarization Method and Tools used in this Study
1.10.1. Development Tools
1.10.2. The Natural Language Toolkit (NLTK)
1.10.4. Installing the NLTK data
1.10.8. Operating System
1.10.9. The Python Programming Language
1.10.10. The Numpy Library
1.10.11. PyCharm Integrated Development Environment (IDE)
1.12. Scope and Limitations of the Study
1.13. Outline of the Dissertation
CHAPTER TWO
LITERATURE REVIEW
2.0 Introduction
2.1. Automatic Text Summarization
2.2. Processes of Automatic Text Summarization
2.2.1. Summarization Parameters
2.2.2. Methods of Summarization
2.3. Linguistic Concepts to Consider
2.3.1. Coherence
2.3.2. Cohesion
2.3.3. Lexical Cohesion
2.4. News Writing Structure
2.5. Evaluation Methods used in Automatic Summarization
CHAPTER THREE
THE XHOSA LANGUAGE
3.0 Introduction
3.1. Xhosa Consonants and Vowels
3.1.1. The Vowel System
3.1.2. Consonants
3.2. Overview of Xhosa Orthography
3.3. Xhosa Morpheme Types
3.3.1. Xhosa Nouns
3.3.2. Xhosa Prefixes
3.3.3. The Xhosa Noun Stems
3.3.4. Xhosa Suffixes
3.3.5. Pronouns
3.3.6. Verbs
3.3.7. Adjectives
3.3.8. Apostrophe
3.4. Abbreviation
3.5 Summary
CHAPTER FOUR
METHODOLOGY AND SYSTEM DESIGN
4.0 Introduction
4.1. Methodology
4.2. Proposed Algorithm
4.4.1. How the Algorithm Works
4.3. Preprocessing
4.3.1. Tokenization
4.3.2. Stop Words
4.3.3. Stemming
4.6 Sentence Ranking
4.7 Summary Generation
4.8 System Design
4.10 Summary
CHAPTER FIVE
IMPLEMENTATION
5.0 Introduction
5.1. Tokenization
5.2. Stop Word Removal
5.3. Stemming
5.4. Implementation
5.4.1. The IsiXhoSum Interface
5.4.2. Modules of the Xhosa Text Summarizer
5.5. Experimentation
5.5.1. Corpus Preparation
5.5.2. Creation of Manual Summaries
5.6. Summary
CHAPTER SIX
TESTING, RESULTS, AND DISCUSSION
6.0 Introduction
6.1. Testing
6.2. Results
6.2.1. Results of Subjective Evaluation
6.2.2. Results of Objective Evaluation
6.3. Discussion of the Results
6.4. Discussion on Coherence and Cohesion
6.5. Summary
CHAPTER SEVEN
CONCLUSION AND FUTURE WORK
7.0 Introduction
7.1. Research Summary
7.2. Conclusion and Future Work
The primary aim of this dissertation is to design, implement, and evaluate an automated text summarizer specifically for isiXhosa news articles. This research addresses the growing challenge of information overload for isiXhosa speakers by developing an extraction-based system that identifies and presents the most relevant content from lengthy news articles. The system is designed to function with minimal reliance on complex semantic resources, utilizing statistical techniques adapted for the specific linguistic requirements of the Xhosa language.
1.1. Automatic Text Summarization (ATS)
The volume of information available to Internet users increases daily. In the information age, the growth of electronic information has necessitated intensive research in Natural Language Processing (NLP) and Information Retrieval (IR). This rapid growth has made it difficult for many users to cope with all the text that is potentially of interest to them. As a result, systems that can automatically summarize one or more documents have recently become a focus of interest [1]. Automatic text summarization has become a suitable tool for helping people read large volumes of textual information.
Examples of summaries that people read on a daily basis include news headlines, scientific abstracts, minutes of meetings, and weather forecasts [2].
A summary can help users to get the meaning of a complete text document within a short time. The following are some of the general reasons that support the necessity of text summarization.
1. INTRODUCTION AND BACKGROUND: This chapter provides the research context, problem statement, and objectives, emphasizing the need for an automatic isiXhosa news summarizer.
CHAPTER TWO: This chapter reviews existing research on automatic text summarization, including core processes, techniques, and linguistic concepts relevant to text analysis.
CHAPTER THREE: This chapter describes the isiXhosa language, discussing its morphology, its consonant and vowel inventory, and its orthographic history.
CHAPTER FOUR: This chapter describes the methodology and system architecture, focusing on the preprocessing steps and the proposed ranking algorithms used for extraction.
CHAPTER FIVE: This chapter details the technical implementation, explaining the interface development, stemmer rules, and the preparation of the corpus and manual summaries.
CHAPTER SIX: This chapter presents the testing phase, evaluating both the subjective and objective results of the IsiXhoSum system against manual summaries.
CHAPTER SEVEN: This chapter concludes the research by summarizing the findings and suggesting potential directions for future enhancements.
Xhosa, Automatic Text Summarization, Natural Language Processing, IsiXhoSum, Term Frequency, Sentence Position, Extraction-based, Information Retrieval, Linguistic Analysis, Stemming, Corpus, Subjective Evaluation, Objective Evaluation, ROUGE, News Articles.
The research focuses on the development of an automated extraction-based text summarizer designed specifically for isiXhosa language news articles.
The key themes include the linguistic analysis of isiXhosa, statistical approaches to sentence ranking, the implementation of language-specific preprocessing tools (such as a stemmer), and the evaluation of system-generated summaries.
The objective is to design, implement, and evaluate a prototype system, named IsiXhoSum, capable of producing readable summaries of Xhosa news texts that help reduce information overload for users.
The study utilized extraction-based summarization methods, specifically incorporating term frequency analysis and sentence position algorithms, supported by linguistic rules tailored for the isiXhosa language.
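The term frequency component mentioned above can be illustrated with a minimal sketch. It is not the IsiXhoSum implementation: the function names `tokenize` and `tf_scores`, the simple regular-expression tokenizer, and the length normalization are all assumptions made for illustration, and the placeholder tokens stand in for real isiXhosa text.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a naive tokenizer that keeps letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def tf_scores(sentences, stop_words=frozenset()):
    """Score each sentence by the average document-level frequency of its terms."""
    tokens_per_sentence = [
        [t for t in tokenize(s) if t not in stop_words] for s in sentences
    ]
    # Count every remaining term across the whole document.
    freq = Counter(t for toks in tokens_per_sentence for t in toks)
    scores = []
    for toks in tokens_per_sentence:
        # Average rather than sum, so long sentences are not favored automatically.
        scores.append(sum(freq[t] for t in toks) / len(toks) if toks else 0.0)
    return scores


# Placeholder tokens, not real isiXhosa text.
print(tf_scores(["alpha beta alpha", "beta gamma", "delta"]))
```

Sentences built from terms that recur across the article receive higher scores; in a full pipeline the tokens would first pass through stop-word removal and stemming.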
The work covers a review of existing summarization literature, an analysis of Xhosa language structure, the design of the system's algorithm, the technical implementation using Python and NLTK, and an extensive evaluation phase.
Key terms include Xhosa, Automatic Text Summarization, Natural Language Processing, IsiXhoSum, Term Frequency, and Sentence Position.
The system uses a lightweight rule-based stemmer specifically developed to strip suffixes and prefixes from Xhosa nouns and verbs, ensuring that words with the same stem are mapped to a single form.
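A rule-based affix stripper of this kind might look like the following sketch. The prefix and suffix lists here are a small illustrative selection of common isiXhosa noun-class prefixes and verb endings, chosen as assumptions for the example; the thesis's actual rule set is not reproduced in this excerpt, and the names `PREFIXES`, `SUFFIXES`, `MIN_STEM`, and `stem` are hypothetical.

```python
# Illustrative affix lists only (assumption); not the thesis's rule set.
PREFIXES = ("aba", "ama", "imi", "isi", "izi", "ubu", "uku", "um")
SUFFIXES = ("ile", "isa", "ela")
MIN_STEM = 3  # Assumption: never strip a word down to fewer than 3 characters.


def stem(word):
    """Strip at most one known prefix and one known suffix from a word."""
    w = word.lower()
    # Try longer prefixes first so "aba" is preferred over a shorter match.
    for p in sorted(PREFIXES, key=len, reverse=True):
        if w.startswith(p) and len(w) - len(p) >= MIN_STEM:
            w = w[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if w.endswith(s) and len(w) - len(s) >= MIN_STEM:
            w = w[:-len(s)]
            break
    return w


# Singular and plural forms map to the same stem.
print(stem("umntu"), stem("abantu"))      # person / people
print(stem("umfundi"), stem("abafundi"))  # student / students
```

Mapping related forms to one stem matters for frequency-based scoring, because otherwise singular and plural forms of the same noun would be counted as unrelated terms.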
Evaluation was conducted using both subjective methods (rating by isiXhosa native speakers) and objective methods (comparing system output to human-generated summaries using the ROUGE 2.0 tool).
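The idea behind the objective evaluation can be shown with a simplified n-gram recall measure in the spirit of ROUGE-N. This sketch is not the ROUGE 2.0 tool and will not reproduce its exact figures; the function names and whitespace tokenization are assumptions, and the single-letter tokens are placeholders.

```python
from collections import Counter


def ngrams(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also occur in the candidate summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match so a repeated candidate n-gram is not counted twice.
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / total


system_summary = "a b x d"
manual_summary = "a b c d"
print(rouge_n_recall(system_summary, manual_summary, n=1))
print(rouge_n_recall(system_summary, manual_summary, n=2))
```

A higher recall means the system summary covers more of the word sequences that the human summarizer chose to keep.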
The system assumes that the structure of news articles follows an "inverted pyramid" style, meaning that the first sentences of an article generally contain the most critical information, which the system prioritizes during extraction.
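One way to express the inverted pyramid assumption in code is a position weight that decays from the first sentence to the last, combined with a content score. The linear decay, the mixing parameter `alpha`, and the function names below are assumptions for illustration; the excerpt does not state the exact weighting the system uses.

```python
def position_weights(n):
    # Linearly decaying weight: 1.0 for the first sentence, 1/n for the last.
    return [(n - i) / n for i in range(n)]


def summarize(sentences, content_scores, k, alpha=0.5):
    """Pick the k best sentences by a mix of content score and position weight."""
    n = len(sentences)
    pos = position_weights(n)
    # Normalize content scores to [0, 1] so both components are comparable.
    top = max(content_scores, default=0) or 1.0
    combined = [
        alpha * (c / top) + (1 - alpha) * p
        for c, p in zip(content_scores, pos)
    ]
    best = sorted(range(n), key=lambda i: combined[i], reverse=True)[:k]
    # Emit the selected sentences in their original order for readability.
    return [sentences[i] for i in sorted(best)]


sentences = ["s0", "s1", "s2", "s3"]  # placeholder sentences
content = [1.0, 4.0, 1.0, 3.0]        # hypothetical content scores
print(summarize(sentences, content, k=2))
```

With these example values the lead sentence is selected despite a low content score, which is the behavior the inverted pyramid assumption is meant to produce.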

