Master's Thesis, 2011
105 pages, Grade: Very Good
CHAPTER ONE
INTRODUCTION
1.1. GENERAL BACKGROUND
1.2. STATEMENT OF THE PROBLEM
1.3. OBJECTIVE OF THE STUDY
1.3.1. GENERAL OBJECTIVE
1.3.2. SPECIFIC OBJECTIVES
1.4. METHODOLOGY
1.4.1. STUDY DESIGN
1.4.2. LITERATURE REVIEW
1.4.3. DATA SOURCES AND DATA PREPARATION FOR THE EXPERIMENT
UNDERSTANDING OF DOMAIN LANGUAGE
1.4.4. DESIGN AND IMPLEMENTATION OF AVATIES
1.5. APPLICATION OF RESULTS AND BENEFICIARIES
1.6. SCOPE AND LIMITATIONS OF THE STUDY
1.7. ORGANIZATION OF THE STUDY
CHAPTER TWO
LITERATURE REVIEW
2.1. INTRODUCTION
2.2. INFORMATION EXTRACTION (IE)
2.3. BUILDING INFORMATION EXTRACTION SYSTEMS
I. KNOWLEDGE ENGINEERING APPROACH
II. AUTOMATIC TRAINING APPROACH
2.4. ARCHITECTURE OF INFORMATION EXTRACTION SYSTEM
2.5. PREPROCESSING OF INPUT TEXTS
2.6. LEARNING AND APPLICATION OF THE EXTRACTION MODEL
2.7. POST PROCESSING OF OUTPUT
2.8. RELATED NLP FIELDS TO INFORMATION EXTRACTION
2.8.1. INFORMATION RETRIEVAL (IR)
2.8.2. TEXT SUMMARIZATION
2.8.3. QUESTION ANSWERING SYSTEMS
2.8.4. TEXT CATEGORIZATION
2.9. INFORMATION EXTRACTION (IE) AND INFORMATION RETRIEVAL (IR)
2.10. EVALUATION OF INFORMATION EXTRACTION
2.11. RELATED WORKS
INFORMATION EXTRACTION FOR E-JOB MARKETPLACE
INFORMATION EXTRACTION FROM AMHARIC TEXT
INFORMATION EXTRACTION FROM ENGLISH TEXT
INFORMATION EXTRACTION FROM CHINESE TEXT
CHAPTER THREE
THE AMHARIC WRITING SYSTEM
3.1. INTRODUCTION
3.2. AMHARIC CHARACTER REPRESENTATION AND WRITING SYSTEM
3.3. AMHARIC PUNCTUATION MARKS AND NUMERALS
3.4. CHARACTERISTICS OF THE AMHARIC WRITING SYSTEM
3.5. THE MORPHOLOGY OF AMHARIC
3.6. GRAMMATICAL STRUCTURE OF AMHARIC
3.6.1. WORD CATEGORIZATION IN AMHARIC
3.7. SENTENCES IN AMHARIC
CHAPTER FOUR
DESIGN AND IMPLEMENTATION OF AVATIES
4.1. INTRODUCTION
4.2. PROPOSED MODEL
DATA PREPROCESSING
LEARNING AND EXTRACTION COMPONENT
POST PROCESSING
THE PROTOTYPE SYSTEM
CHAPTER FIVE
RESULT AND EVALUATION
5.1. INTRODUCTION
5.2. EVALUATION METRICS
5.3. THE DATASETS
5.4. EXPERIMENTAL RESULT AND EVALUATION OF EACH COMPONENT OF OUR SYSTEM
5.4.1. EXPERIMENTAL RESULT AND EVALUATION OF NORMALIZATION
5.4.2. EXPERIMENTAL RESULT AND EVALUATION OF STOPWORD REMOVAL
5.4.3. EXPERIMENTAL RESULT AND EVALUATION OF TRANSLITERATION
5.4.5. EXPERIMENTAL RESULT AND EVALUATION OF PROTOTYPE SYSTEM FOR CANDIDATE TEXT EXTRACTION
5.4.5.1. EXPERIMENTAL RESULT AND EVALUATION OF ORGANIZATION AND POSITION EXTRACTION
5.4.5.2. EXPERIMENTAL RESULT AND EVALUATION OF OTHER CANDIDATE TEXT EXTRACTION
CHAPTER SIX
CONCLUSION AND RECOMMENDATION
6.1. CONCLUSIONS
6.2. RECOMMENDATION
REFERENCES
The primary objective of this research is to design and implement an automated Information Extraction (IE) system for Amharic vacancy announcement texts. The study aims to overcome the challenges of manually processing unstructured job postings by developing a rule-based system capable of accurately identifying and extracting key organizational and job-related data.
1.1. GENERAL BACKGROUND
Rapid developments in Information and Communication Technology are making huge amounts of data and information available. Much of this data is in electronic form (for example, the more than a billion documents on the World Wide Web). It is usually unstructured or semi-structured and can generally be regarded as a text database. Likewise, recent decades have witnessed a rapid proliferation of Amharic textual information available in digital form in a myriad of repositories on the Internet and intranets. As a result of this growth, a huge amount of valuable information that could be used in education, business, health, and many other areas is hidden in unstructured text and is therefore hard to search. This has created a growing need for effective and efficient techniques for analyzing free text and discovering valuable, relevant knowledge from it in the form of structured information, and it led to the emergence of Information Extraction technologies.
Information Extraction (IE) is an NLP application that aims to automatically extract structured factual information from unstructured text. As Riloff [2] discusses, automatically extracting information from texts involves identifying a predefined set of concepts, deciding whether a text is relevant to a certain domain, and, if so, extracting a set of facts from that text.
Regardless of the language and domain for which it is developed, an IE system has three components: linguistic preprocessing, learning and application, and post-processing. Linguistic preprocessing uses various tools to prepare natural language texts for extraction. The learning and application component learns a model and applies it to extract the required information from the preprocessed text.
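The three-component architecture described above can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names (`preprocess`, `trigger_rule`, `apply_model`, `postprocess`, `run_pipeline`), the trigger-word rules, and the English example text are hypothetical; the "model" is a fixed set of hand-written rules, in keeping with a knowledge engineering approach; post-processing is assumed to mean dropping empty slots and duplicate values; and Ethiopic punctuation (U+1361 to U+1368) is assumed to act as a token separator alongside ASCII punctuation.

```python
import re
from typing import Callable, Dict, List

# A rule maps a token list to the values it extracts for one slot.
Rule = Callable[[List[str]], List[str]]


def preprocess(text: str) -> List[str]:
    """Linguistic preprocessing (simplified): lowercase, then split on
    whitespace, ASCII punctuation and Ethiopic punctuation U+1361-U+1368."""
    return [tok for tok in re.split(r"[\s,.:;!?\u1361-\u1368]+", text.lower()) if tok]


def trigger_rule(trigger: str) -> Rule:
    """Hypothetical rule builder: return the token after each `trigger`."""
    def rule(tokens: List[str]) -> List[str]:
        return [tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == trigger]
    return rule


def apply_model(tokens: List[str], rules: Dict[str, Rule]) -> Dict[str, List[str]]:
    """Learning/application stage: here the 'model' is hand-written rules."""
    return {slot: rule(tokens) for slot, rule in rules.items()}


def postprocess(raw: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Assumed post-processing: drop empty slots, de-duplicate values in order."""
    return {slot: list(dict.fromkeys(vals)) for slot, vals in raw.items() if vals}


def run_pipeline(text: str, rules: Dict[str, Rule]) -> Dict[str, List[str]]:
    return postprocess(apply_model(preprocess(text), rules))


# Hypothetical slot rules for a vacancy template.
RULES = {
    "position": trigger_rule("position"),
    "deadline": trigger_rule("deadline"),
    "salary": trigger_rule("salary"),
}

print(run_pipeline("Position: Accountant. Deadline: 15 days", RULES))
# {'position': ['accountant'], 'deadline': ['15']}
```

The `salary` slot finds nothing in the example and is removed in post-processing. Keeping each stage as a separate function mirrors the component split, so a real tokenizer or normalizer could replace `preprocess` without touching the rules.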
CHAPTER ONE: Provides an introduction to the research, outlining the background, problem statement, objectives, and the methodology used to develop the IE system.
CHAPTER TWO: Reviews the literature on Information Extraction (IE) techniques, related NLP fields, and existing IE systems, providing a foundation for the proposed approach.
CHAPTER THREE: Discusses the Amharic writing system, morphology, and grammatical structure, highlighting language-specific challenges relevant to the research.
CHAPTER FOUR: Details the design and implementation of the AVATIES prototype, including the proposed model, preprocessing steps, and extraction algorithms.
CHAPTER FIVE: Presents the experimental results and performance evaluation of the system, using precision, recall, and F-measure metrics on the collected test dataset.
CHAPTER SIX: Concludes the thesis by summarizing key findings and providing recommendations for future research and system improvements.
Information Extraction, Amharic Language, Vacancy Announcement, Rule-Based System, Natural Language Processing, Tokenization, Normalization, Named Entity Recognition, Gazetteer, Morphology, POS Tagging, Prototype System, Precision, Recall, F-measure.
The research focuses on designing an automated Information Extraction system specifically for Amharic vacancy announcements to reduce the manual effort of extracting job-related details from unstructured newspaper text.
Key themes include natural language processing for the Amharic language, rule-based IE modeling, linguistic preprocessing, and system performance evaluation within the specific domain of job postings.
The study seeks to determine the most effective approaches, algorithms, and models for designing an Amharic IE system capable of accurately identifying and extracting relevant data from unstructured vacancy announcements.
The research employs an experimental methodology, involving the collection of Amharic vacancy texts, development of rule-based algorithms, and testing the system's performance using standard NLP metrics like precision and recall.
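For reference, the standard definitions behind these metrics are P = TP / (TP + FP), R = TP / (TP + FN), and F_β = (1 + β²)·P·R / (β²·P + R), where β = 1 gives the balanced F-measure. The helper below is a sketch of those formulas. That the thesis reports the balanced (β = 1) F-measure is an assumption, and the counts in the usage lines are invented for illustration; they are not results from this study.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0) -> Tuple[float, float, float]:
    """Precision, recall and F-beta from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0  # avoid division by zero
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical counts: 8 correct extractions, 2 spurious, 2 missed.
print([round(x, 3) for x in precision_recall_f(8, 2, 2)])  # [0.8, 0.8, 0.8]
```

Because F is a harmonic mean, it stays low whenever either precision or recall is low, which is why it is used as a single summary of extraction quality.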
The main body covers the literature review of IE systems, an analysis of the Amharic language structure (writing system and morphology), the technical design of the AVATIES prototype, and a thorough performance evaluation.
The study is characterized by terms such as Information Extraction, Amharic Language, Vacancy Announcement, Rule-Based System, Natural Language Processing, and Prototype System.
Amharic is written in a unique syllabic script in which words show spelling variations because several consonant characters share the same sound and are used interchangeably; robust normalization algorithms are therefore required for effective extraction.
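One common way to normalize such homophone characters is to map every syllable of a variant consonant row onto the same-vowel syllable of a canonical row. The sketch below assumes the widely cited homophone groups ሀ/ሐ/ኀ (h), ሰ/ሠ (s), አ/ዐ (glottal) and ጸ/ፀ (ts'), relies on the Unicode Ethiopic block storing the seven vowel orders of each consonant at consecutive code points, and uses a hypothetical choice of canonical row. The thesis's actual normalization table may differ; some normalizers, for example, also merge ሃ with ሀ.

```python
from typing import Dict

# Hypothetical choice of canonical rows: variant row start -> canonical row start.
# Each row holds 7 vowel orders at consecutive code points.
_ROW_MAP: Dict[int, int] = {
    0x1210: 0x1200,  # ሐ-row -> ሀ-row (h)
    0x1280: 0x1200,  # ኀ-row -> ሀ-row (h)
    0x1220: 0x1230,  # ሠ-row -> ሰ-row (s)
    0x12D0: 0x12A0,  # ዐ-row -> አ-row (glottal stop)
    0x1340: 0x1338,  # ፀ-row -> ጸ-row (ts')
}

# Expand row mappings into a per-character table usable by str.translate,
# preserving the vowel order (offset 0-6) of each syllable.
_TABLE: Dict[int, int] = {
    variant + order: canonical + order
    for variant, canonical in _ROW_MAP.items()
    for order in range(7)
}


def normalize(text: str) -> str:
    """Replace homophone variant syllables with their canonical forms."""
    return text.translate(_TABLE)


print(normalize("ሠራተኛ") == normalize("ሰራተኛ"))  # True: both spellings of "worker" unify
```

Applying the same table to both gazetteer entries and input text means that two spellings of the same word match during extraction.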
The system uses a combination of gazetteers (predefined lists) and feature-based context rules to identify entities like job position and organization name regardless of the specific format of the vacancy text.
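A gazetteer-plus-context-rule extractor might look roughly like the sketch below. Everything in it is hypothetical: English stand-in tokens are used for readability (the real system would work on normalized Amharic text), the gazetteer contents, trigger words and organization head words are invented, and the organization rule is a crude fixed-window heuristic chosen because Amharic has no capitalization to mark names. It illustrates the general technique only; it is not the rule set used in AVATIES.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical gazetteer of (possibly multi-token) job positions.
POSITION_GAZETTEER: Set[Tuple[str, ...]] = {
    ("accountant",),
    ("senior", "accountant"),
    ("driver",),
}
ORG_HEAD_WORDS = {"bank", "college", "plc", "agency"}  # hypothetical org head words
POSITION_TRIGGERS = {"position", "post"}               # hypothetical context cues


def match_gazetteer(tokens: Sequence[str], gazetteer: Set[Tuple[str, ...]],
                    max_len: int = 3) -> List[str]:
    """Greedy longest-match lookup of gazetteer entries in a token sequence."""
    found: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in gazetteer:
                found.append(" ".join(candidate))
                i += n
                break
        else:  # no gazetteer entry starts at position i
            i += 1
    return found


def extract_positions(tokens: List[str]) -> List[str]:
    positions = match_gazetteer(tokens, POSITION_GAZETTEER)
    # Context rule: the token right after a trigger word is a position even if
    # the gazetteer lacks it (single-token heuristic), unless already covered.
    for i in range(len(tokens) - 1):
        if tokens[i] in POSITION_TRIGGERS:
            candidate = tokens[i + 1]
            if not any(candidate in p.split() for p in positions):
                positions.append(candidate)
    return positions


def extract_organizations(tokens: List[str], window: int = 2) -> List[str]:
    """Feature rule: an org head word plus up to `window` preceding tokens."""
    orgs: List[str] = []
    for i, tok in enumerate(tokens):
        if tok in ORG_HEAD_WORDS:
            orgs.append(" ".join(tokens[max(0, i - window):i + 1]))
    return orgs


tokens = "example trading plc invites applicants position senior accountant post cashier".split()
print(extract_organizations(tokens))  # ['example trading plc']
print(extract_positions(tokens))      # ['senior accountant', 'cashier']
```

The gazetteer gives high precision for known titles, while the trigger rule recovers unseen ones such as "cashier"; combining the two is what lets extraction work across announcements with different layouts.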
The prototype system achieved an overall F-measure of 71.7%, demonstrating that a rule-based knowledge engineering approach is a promising direction for Amharic information extraction.

