Master's Thesis, 2019
109 pages, Grade: 89
Chapter I
1.1 Introduction
1.2 Problem Statement
1.3 Purpose of the Study
1.4 Research Questions
1.5 Research Hypotheses
1.6 Significance of the Study
1.7 Limitations of the Study
1.8 Challenges of the Study
1.9 Research Contributions
1.10 Key Terms
1.10.1 Sentiment Analysis
1.10.2 Natural Language Processing (NLP)
1.10.3 Arabizi NLP
1.10.4 Classifier
1.10.5 Big Data
1.10.6 Machine Learning Classifier
1.10.7 Lexicon-based Classifier
1.10.8 Customer Review
1.11 Research Outline
Chapter II
2.1 Literature Review
2.2 Natural Language Processing (NLP)
2.3 Big Data and Sentiment Analysis (SA)
2.4 Approaches to SA
2.4.1 Lexicon-Based Approach
2.4.2 Machine Learning Approach
2.4.3 Hybrid Approach
2.5 Arabizi and the Lebanese Dialect
2.6 Sentiment Analysis and Lebanese Arabizi
Chapter III
3.1 Research Methodology
3.2 Research Design
3.3 Research Sample
3.3.1 The Challenges of Analyzing Arabizi Texts
3.4 Data Preprocessing and Filtering
3.4.1 Removal of reviews with “neutral” sentiment
3.4.2 Ratings’ Encodings
3.4.3 Data splitting for training and testing
3.4.4 Data Cleaning
3.5 Reviews Representation
3.5.1 Selected Features
3.6 Research Tools
3.6.1 Machine Learning Classifier
3.6.2 Lexicon-based Classifier
3.7 Research Procedure
Chapter IV
4.1 Experiment Preparation
4.2 Data Preprocessing
4.3 Feature Extraction
4.4 Building Classifiers
4.4.1 Machine Learning
4.4.2 Lexicon-based
4.5 Results and Evaluation
Chapter V
5.1 Research Result
5.2 Machine Learning
5.2.1 First phase (Default settings)
5.2.2 Second phase (hyperparameter tuning settings)
5.2.3 Experiment Summary
5.3 Lexicon-based
5.3.1 Experiment Summary
5.4 Discussion
Chapter VI
6.1 Conclusion
6.2 Future Work
This study aims to develop and deploy an efficient sentiment analysis model tailored for the Arabizi language system, specifically within the context of customer service reviews in Lebanon. By leveraging both supervised machine learning (using Logistic Regression) and a lexicon-based approach (using the Science of Language and Communication Semantic Analysis System), the research addresses the lack of automated processing tools for this informal, Latin-scripted dialect.
1.1 Introduction
The volume of unstructured (unlabeled) data was projected to reach about forty thousand exabytes by 2020 (Gantz and Reinsel, 2012). Together with the reach of the World Wide Web (WWW), this flow has attracted many data mining researchers who aim to extract knowledge and other useful information in order to make sense of what people feel and think in the virtual space (Waters, 2010). In such big data analysis, Sentiment Analysis (SA), also called opinion mining (OM), is a central task for analyzing and extracting opinions from texts such as reviews, discussions, and blogs (Pang and Lee, 2008). SA is a multidisciplinary research field that draws on Natural Language Processing (NLP), Computational Linguistics (CL), Information Retrieval (IR), Information Extraction (IE), Machine Learning (ML), Deep Learning (DL), and Artificial Intelligence (AI) (Feldman, 2013). For understanding and identifying emotion in Computer-mediated Communication (CMC), SA is particularly practical because it narrows the gap between machine understanding and human natural language: it lets a machine identify sentiment in written expressions within big data by classifying each text into one of a set of predefined classes, for example positive, neutral, or negative (Duwairi et al., 2016).
Facebook is one of the most popular and widely used social media applications in the Arab world. Its user base has grown continuously, reaching about 116.5 million in Middle Eastern countries, including 360 thousand in Lebanon alone, at the beginning of 2018 (“Middle East Internet Statistics”, 2018). These users generate a continuous daily flow of data, described as growing mountains of opinions: reviews, ratings, recommendations, and other useful information (Wright, 2009), especially about public and private services such as food, education, hotels, resorts, products, shops, and restaurants (Agarwal et al., 2015). However, automatically processing this data in NLP tasks, for the sake of knowledge-making and decision-making, faces challenges of several kinds: data size, language dialect, and the complexity of linguistic form and nature (phonology, morphology, syntax, semantics, and pragmatics) (Elgendy and Elragal, 2014).
Chapter I: This chapter provides an introduction to the research, defining the problem of lacking Arabizi sentiment analysis tools and outlining the study's research questions and objectives.
Chapter II: This chapter reviews the literature on Natural Language Processing (NLP), Big Data, and various sentiment analysis approaches, including machine learning and lexicon-based methods.
Chapter III: This chapter details the research methodology, including the design, data collection from social media, and the specific linguistic challenges associated with analyzing Arabizi texts.
Chapter IV: This chapter explains the research procedure, focusing on experiment preparation, data preprocessing, feature extraction, and the construction of both machine learning and lexicon-based classifiers.
Chapter V: This chapter presents the experimental results and evaluations, comparing the performance of different models based on precision, recall, and F1-scores.
Chapter VI: This chapter concludes the research, confirming the applicability of the proposed models and suggesting directions for future research in Arabizi natural language processing.
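The evaluation metrics named for Chapter V (precision, recall, and F1-score) can be illustrated with a minimal sketch. The labels below are invented for illustration and are not results from the thesis; the function name `precision_recall_f1` is likewise hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive="positive"):
    """Compute precision, recall, and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Invented gold labels and predictions, for illustration only.
y_true = ["positive", "positive", "negative", "negative", "positive"]
y_pred = ["positive", "negative", "negative", "positive", "positive"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In a two-class setting such as positive versus negative reviews, the same function can be called once per class by changing the `positive` argument.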
Sentiment Analysis, Natural Language Processing, Arabizi NLP, Classifier, Big Data, Machine Learning, Logistic Regression, Lexicon-based, Customer Review, Computational Linguistics, Opinion Mining, Feature Extraction, Data Preprocessing, Sentiment Classification, Arabic Dialects.
The thesis focuses on building and comparing sentiment analysis tools specifically for the Lebanese Arabizi language system, using customer reviews from public and private service sectors.
The main themes include Natural Language Processing (NLP), Arabizi dialect peculiarities, machine learning classification, lexicon-based rule construction, and big data management within the context of Lebanese customer feedback.
The goal is to bridge the gap in sentiment analysis for Arabizi by proposing an efficient automated classification system and creating a reliable dataset for future research.
The study utilizes an experimental, quantitative, and descriptive approach, specifically testing supervised machine learning (Logistic Regression) alongside a rule-based lexicon classifier (SLCSAS).
The main part of the paper details the data preprocessing techniques, the construction of BoW and TF*IDF features, the training of classification models, and the comprehensive evaluation of their performance.
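As a rough illustration of the two feature representations mentioned above, the following sketch builds Bag-of-Words (BoW) count vectors and TF*IDF weights in plain Python. It is not the thesis's implementation: the whitespace tokenizer, the idf variant ln(N / df), the function names, and the three sample reviews are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. Real Arabizi cleaning
    # would need more steps than this.
    return text.lower().split()


def bag_of_words(docs):
    """Return a sorted vocabulary and one term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    counts = [Counter(doc) for doc in tokenized]
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors


def tf_idf(docs):
    """Weight raw counts by idf = ln(N / df), one common variant (assumed)."""
    vocab, vectors = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = [[vec[j] * idf[j] for j in range(len(vocab))] for vec in vectors]
    return vocab, weighted


# Hypothetical Arabizi-style reviews, invented for illustration.
reviews = ["ktir tayeb", "mesh tayeb", "ktir ktir helo"]

vocab, bow = bag_of_words(reviews)
_, weights = tf_idf(reviews)

print(vocab)
for counts_row, weight_row in zip(bow, weights):
    print(counts_row, [round(w, 3) for w in weight_row])
```

Terms that occur in many reviews receive a lower idf and therefore a lower weight than terms that occur in only one review, which is the main difference from raw BoW counts.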
Key terms include Sentiment Analysis, Arabizi NLP, Machine Learning, Logistic Regression, Lexicon-based Classification, and Opinion Mining.
Arabizi is an informal slang system used by Arabic speakers to write Arabic using English (Latin) characters, often encountered in computer-mediated communication like social media.
The author performed manual data collection and extraction from Facebook, Google, and Zomato, creating a unique dataset of 2635 reviews which were subsequently preprocessed for training and testing.
The SLCSAS (Science of Language and Communication Semantic Analysis System) uses manually hand-crafted grammar rules, a dictionary, and a semantic map to categorize sentiment based on specific keywords and linguistic markers found in the text.
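The general idea of a lexicon-based classifier can be sketched as follows. This is not SLCSAS and does not reproduce its grammar rules or semantic map: the tiny lexicon, the negator list, the single negation rule, and the name `lexicon_sentiment` are hypothetical stand-ins chosen only to show how keyword polarity and a simple rule can combine.

```python
# Hypothetical mini-lexicon: entries and polarity scores are illustrative only.
LEXICON = {"tayeb": 1, "helo": 1, "bshe3": -1}
# Hypothetical negation markers.
NEGATORS = {"mesh", "ma"}


def lexicon_sentiment(text):
    """Sum keyword polarities, flipping the next sentiment word after a negator."""
    score = 0
    negate = False
    for token in text.lower().split():
        if token in NEGATORS:
            # The flag stays set until the next sentiment-bearing word is seen.
            negate = True
            continue
        polarity = LEXICON.get(token, 0)
        if polarity != 0:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented example sentences.
for sentence in ["el akel ktir tayeb", "el service mesh helo", "rej3na 3al bet"]:
    print(sentence, "->", lexicon_sentiment(sentence))
```

A real rule-based system would need a far larger dictionary and would have to handle the spelling variation typical of Arabizi, where the same word can be written in several ways.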
The experiments demonstrated that hyperparameter tuning, particularly for the TF*IDF-based Logistic Regression model, significantly enhanced the classifier's performance, making it the most accurate solution presented in the study.

