Automatic Cross-Target Stance Detection With Fine-Tuned BERT

An Examination of German Twitter Data on the Russian-Ukrainian Conflict 2022

Master's Thesis, 2023
131 pages, Grade: 1.3

Computer Science - Computational Linguistics

Excerpt

1 Introduction

1.1 Research Objectives

1.2 Thesis Outline

2 Background

2.1 The Russian-Ukrainian Conflict 2022

2.1.1 Foreign and Security Policy

2.1.2 Energy Crisis

2.2 Twitter

2.3 Stance Detection

2.3.1 The Task of Stance Detection

2.3.2 Types of Stance Detection

2.3.3 Related Work

2.4 The Language Model BERT

2.4.1 Methodology

2.4.2 Transformer Encoder

2.4.3 Pre-Training

2.4.4 Fine-Tuning

3 Dataset

3.1 Data Collection

3.2 Removal of Duplicates

3.3 Development of Balanced Class Distributions

3.4 Data Statistics

3.5 Manually Labeled Test Datasets

3.6 Final Dataset and Data Availability

4 Experiments

4.1 Experimental Setup

4.1.1 Pre-Trained Language Models

4.1.2 Preprocessing Methodology

4.1.3 Evaluation Metrics

4.2 Experiments and Results

4.2.1 Experiment 1: Impact of a Balanced Dataset

4.2.2 Experiment 2: Cross-Target Generalization

4.2.3 Experiment 3: Different BERT Models

4.2.4 Hyperparameter

4.2.5 Discussion

5 Application of Fine-Tuned Model on 2022 Twitter Data

5.1 Twitter Data of 2022

5.2 Statistics of Detected Stances

5.3 Potential Reasons of Target-Specific Stance Groups

5.3.1 Target NOC

5.3.2 Target SLI

5.3.3 Target AD

5.3.4 Target US

5.4 Summary and Evolution of Tweet Volume Over Time

6 Conclusions and Outlook

Research Objectives and Core Themes

This master thesis aims to develop an automatic stance detection model by fine-tuning BERT in a supervised cross-target setting. By applying this model to a large corpus of German Twitter data, the study examines stances regarding controversial socio-political debates arising from the 2022 Russian-Ukrainian conflict.

Development of an automatic cross-target stance detection system.
Analysis of German Twitter users' opinions on four targets concerning the Russian-Ukrainian conflict.
Evaluation of fine-tuning techniques, including the impact of class balancing and case-sensitivity.
Investigation into the model's ability to generalize across different, domain-related targets.
Examination of potential reasons for target-specific stance groups using word frequency and context analysis.

Excerpt from the Book

2.3 Stance Detection

The automatic extraction and analysis of information from texts has been an important research area in NLP for decades. Along with sentiment analysis, emotion recognition, and textual entailment, stance detection is an important research problem regarding the automatic analysis of content and can be viewed as a subtask of opinion mining.

2.3.1 The Task of Stance Detection

“Stance is a public act by a social actor, achieved dialogically through overt communicative means, of simultaneously evaluating objects, positioning subjects (self and others), and aligning with other subjects, with respect to any salient dimension of sociocultural field.” (Du Bois, 2007, p. 163)

In linguistics, the term stance refers to a social act by which someone takes a position within an ongoing communication in terms of evaluation, intentionality, epistemology, or social relations. A person therefore takes a stance whenever they describe an object (hereafter referred to as a target) in a way that expresses their attitude toward it.

Accordingly, in NLP, stance detection refers to the task of automatically detecting the attitude expressed in a natural language text towards a target. Hence, a basic stance detection system should detect whether the author of a text is against or in favor of a given target entity. Many setups also define a third class or more, such as neutral, for cases in which neither inference is likely. The target entity may be a person (e.g., Olaf Scholz), an organization (e.g., North Atlantic Treaty Organization), a product (e.g., iPhone), a claim or headline (e.g., COVID-19 vaccines affect fertility in women), or any topic such as a political movement or a government policy (e.g., legalization of cannabis).
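The task definition above can be made concrete with a small sketch. It assumes a three-class label set (favor, against, neutral) and a sentence-pair input in which the target is placed next to the text, which is one common way of giving a BERT-style model access to the target. The excerpt does not state the input format actually used in the thesis; `StanceExample`, `to_pair_input`, and the example tweet are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    """Assumed three-class label set; the thesis may use different names."""
    FAVOR = "favor"
    AGAINST = "against"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class StanceExample:
    """One labeled item: a text, the target it is judged against, and the stance."""
    text: str
    target: str
    label: Stance


def to_pair_input(example: StanceExample) -> str:
    """Render target and text as one sentence pair (assumed format, not the thesis's)."""
    return f"[CLS] {example.target} [SEP] {example.text} [SEP]"


# Invented example text for a target named in the excerpt.
example = StanceExample(
    text="Cannabis should finally be legal.",
    target="legalization of cannabis",
    label=Stance.FAVOR,
)
print(to_pair_input(example))
```

In practice a tokenizer would insert the special tokens itself; the string form here only shows that the same text can receive different labels depending on which target it is paired with.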

Summary of Chapters

1 Introduction: Provides an overview of the conflict setting and states the research objectives for developing a stance detection system.

2 Background: Discusses the Russian-Ukrainian conflict 2022, the significance of Twitter as a platform, the theory of stance detection, and the language model BERT.

3 Dataset: Details the collection of tweets, the cleaning process (removal of duplicates), and the development of balanced class distributions using back translation.

4 Experiments: Explains the setup of the experiments, model evaluations across targets, and the assessment of performance regarding balanced datasets, generalization, and different BERT models.

5 Application of Fine-Tuned Model on 2022 Twitter Data: Applies the developed model to a larger 2022 dataset to examine stance distributions and explore possible reasons behind target-specific stance groups.

6 Conclusions and Outlook: Summarizes the key findings of the thesis and discusses potential future research directions, such as incorporating multi-modal data.

Keywords

Stance Detection, Natural Language Processing, BERT, Cross-Target Stance Detection, Russian-Ukrainian Conflict, German Twitter Data, Transfer Learning, Machine Learning, Data Augmentation, Back Translation, Opinion Mining, Social Media Analysis, Fine-Tuning, Sentiment Analysis, Public Opinion.

Frequently Asked Questions

What is the core focus of this master thesis?

This thesis investigates the development of an automatic stance detection system for the German language, specifically applied to Twitter debates surrounding the 2022 Russian-Ukrainian conflict.

Which specific themes are addressed by the stance detection model?

The model analyzes four key targets: general support of Ukraine, delivery of heavy weapons to Ukraine, the repeal of the nuclear phase-out, and the implementation of a temporary speed limit on highways.

What is the primary goal of the research?

The goal is to leverage transfer learning by fine-tuning BERT models on multiple domain-related targets to create a system capable of predicting user stances even when the target might be indirectly referenced.

Which scientific methodology is utilized in this study?

The work employs deep learning, specifically fine-tuning pre-trained BERT language models. It uses a cross-target training approach combined with synthetic data augmentation (back translation) to handle dataset imbalances.
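The balancing idea mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's pipeline: minority classes are topped up with paraphrased copies until every class matches the largest one, and the paraphrase step is passed in as a function. `balance_by_back_translation` and `fake_back_translate` are hypothetical names, and the placeholder does not translate anything; a real setup would translate each text into a pivot language and back.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (text, label)


def balance_by_back_translation(
    data: List[Example],
    back_translate: Callable[[str], str],
    seed: int = 0,
) -> List[Example]:
    """Append paraphrased minority-class texts until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(label for _, label in data)
    largest = max(counts.values())
    balanced = list(data)  # originals are kept unchanged at the front
    for label, count in counts.items():
        pool = [text for text, lab in data if lab == label]
        for _ in range(largest - count):
            balanced.append((back_translate(rng.choice(pool)), label))
    return balanced


def fake_back_translate(text: str) -> str:
    # Placeholder only: stands in for a translate-and-translate-back round trip.
    return text + " (paraphrased)"


data = [("a", "favor"), ("b", "favor"), ("c", "favor"), ("d", "against")]
balanced = balance_by_back_translation(data, fake_back_translate)
print(Counter(label for _, label in balanced))
```

Passing the translation step as a callable keeps the balancing logic independent of any particular translation service.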

What topics are discussed in the main part of the thesis?

The main part covers the theoretical background of stance detection and BERT, detailed methodology for creating a balanced dataset from scraped Twitter posts, various experimental configurations, and an application study on 2022 data.

How is the effectiveness of the model defined and evaluated?

Effectiveness is evaluated via precision, recall, and F-score on two different test sets: one automatically labeled (Test-1) and one manually labeled by human annotators (Test-2).
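For reference, the three metrics can be computed per class as in the sketch below. The excerpt does not say which averaging scheme the thesis reports, so the macro average shown here is an assumption, as are the function names and the toy label lists.

```python
from typing import Dict, List, Tuple


def per_class_prf(gold: List[str], pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return (precision, recall, F1) for every label seen in gold or pred."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores: Dict[str, Tuple[float, float, float]]) -> float:
    """Unweighted mean of the per-class F1 values (assumed averaging scheme)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy labels for illustration only; these are not results from the thesis.
gold = ["favor", "favor", "against", "neutral"]
pred = ["favor", "against", "against", "neutral"]
scores = per_class_prf(gold, pred)
print(scores)
print(round(macro_f1(scores), 4))
```

Macro averaging weights every class equally, which makes weak performance on a rare class visible; a micro or weighted average would behave differently on imbalanced data.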

How well did the model perform across different targets?

Results show the model achieves reliable performance on known targets like Ukraine support, but struggles with cross-target generalization to completely unknown, unconventional domains.
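One way to probe generalization to an unseen target is to hold out all examples of one target during training and evaluate only on them. The sketch below illustrates such a split; it is an assumption about how this kind of test could be set up, not a description of the protocol used in the thesis. The target codes are taken from the table of contents (NOC, SLI, AD, US) and treated as opaque identifiers, and the rows are placeholders.

```python
from typing import List, Tuple

Row = Tuple[str, str, str]  # (text, target, label)


def leave_one_target_out(rows: List[Row], held_out: str) -> Tuple[List[Row], List[Row]]:
    """Split rows so that the held-out target appears only in the test part."""
    train = [row for row in rows if row[1] != held_out]
    test = [row for row in rows if row[1] == held_out]
    return train, test


# Placeholder rows; texts and labels are invented for illustration.
rows = [
    ("tweet 1", "NOC", "favor"),
    ("tweet 2", "SLI", "against"),
    ("tweet 3", "AD", "favor"),
    ("tweet 4", "US", "neutral"),
    ("tweet 5", "SLI", "favor"),
]
train, test = leave_one_target_out(rows, held_out="SLI")
print(len(train), len(test))
```

Repeating the split once per target gives one score for each held-out target, which separates performance on seen targets from performance on unseen ones.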

Does the thesis address the role of irony and sarcasm in social media data?

Yes, the thesis highlights that irony and sarcasm pose significant challenges for automated stance detection, because they require deep contextual and background knowledge that lexicon-based sentiment approaches often miss.

End of excerpt from 131 pages

Details

Title: Automatic Cross-Target Stance Detection With Fine-Tuned BERT
Subtitle: An Examination of German Twitter Data on the Russian-Ukrainian Conflict 2022
University: Universität Trier (Computational Linguistics and Digital Humanities)
Grade: 1.3
Author: Johanna Garthe
Year of publication: 2023
Pages: 131
Catalog number: V1431357
Language: English
Keywords: BERT, Stance Detection, Cross-Target, Stance Classification, Opinion Mining, Twitter, Russia-Ukraine conflict, Russia, Ukraine, language models, computational linguistics, NLP, Natural Language Processing, classification
Product safety: GRIN Publishing GmbH
Price (Ebook): US$ 42.99

Cite this work: Johanna Garthe, 2023, Automatic Cross-Target Stance Detection With Fine-Tuned BERT, Munich, GRIN Verlag, https://www.diplomarbeiten24.de/document/1431357