Master's Thesis, 2019
108 pages, grade: 1.0
1. Introduction to the Topic
1.1. Previous Research
1.2. Hypotheses and Methods
2. Theory
2.1. Music Information Retrieval
2.2. Natural Language Processing
2.2.1. Lemmatization
2.2.2. Part-of-Speech Tagging
2.3. Text Data Mining
2.3.1. N-grams
2.3.2. Term Frequency-Inverse Document Frequency
2.4. Topic Modeling Using Latent Dirichlet Allocation
2.4.1. Model Tuning
2.4.2. Model Evaluation
2.5. Similarity Measures
2.5.1. Jensen-Shannon Divergence
2.5.2. Hellinger Distance
2.5.3. Log Ratio
3. Data
3.1. Data Selection and Web Scraping
3.2. Data Pre-Processing
3.3. The Final Data Set
4. Analyses
4.1. Text Statistics
4.1.1. Comparison of Text Statistics
4.1.2. Comparison of Word Use
4.2. Text Features
4.2.1. Term Frequency-Inverse Document Frequency in Application
4.2.2. Part-of-Speech Tagging in Application
4.2.3. N-grams in Application
4.2.4. Conclusions about Text Statistics and Text Features
4.3. LDA Modeling
4.3.1. Parameter Tuning
4.3.2. Model Evaluation
4.3.3. Topic Similarity Within Models
4.3.4. Topic Similarity Between Models
4.3.5. Conclusions about LDA Modeling and Similarity Measures
5. Findings and Prospects
5.1. Findings Compared to Previous Research
5.2. Need for Improvement and Future Applications
This thesis investigates the evolution, complexity, and thematic similarity of English song lyrics across five diverse musical genres (alternative, country, pop, rock, and hip-hop) over the past 50 years. The central research question seeks to determine if information derived from Natural Language Processing (NLP) and statistical topic modeling can accurately measure these changes over time.
1. Introduction to the Topic
Although this thesis analyzes English song lyrics, an anecdote from the German music business is a fitting introduction to the subject, because it partly concerns the similarity of song lyrics, which is the main focus here. According to satirist Jan Böhmermann, the four most used topics in popular German music are “Menschen, Leben, Tanzen, Welt” (“People, Life, Dancing, World”). This is also the title of a song “composed” by chimpanzees from lines of German pop songs, tweets by popular influencers, advertising slogans, and proverbs (cf. Böhmermann 2017). Böhmermann performed it to criticize the German music business, claiming that the lyrics of popular songs are all very similar, superficial, and sound like advertising (cf. ibid., Rohleder 2018). Upon its commercial release, the song reached the top 10 of the German singles charts, which, according to Stern magazine, was symptomatic of the current state of popular German music (cf. Stern 2017).
Apart from this, there are analyses and scientific studies that seek to demonstrate the deterioration of song lyrics. Bagot and Scott, for example, analyze about 6,000 songs by top-selling UK artists to determine the sophistication of their lyrics. They apply readability scores, which are normally used to identify the difficulty level of school literature, to specify how demanding the songs are. For instance, they find that understanding the average Depeche Mode song requires 10.3 years of school education, which makes Depeche Mode the artist with the most sophisticated lyrics among those examined.
1. Introduction to the Topic: This chapter introduces the motivation behind analyzing song lyrics and establishes the research goal of using statistical methods to explore lyrical evolution and complexity.
2. Theory: This section provides the theoretical foundation for Music Information Retrieval, NLP techniques, and the specific application of Latent Dirichlet Allocation for topic modeling.
3. Data: This chapter details the methodology for building a custom dataset, including web scraping from Discogs and Genius, and the subsequent pre-processing steps like tokenization and lemmatization.
4. Analyses: This chapter presents the empirical results of text statistics and feature comparisons, followed by the LDA modeling process and the evaluation of topic similarities within and between models.
5. Findings and Prospects: This concluding chapter synthesizes the results, compares them with previous research, and suggests improvements and future research directions for the field.
Natural Language Processing, Song Lyrics, Latent Dirichlet Allocation, Topic Modeling, Music Information Retrieval, Lexical Complexity, Text Mining, Jensen-Shannon Divergence, Hellinger Distance, Corpus Linguistics, Genre Classification, Data Scraping, Text Statistics, Temporal Analysis, Repetitiveness.
The thesis focuses on analyzing English song lyrics to track changes in complexity, thematic content, and stylistic similarity across five major music genres from 1970 to 2018.
The work explores lexical density, word use patterns, the evolution of lyrical complexity, and the automatic extraction of topics using machine learning.
The research asks whether NLP and statistical topic modeling can determine if, and to what extent, song lyrics of various genres have changed over the last 50 years.
The study employs Natural Language Processing for text features, including lemmatization, POS tagging, and n-grams, as well as Latent Dirichlet Allocation (LDA) for statistical topic modeling.
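To illustrate one of the text features named above, the following is a minimal Python sketch of n-gram extraction. It is not the thesis's implementation (the thesis reports using R); the function name `ngrams` and the sample tokens are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    # A sequence of length L has L - n + 1 n-grams (none if L < n).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical, already tokenized and lowercased lyric fragment.
tokens = "na na na hey hey na na na".split()

# Count how often each bigram (n = 2) occurs.
bigram_counts = Counter(ngrams(tokens, 2))
most_common_bigram, frequency = bigram_counts.most_common(1)[0]
print(most_common_bigram, frequency)
```

Counting n-grams in this way gives a simple view of recurring phrases, which is one reason they are useful for highly repetitive text such as lyrics.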
The main analysis evaluates text statistics, computes tf-idf weights, conducts parameter tuning for LDA models, and calculates similarity measures to compare topics across different genres and time periods.
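As a rough illustration of tf-idf weighting, here is a minimal Python sketch. It assumes one common variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency); the summary does not state which variant the thesis uses, and the function name `tf_idf` and the tiny genre "documents" are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute tf-idf weights for a dict mapping document name -> token list.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    weights = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical toy corpus: one "document" per genre.
docs = {
    "country": ["truck", "road", "love", "road"],
    "hip-hop": ["money", "street", "love", "money"],
}
weights = tf_idf(docs)
print(weights["country"])
```

In this toy corpus, a term that occurs in every document ("love") receives a weight of zero, while terms that are specific to one genre receive positive weights.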
The study is characterized by a self-designed, large-scale corpus of lyrics, the use of R for computational processing, and a multi-dimensional comparison of genres and decades.
Complexity is defined through lexical density (the ratio of unique words to total words), word lengths, and the amount of repetition within songs.
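The three complexity indicators named above can be sketched in Python as follows. This is an illustrative sketch, not the thesis's code (which was written in R): the ratio of unique to total words follows the definition given here, whereas measuring repetition as the share of lines that duplicate an earlier line is an assumption, since the summary does not say how repetition is quantified. The function name and sample lines are hypothetical.

```python
def lyric_complexity(lines):
    """Return simple complexity indicators for a song given as a list of lines."""
    # Naive whitespace tokenization on lowercased text (an assumption).
    normalized = [line.lower().strip() for line in lines]
    tokens = [word for line in normalized for word in line.split()]
    if not tokens:
        return {"unique_ratio": 0.0, "mean_word_length": 0.0,
                "repeated_line_share": 0.0}

    # Ratio of unique words to total words.
    unique_ratio = len(set(tokens)) / len(tokens)
    # Average number of characters per word.
    mean_word_length = sum(len(word) for word in tokens) / len(tokens)
    # Assumed repetition measure: share of lines that repeat an earlier line.
    repeated_line_share = (len(normalized) - len(set(normalized))) / len(normalized)

    return {
        "unique_ratio": unique_ratio,
        "mean_word_length": mean_word_length,
        "repeated_line_share": repeated_line_share,
    }


# Hypothetical song fragment with a repeated hook line.
song = ["Hold on tight", "The night is long", "Hold on tight", "Hold on tight"]
result = lyric_complexity(song)
print(result)
```

A lower unique-word ratio and a higher share of repeated lines both point to more repetitive, and in this sense less complex, lyrics.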
LDA is used as a probabilistic model to uncover the latent thematic structures of lyrics, allowing the author to label and compare the similarities of topics across the dataset.
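Because each LDA topic is a probability distribution over the vocabulary, two topics can be compared with distribution-based measures such as the Jensen-Shannon divergence and the Hellinger distance, both of which appear in the thesis outline. The following Python sketch uses their standard definitions; the base-2 logarithm (which bounds the divergence by 1) and the two toy topic distributions are assumptions for illustration and do not come from the thesis.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence between two distributions, in [0, 1] for log base 2."""
    # m is the pointwise mixture; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger_distance(p, q):
    """Hellinger distance between two distributions, in [0, 1]."""
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2)


# Hypothetical topic-word distributions over a four-word vocabulary.
topic_a = [0.5, 0.3, 0.2, 0.0]
topic_b = [0.1, 0.2, 0.3, 0.4]

jsd = jensen_shannon_divergence(topic_a, topic_b)
hd = hellinger_distance(topic_a, topic_b)
print(jsd, hd)
```

Both measures are symmetric, equal 0 for identical distributions, and reach 1 when the two topics share no words, which makes them convenient for comparing topics within one model or between models.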