Master's Thesis, 2019
108 pages, grade: 1.0
The main objective of this thesis is to investigate whether Natural Language Processing (NLP) and statistical topic modeling, specifically Latent Dirichlet Allocation (LDA), can reveal how song lyrics across different genres have changed over the past 50 years. The thesis aims to determine the similarity between song lyrics of various genres, explore potential decreases in lyrical complexity over time, and analyze the evolution of lyrical topics using LDA and similarity measures.
1. Introduction to the Topic: This chapter introduces the central research question: Can NLP and statistical topic modeling determine changes in song lyrics across genres over 50 years? It motivates this question by referencing previous research indicating a potential decline in lyrical sophistication and introduces the thesis's goals: analyzing genre similarity using NLP and text mining; assessing lyrical complexity changes; and using LDA to compare topic similarity and change across time. The chapter concludes with a brief overview of relevant previous research and a preview of the hypotheses to be tested and methods employed.
2. Theory: This chapter lays the theoretical groundwork for the methods used in the thesis. It covers Music Information Retrieval (MIR), focusing on the underutilization of lyrics data; Natural Language Processing (NLP), explaining lemmatization and part-of-speech tagging; and Text Data Mining, detailing the application of n-grams and tf-idf. The core of the chapter details Latent Dirichlet Allocation (LDA) for topic modeling, including model tuning (perplexity, log-likelihood) and evaluation. Finally, it describes the similarity measures (Jensen-Shannon Divergence, Hellinger Distance, and Log Ratio) used to compare topic distributions.
3. Data: This chapter describes the creation of a custom dataset of song lyrics. It details the selection of five genres (alternative, country, pop, rock, and hip-hop) and a time span (1970-2018), the web scraping process used to gather data from Discogs.com and Wikipedia.com, and the subsequent data pre-processing steps including tokenization, lemmatization, and stop word removal. It concludes with a summary of the final dataset, including statistics on song count, word count, and other relevant variables.
4. Analyses: This chapter presents the data analysis conducted to test the hypotheses. It begins with an examination of text statistics (song length, word length, lexical diversity, lexical density, word frequencies, log odds ratios) to explore differences and similarities between genres and across decades, as well as to assess changes in lyrical complexity. Then, it moves to analyzing text features like tf-idf, parts of speech, and n-grams (for repetition analysis). Finally, the chapter presents a comprehensive analysis of LDA modeling, detailing parameter tuning, model evaluation, and the use of similarity measures to compare topic distributions within and between models (across genres and decades).
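Two of the text statistics named for Chapter 4 can be illustrated with a minimal sketch. It assumes that lexical diversity is measured as a type-token ratio and that repetition is measured as the share of n-gram occurrences that recur. The listing does not say which exact definitions the thesis uses, so the function names and the toy lyric below are illustrative only.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Lexical diversity as unique tokens divided by total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repeated_ngram_share(tokens, n=2):
    """Share of n-gram occurrences whose n-gram appears more than once."""
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)


# Toy lyric line, invented for illustration only (not from the dataset).
lyric = "la la la love me do you know i love you".split()
print(type_token_ratio(lyric))
print(repeated_ngram_share(lyric, n=2))
```

A lower type-token ratio and a higher repeated n-gram share would both point toward more repetitive lyrics, which is the kind of signal a complexity comparison across decades could draw on.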
This thesis investigates whether Natural Language Processing (NLP) and Latent Dirichlet Allocation (LDA) topic modeling can reveal how song lyrics across different genres have changed over the past 50 years. It aims to determine the similarity between song lyrics of various genres, explore potential decreases in lyrical complexity over time, and analyze the evolution of lyrical topics.
The main objectives are to analyze the similarity of song lyrics across different genres, assess changes in lyrical complexity over time, explore the evolution of lyrical topics across genres and decades, apply and evaluate NLP techniques in analyzing song lyrics, and determine optimal parameter settings for LDA topic modeling.
A custom dataset of song lyrics was created, encompassing five genres (alternative, country, pop, rock, and hip-hop) spanning from 1970 to 2018. Data was gathered through web scraping from Discogs.com and Wikipedia.com, followed by pre-processing steps such as tokenization, lemmatization, and stop word removal.
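The three pre-processing steps named above (tokenization, lemmatization, stop word removal) can be sketched as follows. This is a simplified illustration, not the thesis's pipeline: the stop word list is deliberately tiny, and `LEMMAS` is a hypothetical lookup table standing in for a real lemmatizer.

```python
import re

# Tiny illustrative stop word list; a real pipeline would use a full one.
STOP_WORDS = {"a", "an", "and", "i", "in", "is", "me", "my", "the", "to", "you"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"running": "run", "ran": "run", "hearts": "heart", "loved": "love"}


def tokenize(text):
    """Lowercase the text and extract word tokens, keeping contractions whole."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def preprocess(text):
    """Tokenize, map tokens to lemmas, then drop stop words."""
    tokens = tokenize(text)
    lemmas = [LEMMAS.get(token, token) for token in tokens]
    return [lemma for lemma in lemmas if lemma not in STOP_WORDS]


# Toy input, invented for illustration only.
print(preprocess("I ran to you, and the hearts in my city loved running"))
```

Lemmatizing before removing stop words means that inflected forms are normalized first, so the stop word list only needs to contain base forms.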
The analysis combined NLP techniques (lemmatization, part-of-speech tagging), text mining techniques (n-grams, tf-idf), Latent Dirichlet Allocation (LDA) for topic modeling, and similarity measures (Jensen-Shannon Divergence, Hellinger Distance, Log Ratio) for comparing topic distributions. Text statistics such as song length, word length, lexical diversity, and lexical density were also analyzed.
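To make the tf-idf weighting concrete, here is a minimal sketch. It assumes the common variant with relative term frequency and an unsmoothed logarithmic inverse document frequency; the thesis may use a different variant, and the toy "genre documents" are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy "genre documents", invented for illustration only.
docs = [
    "truck road beer love".split(),
    "money street flow love".split(),
    "guitar road night love".split(),
]
scores = tf_idf(docs)
print(scores[0])
```

In this toy example "love" occurs in every document and therefore receives a weight of zero, while "truck" occurs in only one document and scores highest there. This is why tf-idf is useful for surfacing words that characterize one genre rather than lyrics in general.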
Key themes include the similarity of song lyrics across genres, changes in lyrical complexity over time, the evolution of lyrical topics, the application and evaluation of NLP techniques in analyzing song lyrics, and the optimization of LDA topic modeling parameters.
The preview does not detail specific findings. It indicates that the thesis compares its results with previous research and discusses areas for improvement and future applications of the methodology.
The thesis is structured into five chapters: 1. Introduction, 2. Theory (covering MIR, NLP, Text Mining, LDA, and similarity measures), 3. Data (data selection, scraping, and pre-processing), 4. Analyses (text statistics, text features, LDA modeling), and 5. Findings and Prospects.
The thesis utilizes lemmatization and part-of-speech tagging as core NLP techniques.
Latent Dirichlet Allocation (LDA) is the primary topic modeling technique used.
The study employs Jensen-Shannon Divergence, Hellinger Distance, and Log Ratio to compare topic distributions.
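Two of these measures have standard definitions and can be sketched directly for discrete topic distributions. The sketch assumes base-2 logarithms for the Jensen-Shannon Divergence, so both measures lie between 0 and 1. The Log Ratio is omitted because the listing does not give its definition.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 add 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions."""
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2)


# Toy topic distributions, invented for illustration only.
topic_a = [0.7, 0.2, 0.1]
topic_b = [0.1, 0.3, 0.6]
print(jensen_shannon_divergence(topic_a, topic_b))
print(hellinger_distance(topic_a, topic_b))
```

Both functions return 0 for identical distributions and 1 for distributions with no overlap, which makes the values easy to compare across topic pairs.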