How Did English Songs Evolve? Retrieving Information from Song Lyrics Via Natural Language Processing and Statistical Topic Modeling

Master's Thesis, 2019
108 pages, Grade: 1.0

Mathematics - Statistics

Excerpt

1. Introduction to the Topic

1.1. Previous Research

1.2. Hypotheses and Methods

2. Theory

2.1. Music Information Retrieval

2.2. Natural Language Processing

2.2.1. Lemmatization

2.2.2. Part-of-Speech Tagging

2.3. Text Data Mining

2.3.1. N-grams

2.3.2. Term Frequency-Inverse Document Frequency

2.4. Topic Modeling Using Latent Dirichlet Allocation

2.4.1. Model Tuning

2.4.2. Model Evaluation

2.5. Similarity Measures

2.5.1. Jensen-Shannon Divergence

2.5.2. Hellinger Distance

2.5.3. Log Ratio

3. Data

3.1. Data Selection and Web Scraping

3.2. Data Pre-Processing

3.3. The Final Data Set

4. Analyses

4.1. Text Statistics

4.1.1. Comparison of Text Statistics

4.1.2. Comparison of Word Use

4.2. Text Features

4.2.1. Term Frequency-Inverse Document Frequency in Application

4.2.2. Part-of-Speech Tagging in Application

4.2.3. N-grams in Application

4.2.4. Conclusions about Text Statistics and Text Features

4.3. LDA Modeling

4.3.1. Parameter Tuning

4.3.2. Model Evaluation

4.3.3. Topic Similarity Within Models

4.3.4. Topic Similarity Between Models

4.3.5. Conclusions about LDA Modeling and Similarity Measures

5. Findings and Prospects

5.1. Findings Compared to Previous Research

5.2. Need for Improvement and Future Applications

Research Objectives and Themes

This thesis investigates the evolution, complexity, and thematic similarity of English song lyrics across five musical genres (alternative, country, pop, rock, and hip-hop) over the past 50 years. The central research question is whether information derived from Natural Language Processing (NLP) and statistical topic modeling can measure these changes over time.

Application of NLP methods for comparative analysis of lyrical text statistics and features.
Examination of lexical complexity and sophistication trends in song lyrics across decades.
Implementation of Latent Dirichlet Allocation (LDA) to compute statistical topic models for song collections.
Evaluation of genre-specific thematic similarities using advanced distance and similarity measures.

Excerpt from the Book

1. Introduction to the Topic

Although this thesis analyzes English song lyrics, an anecdote from the German music business is a fitting introduction to the subject. It touches on the similarity of song lyrics, which is the main focus here. According to satirist Jan Böhmermann, the four most common topics in popular German music are “Menschen, Leben, Tanzen, Welt” (“people, life, dancing, world”). This is also the title of a song “composed” by chimpanzees from lines of German pop songs, tweets by popular influencers, advertising slogans, and proverbs (cf. Böhmermann 2017). Böhmermann performed it to criticize the German music business, claiming that the lyrics of popular songs are all very similar, superficial, and sound like advertising (cf. ibid., Rohleder 2018). Upon its commercial release, the song reached the top 10 of the German single charts, which Stern magazine considered symptomatic of the current state of popular German music (cf. Stern 2017).

Beyond this, there are analyses and scientific studies that seek to demonstrate the deterioration of song lyrics. Bagot and Scott, for example, analyze about 6000 songs by top-selling UK artists to determine the sophistication of their lyrics. Applying readability scores, which are used to gauge the difficulty of school literature, they specify how demanding the songs are. For instance, they find that understanding the average Depeche Mode song requires 10.3 years of school education, which makes the band the artist with the most sophisticated lyrics among those examined.

Summary of Chapters

1. Introduction to the Topic: This chapter introduces the motivation behind analyzing song lyrics and establishes the research goal of using statistical methods to explore lyrical evolution and complexity.

2. Theory: This chapter provides the theoretical foundation for Music Information Retrieval, NLP techniques, and the specific application of Latent Dirichlet Allocation for topic modeling.

3. Data: This chapter details the methodology for building a custom dataset, including web scraping from Discogs and Genius, and the subsequent pre-processing steps like tokenization and lemmatization.

4. Analyses: This chapter presents the empirical results of text statistics and feature comparisons, followed by the LDA modeling process and the evaluation of topic similarities within and between models.

5. Findings and Prospects: This concluding chapter synthesizes the results, compares them with previous research, and suggests improvements and future research directions for the field.

Keywords

Natural Language Processing, Song Lyrics, Latent Dirichlet Allocation, Topic Modeling, Music Information Retrieval, Lexical Complexity, Text Mining, Jensen-Shannon Divergence, Hellinger Distance, Corpus Linguistics, Genre Classification, Data Scraping, Text Statistics, Temporal Analysis, Repetitiveness.

Frequently Asked Questions

What is the primary focus of this thesis?

The thesis focuses on analyzing English song lyrics to track changes in complexity, thematic content, and stylistic similarity across five major music genres from 1970 to 2018.

What are the core thematic areas?

The work explores lexical density, word use patterns, the evolution of lyrical complexity, and the automatic extraction of topics using machine learning.

What is the central research question?

The research asks whether NLP and statistical topic modeling can determine whether, and to what extent, song lyrics of various genres have changed over the last 50 years.

Which scientific methods are utilized?

The study employs Natural Language Processing for text features, including lemmatization, POS tagging, and n-grams, as well as Latent Dirichlet Allocation (LDA) for statistical topic modeling.
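
As a rough illustration of one of the text features named above, n-grams can be extracted by sliding a window of n tokens over a tokenized lyric. The sketch below is in Python for brevity (the study itself used R); the helper name `ngrams` and the toy token list are invented for this example and do not come from the thesis.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) of a token sequence, in order."""
    # A sequence shorter than n yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy line, invented for illustration; not taken from the thesis corpus.
tokens = "la la la love me do love me do".split()

# Count how often each bigram (n = 2) occurs.
bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts.most_common(3))
```

Counting n-grams per genre or decade in this way gives a simple view of recurring phrases; in practice, lemmatization would usually be applied to the tokens first.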

What is covered in the main analysis part?

The main analysis evaluates text statistics, computes tf-idf weights, conducts parameter tuning for LDA models, and calculates similarity measures to compare topics across different genres and time periods.
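
For readers unfamiliar with tf-idf, the following Python sketch shows one common variant: term frequency relative to document length, multiplied by the logarithm of the inverse document frequency. This excerpt does not state which weighting variant the thesis uses, so the formula, the function name `tf_idf`, and the toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute tf-idf weights for a list of tokenized documents.

    Assumed variant (one of several in common use):
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = log(N / number of documents containing t)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # each document counts a term once

    weights = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy "songs", invented for illustration.
docs = [
    "love love heart".split(),
    "love truck road".split(),
    "money love street".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

Under this variant, a word that appears in every document (here "love") receives a weight of zero, while words specific to one document are weighted up.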

What are the key descriptive characteristics of this study?

The study is characterized by a self-designed, large-scale corpus of lyrics, the use of R for computational processing, and a multi-dimensional comparison of genres and decades.

How is the "complexity" of lyrics defined in this work?

Complexity is defined through lexical density (the ratio of unique words to total words), word lengths, and the amount of repetition within songs.
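
The three ingredients named in this answer can be sketched in a few lines of Python. The ratio of unique to total words and the mean word length follow the description above directly; the repetition measure (share of lines that repeat an earlier line) is only a hypothetical proxy, since this excerpt does not say how the thesis quantifies repetition. The function name `lyric_stats` and the toy lyric are likewise invented.

```python
import re


def lyric_stats(lyrics):
    """Return simple complexity proxies for one song's lyrics."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    lines = [line.strip().lower() for line in lyrics.splitlines() if line.strip()]
    if not words:
        raise ValueError("lyrics contain no words")
    return {
        # Ratio of unique words to total words, as described above.
        "unique_ratio": len(set(words)) / len(words),
        # Average number of characters per word.
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Assumed proxy for repetition: share of lines that duplicate another line.
        "repeated_line_share": (len(lines) - len(set(lines))) / len(lines),
    }


# Toy lyric, invented for illustration.
song = "Hold me now\nHold me now\nNever let me go\n"
stats = lyric_stats(song)
print(stats)
```

A lower unique-word ratio and a higher share of repeated lines would both point toward more repetitive, less lexically varied lyrics.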

What is the role of Latent Dirichlet Allocation (LDA) here?

LDA is used as a probabilistic model to uncover the latent thematic structures of lyrics, allowing the author to label and compare the similarities of topics across the dataset.
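
Once LDA has produced a probability distribution over words for each topic, two topics can be compared with the measures listed in the table of contents. The Python sketch below implements the standard definitions of the Jensen-Shannon divergence and the Hellinger distance; the base-2 logarithm (which bounds the divergence between 0 and 1) and the toy topic vectors are assumptions, as the excerpt does not give the exact formulas used in the thesis.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence: mean KL of p and q to their midpoint m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def hellinger_distance(p, q):
    """Hellinger distance: sqrt(0.5 * sum((sqrt(p_i) - sqrt(q_i))^2)), in [0, 1]."""
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared / 2)


# Toy topic-word distributions over a four-word vocabulary, invented for illustration.
topic_a = [0.5, 0.3, 0.2, 0.0]
topic_b = [0.1, 0.2, 0.3, 0.4]

print(jensen_shannon_divergence(topic_a, topic_b))
print(hellinger_distance(topic_a, topic_b))
```

Both measures are 0 for identical distributions and reach 1 when the two topics share no words with positive probability, which makes them convenient for comparing topics within one model or between models.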

End of excerpt from 108 pages

Details

Title: How Did English Songs Evolve? Retrieving Information from Song Lyrics Via Natural Language Processing and Statistical Topic Modeling
University: Otto-Friedrich-Universität Bamberg (Statistics and Econometrics)
Grade: 1.0
Author: Laura Zapf (Author)
Year of publication: 2019
Pages: 108
Catalog number: V997210
ISBN (eBook): 9783346376244
ISBN (Book): 9783346376251
Language: English
Keywords: natural language processing, machine learning, topic modelling, statistics
Product safety: GRIN Publishing GmbH
Price (eBook): US$ 42.99
Price (Book): US$ 54.99

Cite this work: Laura Zapf (Author), 2019, How Did English Songs Evolve? Retrieving Information from Song Lyrics Via Natural Language Processing and Statistical Topic Modeling, Munich, GRIN Verlag OHG, https://www.diplomarbeiten24.de/document/997210