Who is Speaking? Male or Female

Master's Thesis, 2013
79 Pages, Grade: A+

Computer Science - Miscellaneous

Excerpt

1 Introduction

2 Background

2.1 Speech

2.1.1 Speech Signal

2.2 Speech Signal Processing

2.2.1 Fourier Transform

2.2.2 Discrete Cosine Transform

2.2.3 Digital Filters

2.2.4 Nyquist Shannon Sampling Theorem

2.2.5 Window Functions

3 Speech Enhancement

3.1 Signal to Noise Ratio

3.2 Spectral Subtraction

3.3 Cepstral Mean Normalization

3.4 RASTA Filtering

3.5 Voice Activity Detector

3.5.1 The Empirical Mode Decomposition Method

3.5.2 The Hilbert Spectrum Analysis

3.5.3 Voice Activity Detection

4 Gender Identification Systems

4.1 Acoustic Features

4.1.1 Mel Frequency Cepstral Coefficients (MFCC)

4.1.2 Shifted Delta Cepstral (SDC)

4.1.3 Pitch Extraction Method

4.2 Pitch Based Models

4.3 Models based on Acoustic Features

4.4 Fused Models

5 Learning Techniques for Gender Identification

5.1 Overview

5.2 Adaboost

5.3 Gaussian Mixture Model (GMM)

5.3.1 GMM Training

5.3.2 GMM Testing

5.4 Decision Making

5.5 Likelihood Ratio

5.6 Universal Background Model

5.6.1 UBM Training

6 System Design and Implementation

6.1 Toolboxes

6.1.1 Signal Processing Toolbox

6.1.2 Machine Learning Toolbox

6.2 System Design

6.2.1 Requirement

6.2.2 Initial Approach

6.2.3 Algorithm

6.2.4 Feature Selection

6.3 Experiments and Results

6.3.1 Pitch Based Models

6.3.2 Models Based on Acoustic Features

6.3.3 Fused Model

6.3.4 YouTube Videos

7 Conclusion

7.1 Summary

7.2 Future Recommendation

Project Goals and Research Topics

This project aims to develop an automatic gender identification system using speech data, addressing the challenges posed by real-world acoustic conditions such as background noise and silence. The research explores and implements various classification techniques and feature extraction methods, specifically comparing pitch-based approaches, acoustic feature models (MFCC and SDC), and hybrid fusion models to achieve robust performance.

Automatic Gender Identification (AGI) methodology
Speech enhancement techniques for noise reduction
Acoustic feature extraction using MFCC and SDC
Comparison of Gaussian Mixture Models (GMM) and Support Vector Machines (SVM)
Performance evaluation through cross-validation and real-world simulation

Excerpt from the Book

6.3.1 Pitch Based Models

Pitch-based models use pitch alone as the discriminating feature for identifying the gender of the speaker. To train the model, the data was prepared by applying the pre-processing explained earlier. The pitch of every frame was then estimated using the harmonic-to-subharmonic method, and the average pitch over all frames was used as the key feature. Using the training data, I trained a non-linear SVM with the RBF kernel to identify the gender of the speaker. For testing, the pitch was estimated using the same method, and the mean pitch was passed to the model as the input for classification.
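The feature described above, one mean pitch value per utterance, can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the thesis: a plain autocorrelation peak picker stands in for the harmonic-to-subharmonic method, the frame length, hop size and the 60 to 400 Hz search range are illustrative choices, and the names `frame_pitch`, `mean_pitch` and `synthetic_tone` are hypothetical. The SVM stage is omitted; the returned scalar is what would be passed to the classifier.

```python
import math


def synthetic_tone(freq_hz, fs, duration_s):
    """Hypothetical test signal: a pure sine wave standing in for voiced speech."""
    n_samples = int(fs * duration_s)
    return [math.sin(2.0 * math.pi * freq_hz * n / fs) for n in range(n_samples)]


def frame_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch of one frame in Hz from the autocorrelation peak.

    Assumption: this simple estimator replaces the harmonic-to-subharmonic
    method named in the text; fmin and fmax are illustrative limits.
    """
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag over the overlapping samples.
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag


def mean_pitch(signal, fs, frame_len=400, hop=200):
    """Average the frame-wise pitch estimates into one value per utterance."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    pitches = [frame_pitch(signal[s:s + frame_len], fs) for s in starts]
    if not pitches:
        raise ValueError("signal is shorter than one frame")
    return sum(pitches) / len(pitches)
```

Calling `mean_pitch(synthetic_tone(120.0, 8000, 0.25), 8000)` yields a value close to 120 Hz. On real recordings, unvoiced and silent frames would have to be excluded before averaging, which is the role of the pre-processing mentioned in the text.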

I performed several types of experiments to evaluate the performance of the pitch-based models. The primary motivation was to examine the effect of speaker variability on a pitch-based gender identification model. The secondary motivation was to determine the training settings at which pitch-based models perform best, and to understand how they behave when trained with speech from different languages under different conditions. Each speech file is between 1.5 and 4 seconds long.

Summary of Chapters

1 Introduction: Discusses the significance of human-machine voice interaction and the motivation for developing automatic gender identification systems.

2 Background: Provides the theoretical foundation regarding speech signals, digital signal processing, sampling theorems, and window functions.

3 Speech Enhancement: Details pre-processing techniques like signal-to-noise ratio estimation, spectral subtraction, and voice activity detection to ensure robust feature extraction.

4 Gender Identification Systems: Explores acoustic features like MFCC and SDC, and reviews various model architectures including pitch-based and fused systems.

5 Learning Techniques for Gender Identification: Examines classification algorithms such as Support Vector Machines, Adaboost, and Gaussian Mixture Models used in gender classification.

6 System Design and Implementation: Outlines the development process, including toolboxes, datasets, feature selection, and the experimental results of different models.

7 Conclusion: Summarizes the project findings, highlighting the success of the SDC fused model, and suggests future improvements for the system.

Keywords

Gender Identification, Speech Processing, MFCC, SDC, Pitch Extraction, Gaussian Mixture Model, SVM, Adaboost, Spectral Subtraction, Voice Activity Detection, Signal Processing, Feature Extraction, Robustness, MATLAB, Digital Signal Analysis

Frequently Asked Questions

What is the primary focus of this research?

The research focuses on building an automatic gender identification system that can accurately determine a speaker's gender using speech features in real-world, noisy environments.

What are the core technical fields involved?

The work primarily integrates Digital Signal Processing (DSP) and Machine Learning techniques to analyze human speech.

What is the core objective or research question?

The goal is to create a robust gender identification system that maintains high accuracy despite acoustic disturbances like background noise, dialects, and different languages.

Which scientific methods were employed?

The author employed GMM-based acoustic modeling, SDC feature extraction, pitch-based discrimination, and SVM classification, evaluating them individually and in combination (fused models).

What does the main body cover?

It covers the theoretical background, specific speech enhancement methods, system architecture, training algorithms, and comprehensive performance testing against various models and datasets.

How would you characterize the work using keywords?

Key terms include Gender Identification, SDC (Shifted Delta Cepstral), MFCC, GMM, SVM, and robust speech enhancement.

Why are SDC features considered more effective than simple MFCC?

SDC features are long-term features that capture more contextual information from the speech signal, making them significantly more robust than short-term MFCC features in noisy conditions.
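To make the "long-term" nature of SDC concrete, the following sketch stacks delta cepstra taken from several future blocks of frames, using the commonly cited N-d-P-k formulation. The excerpt does not state which configuration the thesis used, so the defaults `d=1`, `p=3`, `k=7` are assumptions, as is the choice to skip frames near the edges instead of padding them. The function name `sdc` is hypothetical.

```python
def sdc(cepstra, d=1, p=3, k=7):
    """Compute shifted delta cepstral features from a list of cepstral frames.

    cepstra: list of frames, each a list of N cepstral coefficients.
    d: delta spread, p: shift between blocks, k: number of stacked blocks.
    Assumption: only frames whose full context is available are returned.
    """
    n_frames = len(cepstra)
    # Last frame index for which index t + (k - 1) * p + d is still valid.
    last = n_frames - 1 - ((k - 1) * p + d)
    features = []
    for t in range(d, last + 1):
        stacked = []
        for i in range(k):
            ahead = cepstra[t + i * p + d]
            behind = cepstra[t + i * p - d]
            # Delta of block i: c(t + i*p + d) - c(t + i*p - d), per coefficient.
            stacked.extend(a - b for a, b in zip(ahead, behind))
        features.append(stacked)
    return features
```

With these defaults, each output vector spans a context of 21 frames (from `t - 1` to `t + 19`) and has `k * N` dimensions, compared with the single frame covered by a plain MFCC vector.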

What was the result of testing the fused models on YouTube videos?

The fused models demonstrated exceptional performance, achieving up to 96% accuracy, validating the effectiveness of combining acoustic features with pitch information for real-world scenarios.

How was the training dataset constructed?

The dataset was created by collecting speech samples from 12 different languages (e.g., English, Urdu, Arabic, Spanish) from diverse sources, ensuring language and dialect independence.

Details

Title: Who is Speaking? Male or Female
University: University of Manchester
Grade: A+
Author: Hassam Sheikh
Year of publication: 2013
Pages: 79
Catalog number: V265700
ISBN (eBook): 9783656554363
ISBN (Book): 9783656554493
File size: 1346 KB
Language: English
Keywords: speaking male female
Product safety: GRIN Publishing GmbH
Price (eBook): US$ 40.99
Price (Book): US$ 52.99

Cite this work: Hassam Sheikh (Author), 2013, Who is Speaking? Male or Female, Munich, GRIN Verlag, https://www.diplomarbeiten24.de/document/265700