Master's Thesis, 2013
79 pages, Grade: A+
1 Introduction
2 Background
2.1 Speech
2.1.1 Speech Signal
2.2 Speech Signal Processing
2.2.1 Fourier Transform
2.2.2 Discrete Cosine Transform
2.2.3 Digital Filters
2.2.4 Nyquist Shannon Sampling Theorem
2.2.5 Window Functions
3 Speech Enhancement
3.1 Signal to Noise Ratio
3.2 Spectral Subtraction
3.3 Cepstral Mean Normalization
3.4 RASTA Filtering
3.5 Voice Activity Detector
3.5.1 The Empirical Mode Decomposition Method
3.5.2 The Hilbert Spectrum Analysis
3.5.3 Voice Activity Detection
4 Gender Identification Systems
4.1 Acoustic Features
4.1.1 Mel Frequency Cepstral Coefficients (MFCC)
4.1.2 Shifted Delta Cepstral (SDC)
4.1.3 Pitch Extraction Method
4.2 Pitch Based Models
4.3 Models based on Acoustic Features
4.4 Fused Models
5 Learning Techniques for Gender Identification
5.1 Overview
5.2 Adaboost
5.3 Gaussian Mixture Model (GMM)
5.3.1 GMM Training
5.3.2 GMM Testing
5.4 Decision Making
5.5 Likelihood Ratio
5.6 Universal Background Model
5.6.1 UBM Training
6 System Design and Implementation
6.1 Toolboxes
6.1.1 Signal Processing Toolbox
6.1.2 Machine Learning Toolbox
6.2 System Design
6.2.1 Requirement
6.2.2 Initial Approach
6.2.3 Algorithm
6.2.4 Feature Selection
6.3 Experiments and Results
6.3.1 Pitch Based Models
6.3.2 Models Based on Acoustic Features
6.3.3 Fused Model
6.3.4 YouTube Videos
7 Conclusion
7.1 Summary
7.2 Future Recommendation
This project aims to develop an automatic gender identification system using speech data, addressing the challenges posed by real-world acoustic conditions such as background noise and silence. The research explores and implements various classification techniques and feature extraction methods, specifically comparing pitch-based approaches, acoustic feature models (MFCC and SDC), and hybrid fusion models to achieve robust performance.
6.3.1 Pitch Based Models
Pitch based models use pitch alone as the discriminating factor to identify the gender of the speaker. To train the model, the data was prepared by applying the pre-processing explained earlier. The pitch of every frame was then estimated using the harmonic to sub harmonic method, and the average pitch was used as the key feature. Using the training data, I trained a non-linear SVM with the RBF kernel to identify the gender of the speaker. For testing, the pitch was estimated using the same method, and the mean pitch was passed to the model as the input for classification.
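The feature described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the function names `mean_pitch` and `rbf_kernel`, the convention that unvoiced frames carry a pitch of 0, and the value of `gamma` are all hypothetical. The sketch shows how frame-level pitch estimates collapse into one scalar per utterance and how an RBF kernel would compare two such scalars; SVM training itself is left out.

```python
import math
from statistics import mean


def mean_pitch(frame_pitches_hz):
    """Average the frame-level pitch estimates of one utterance.

    Assumption: frames without a pitch estimate (silence, unvoiced
    speech) are marked with 0 and are excluded from the average.
    """
    voiced = [f for f in frame_pitches_hz if f > 0]
    if not voiced:
        raise ValueError("utterance contains no voiced frames")
    return mean(voiced)


def rbf_kernel(x, y, gamma):
    """RBF kernel for scalar inputs: exp(-gamma * (x - y)^2)."""
    return math.exp(-gamma * (x - y) ** 2)


# Hypothetical frame-level pitch tracks in Hz (0 = no pitch found).
utterance_a = [0, 100, 120, 0, 110]
utterance_b = [200, 0, 220, 210]

feature_a = mean_pitch(utterance_a)
feature_b = mean_pitch(utterance_b)

# Similarity of the two utterances as an RBF-kernel SVM would see it.
# gamma is a hypothetical value chosen for illustration only.
similarity = rbf_kernel(feature_a, feature_b, gamma=1e-4)
```

Because the model sees only one number per utterance, the kernel value depends solely on the squared difference between two mean pitches, which is why speaker variability in pitch matters so much for this class of model.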
I performed different types of experiments to evaluate the performance of the pitch based models. The primary motivation behind these experiments was to examine the effect of speaker variability on a pitch based gender identification model. The secondary motivation was to determine the training settings at which pitch based models perform best and to understand the behaviour of pitch based models when trained with speech from different languages in different conditions. The length of each speech file is between 1.5 seconds and 4 seconds.
1 Introduction: Discusses the significance of human-machine voice interaction and the motivation for developing automatic gender identification systems.
2 Background: Provides the theoretical foundation regarding speech signals, digital signal processing, sampling theorems, and window functions.
3 Speech Enhancement: Details pre-processing techniques like signal-to-noise ratio estimation, spectral subtraction, and voice activity detection to ensure robust feature extraction.
4 Gender Identification Systems: Explores acoustic features like MFCC and SDC, and reviews various model architectures including pitch-based and fused systems.
5 Learning Techniques for Gender Identification: Examines classification algorithms such as Support Vector Machines, Adaboost, and Gaussian Mixture Models used in gender classification.
6 System Design and Implementation: Outlines the development process, including toolboxes, datasets, feature selection, and the experimental results of different models.
7 Conclusion: Summarizes the project findings, highlighting the success of the SDC fused model, and suggests future improvements for the system.
Gender Identification, Speech Processing, MFCC, SDC, Pitch Extraction, Gaussian Mixture Model, SVM, Adaboost, Spectral Subtraction, Voice Activity Detection, Signal Processing, Feature Extraction, Robustness, MATLAB, Digital Signal Analysis
The research focuses on building an automatic gender identification system that can accurately determine a speaker's gender using speech features in real-world, noisy environments.
The work primarily integrates Digital Signal Processing (DSP) and Machine Learning techniques to analyze human speech.
The goal is to create a robust gender identification system that maintains high accuracy despite acoustic disturbances like background noise, dialects, and different languages.
The author employed GMM-based acoustic modeling, SDC feature extraction, pitch-based discrimination, and SVM classification, evaluating them individually and in combination (fused models).
The thesis covers the theoretical background, specific speech enhancement methods, system architecture, training algorithms, and comprehensive performance testing against various models and datasets.
Key terms include Gender Identification, SDC (Shifted Delta Cepstral), MFCC, GMM, SVM, and robust speech enhancement.
SDC features are long-term features that capture more contextual information from the speech signal, making them significantly more robust than short-term MFCC features in noisy conditions.
The fused models demonstrated exceptional performance, achieving up to 96% accuracy, validating the effectiveness of combining acoustic features with pitch information for real-world scenarios.
The dataset was created by collecting speech samples from 12 different languages (e.g., English, Urdu, Arabic, Spanish) from diverse sources, ensuring language and dialect independence.
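The long-term nature of the SDC features mentioned above can be made concrete with a short Python sketch. It follows the commonly used N-d-P-k formulation of shifted delta cepstra, in which each output vector stacks k delta blocks taken P frames apart, each delta spanning ±d frames. The function name `sdc` and the default parameter values are assumptions for illustration; the source does not state which configuration the thesis used.

```python
def sdc(cepstra, N=7, d=1, P=3, k=7):
    """Compute shifted delta cepstral vectors.

    cepstra: list of frames, each a list with at least N cepstral
             coefficients (for example MFCCs).
    Returns one vector of length N * k for every frame t that has
    enough context on both sides.
    """
    T = len(cepstra)
    features = []
    # t - d must be >= 0 and t + (k - 1) * P + d must be <= T - 1.
    for t in range(d, T - (k - 1) * P - d):
        vec = []
        for i in range(k):
            centre = t + i * P
            ahead = cepstra[centre + d]
            behind = cepstra[centre - d]
            # Delta block i: c(t + iP + d) - c(t + iP - d)
            vec.extend(ahead[j] - behind[j] for j in range(N))
        features.append(vec)
    return features


# Toy input: 10 frames with 2 coefficients each, rising by 1 per frame.
toy_cepstra = [[float(t), float(t)] for t in range(10)]
toy_sdc = sdc(toy_cepstra, N=2, d=1, P=2, k=3)
```

With the default parameters each SDC vector draws on frames spread over (k - 1) * P + 2d + 1 = 21 consecutive frames, whereas a single MFCC vector describes one frame only; this wider span is the contextual information referred to above.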

