Diploma thesis, 2006
117 pages, grade: 1.3
1. Introduction
2. The Classical Linear Ordinary Least Squares Regression
2.1. Introduction to OLS
2.2. Properties of the Least Squares Estimates
2.3. Problems of OLS
3. Outliers and OLS
3.1. Outlier definition and common error sources
3.2. Outliers in Regression Analysis and their influence on OLS results
4. The concept of Robustness
4.1. Introduction to Robustness
4.2. Qualitative Robustness
4.3. Infinitesimal Robustness
4.4. Quantitative Robustness
4.5. Robust Estimates
4.6. On asymptotic Results
5. Some measures of location and scale – with regard to their robustness properties
5.1. Introduction
5.2. Measures of location
5.2.1. A Definition
5.2.2. The Arithmetic Mean
5.2.3. The Median
5.2.4. Trimmed mean(s)
5.2.5. Other measures of location
5.3. Measures of scale
5.3.1. A Definition
5.3.2. The Standard deviation
5.3.3. The Median Absolute Deviation (MAD)
5.3.4. The t-Quantile Range
5.3.5. Other scale estimates
5.4. Higher Dimensions
6. Robust Regression Techniques
6.1. An Introduction and Definition
6.2. M-Estimates
6.3. The Repeated Median
6.4. The Least Median of Squares Regression
6.5. The Least Trimmed Squares Regression
6.6. The Coakley – Hettmansperger Estimator
6.7. Reweighted Least Squares
6.8. The Multivariate Reweighted Least Squares Approach
6.8.1. The Hat Matrix
6.8.2. The Minimum Volume Ellipsoid Estimator
6.9. Other Regression Methods and Limitations
6.10. Conclusions on Robust Regression
7. Application to SAS and Simulation
7.1. Introduction to Robustness Application and Simulation purposes
7.2. The initial data set – The zero contamination case
7.3. Seemingly negligible contamination in X-direction
7.4. Seemingly negligible contamination in Y-direction
7.5. High Leverage contamination
7.6. Large overall contamination
8. Conclusions
This thesis explores the vulnerability of the Ordinary Least Squares (OLS) regression method to outliers and evaluates alternative robust regression techniques. The primary objective is to examine, in technical detail, how outliers affect OLS results and to introduce, discuss, and simulate more robust statistical estimators that remain reliable in the presence of contaminated data.
3.2. Outliers in Regression Analysis and their influence on OLS results
A point (xi1,...,xip,yi) which deviates from the (in our case: linear) relation described by the majority of the data is called a Regression Outlier. The term linear relation refers to the relation between the dependent and the independent variables. This deviation from the actual trend can result from deviations either in the dependent-variable or in the independent-variable space.
If an x-value xi is located far away from the bulk of the other x-values, the observation (xi,yi) is called a leverage point, or an "outlier in the x-direction". This definition does not take the y-value of the observation into account. A leverage point is not necessarily something negative; a leverage observation can be quite beneficial. Therefore, we have to distinguish between good and bad leverage points. If a leverage point is a regression outlier, as defined above, it is called a bad leverage point; a bad leverage point has a huge influence on the Least Squares estimates. If a leverage point fits the linear relation described by the other observations, it is very beneficial for the analysis; such an observation is called a good leverage point. To see the benefit of good leverage points, it is necessary to recall the covariance matrix of the Least Squares estimator, Var(β̂|X) = σ²(X'X)⁻¹: a higher dispersion among the independent variables reduces the variance of the LS estimator. This becomes most apparent if we reduce the regression model to a simple one, where Var(β̂1) = σ²/Σ(xi − x̄)².
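The variance-reduction effect of a good leverage point can be sketched in the simple regression case, where Var(β̂1) = σ²/Σ(xi − x̄)². This is a minimal numerical illustration with hypothetical data; the error standard deviation σ = 1 is an assumption of the example:

```python
import numpy as np

def slope_variance(x, sigma=1.0):
    # Var(beta_1 | X) = sigma^2 / sum((x_i - mean(x))^2) in simple regression;
    # sigma = 1.0 is a hypothetical error standard deviation
    return sigma**2 / np.sum((x - x.mean()) ** 2)

x_base = np.linspace(0.0, 1.0, 20)       # x-values clustered in [0, 1]
x_lever = np.append(x_base[:-1], 10.0)   # last x-value moved far out

# the distant (good) leverage point inflates sum((x_i - mean)^2),
# so the variance of the slope estimate drops sharply
print(slope_variance(x_base) > slope_variance(x_lever))  # True
```

The comparison only concerns the design points, since the covariance of the LS estimator depends on the y-values solely through σ².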
1. Introduction: Presents the motivation for robust statistics, highlighting the vulnerability of OLS regression to outliers and outlining the scope of the thesis.
2. The Classical Linear Ordinary Least Squares Regression: Recalls the OLS method, its necessary assumptions, and its key properties, such as its status as the Best Linear Unbiased Estimator (BLUE).
3. Outliers and OLS: Defines outliers and investigates their impact on regression results, specifically distinguishing between leverage points and vertical outliers.
4. The concept of Robustness: Introduces the three fundamental pillars of robustness theory: qualitative, infinitesimal, and quantitative robustness, which serve as evaluation tools for estimators.
5. Some measures of location and scale – with regard to their robustness properties: Analyzes traditional and robust estimators for center and dispersion, serving as a basis for understanding robust regression.
6. Robust Regression Techniques: Discusses robust alternatives to OLS in detail, covering M-Estimates, the Repeated Median, Least Median of Squares, Least Trimmed Squares, and Reweighted Least Squares.
7. Application to SAS and Simulation: Provides a simulation-based comparison of OLS, M-estimation, and high-breakdown techniques under various contamination scenarios.
8. Conclusions: Synthesizes the results and recommends the Reweighted Least Squares method as a robust and efficient approach for practical regression analysis.
Regression Analysis, Ordinary Least Squares, Robustness, Outliers, Leverage Points, Breakdown Point, Influence Function, M-Estimates, Least Median of Squares, Least Trimmed Squares, Reweighted Least Squares, Asymptotic Efficiency, Simulation, SAS, Statistical Inference.
The work primarily addresses the sensitivity of standard regression models to outlier contamination and identifies robust alternative methods that remain reliable when data violates classical assumptions.
The thesis explores the theoretical concepts of robustness, analyzes various estimators for location, scale, and regression, and performs extensive simulation studies to compare their performance.
The main goal is to evaluate techniques that maintain high statistical performance in the presence of outliers and to provide actionable recommendations for robust regression analysis.
The thesis covers OLS, M-Estimates, the Repeated Median, Least Median of Squares (LMS), Least Trimmed Squares (LTS), and Reweighted Least Squares (RLS).
The main section evaluates specific robust regression estimators through their robustness properties, computational feasibility, and efficiency, culminating in simulation-based benchmarking.
Key terms include Robustness, Outliers, Breakdown Point, Regression Analysis, Least Trimmed Squares, and Statistical Simulation.
A leverage point is a data point with an x-value far from the bulk of data. A leverage point is "good" if it fits the linear trend of the majority of data, thereby reducing estimator variance; it is "bad" if it deviates from that trend and exerts undue influence on the regression line.
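The distinction between good and bad leverage points can be sketched numerically. In this hypothetical example, the clean data follow the (assumed) true line y = 2 + 3x; a good leverage point lies on that line, a bad one does not:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, x.size)  # noisy points near y = 2 + 3x

def ols_slope(x, y):
    # slope of the least squares fit, via numpy's least squares solver
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# good leverage point: far out in x, but exactly on the true line
x_good, y_good = np.append(x, 10.0), np.append(y, 2.0 + 3.0 * 10.0)
# bad leverage point: same x, but far off the line (a regression outlier)
x_bad, y_bad = np.append(x, 10.0), np.append(y, 0.0)

# the good leverage point leaves the slope near 3;
# the bad one drags the fitted line far away from the true trend
print(ols_slope(x_good, y_good))
print(ols_slope(x_bad, y_bad))
```

A single bad leverage point is enough to pull the OLS slope away from the majority trend, which is exactly the non-robustness the thesis addresses.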
The masking effect occurs when outliers prevent the detection of other outliers, rendering standard diagnostics like the Hat Matrix unreliable. This is why the thesis advocates for initial robust estimation to correctly identify these influential points.
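The masking effect can be sketched with the hat matrix H = X(X'X)⁻¹X', whose diagonal entries measure leverage. This is a minimal illustration with hypothetical data; the cutoff 2p/n used below is the common rule of thumb for flagging high-leverage observations:

```python
import numpy as np

def hat_diag(x):
    # diagonal of the hat matrix H = X (X'X)^-1 X' for a simple regression design
    X = np.column_stack([np.ones_like(x), x])
    return np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

x_clean = np.linspace(0.0, 1.0, 20)

# a single leverage point at x = 10 is clearly flagged:
# its hat value far exceeds the rule-of-thumb cutoff 2p/n
h_single = hat_diag(np.append(x_clean, 10.0))

# a cluster of eight leverage points at the same spot mask one another:
# each one's hat value is diluted and falls below the cutoff
h_cluster = hat_diag(np.append(x_clean, [10.0] * 8))

p = 2  # number of regression parameters (intercept and slope)
print(h_single[-1] > 2 * p / 21)   # True: flagged
print(h_cluster[-1] < 2 * p / 28)  # True: masked
```

Because the cluster of outliers lowers each member's individual leverage, classical hat-matrix diagnostics fail here, motivating the robust initial fits discussed in the thesis.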

