Robust Methods in Regression Analysis - Theory and Application - ii - Abstract
Regression Analysis is an important statistical tool for many applications. The most frequently used approach to Regression Analysis is the method of Ordinary Least Squares. But this method is vulnerable to outliers; even a single outlier can spoil the estimation completely. How can this vulnerability be described by theoretical concepts, and are there alternatives? This thesis gives an overview of the concepts and of alternative approaches. The three fundamental approaches to Robustness (qualitative, infinitesimal and quantitative Robustness) are introduced and applied to different estimators. The estimators under study are measures of location, scale and regression. The Robustness approaches are important for the theoretical assessment of estimators as well as for the development of alternatives to classical estimators. This thesis focuses on the (Robustness) performance of estimators when outliers occur within the data set. Measures of location and scale provide necessary steppingstones into the topic of Regression Analysis. In particular, the median and trimming approaches are found to produce very robust results. These results are used in Regression Analysis to find alternatives to the method of Ordinary Least Squares. Its vulnerability can be overcome by applying the methods of Least Median of Squares or Least Trimmed Squares. Different outlier diagnostic tools are introduced to improve the poor efficiency of these Regression Techniques. Furthermore, this thesis delivers a simulation of some Regression Techniques in different situations in Regression Analysis. This simulation focuses in particular on changes in regression estimates when outliers occur in the data.
Theoretically derived results as well as the results of the simulation lead to the recommendation of the method of Reweighted Least Squares. Applying this method routinely to problems of Regression Analysis provides outlier-resistant and efficient estimates.
Contents
List of Figures
List of Abbreviations
1. Introduction
2. The Classical Linear Ordinary Least Squares Regression
2.1. Introduction to OLS
2.2. Properties of the Least Squares Estimates
2.3. Problems of OLS
3. Outliers and OLS
3.1. Outlier definition and common error sources
3.2. Outliers in Regression Analysis and their influence on OLS results
4. The concept of Robustness
4.1. Introduction to Robustness
4.2. Qualitative Robustness
4.3. Infinitesimal Robustness
4.4. Quantitative Robustness
4.5. Robust Estimates
4.6. On asymptotic Results
5. Some measures of location and scale with regard to their robustness properties
5.1. Introduction
5.2. Measures of location
5.2.1. A Definition
5.2.2. The Arithmetic Mean
5.2.3. The Median
5.2.4. Trimmed mean(s)
5.2.5. Other measures of location
5.3. Measures of scale
5.3.1. A Definition
5.3.2. The Standard deviation
5.3.3. The Median Absolute Deviation (MAD)
5.3.4. The t-Quantile Range
5.3.5. Other scale estimates
5.4. Higher Dimensions
6. Robust Regression Techniques
6.1. An Introduction and Definition
6.2. M-Estimates
6.3. The Repeated Median
6.4. The Least Median of Squares Regression
6.5. The Least Trimmed Squares Regression
6.6. The Coakley - Hettmansperger Estimator
6.7. Reweighted Least Squares
6.8. The Multivariate Reweighted Least Squares Approach
6.8.2. The Minimum Volume Ellipsoid Estimator
6.9. Other Regression Methods and Limitations
6.10. Conclusions on Robust Regression
7. Application to SAS and Simulation
7.1. Introduction to Robustness Application and Simulation purposes
7.2. The initial data set - The zero contamination case
7.3. Seemingly negligible contamination in X-direction
7.4. Seemingly negligible contamination in Y-direction
7.5. High Leverage contamination
7.6. Large overall contamination
8. Conclusions
Appendices / References
List of Figures
1 Simple Regression with 10 observations
2 Simple Regression with a bad Leverage Point
3 Simple Regression with a vertical outlier
List of Abbreviations
Asymptotic Relative Efficiency ARE
Breakdown Point BP
finite sample breakdown point fsbp
Generalized M-Estimates GM-Estimates
Median Absolute Deviation MAD
Minimum Covariance Determinant MCD
Generalized Maximum Likelihood Estimates M-Estimates
Maximum Likelihood M.-L.
Minimum Volume Ellipsoid MVE
Ordinary Least Squares OLS
Reweighted Least Squares RLS
Statistical Analysis Software (SAS Institute) SAS
Standardized Asymptotic Variance SAV
Sensitivity Curve SC
Weighted Least Squares WLS
1. Introduction
Regression Analysis is an important tool for any quantitative research. It explores the relationship between dependent and explanatory variables. Many hypotheses claimed by economic theories can be tested by applying a Regression Model to real world data. The method of Ordinary Least Squares (OLS) is the most frequently applied Regression Technique. The application of this specific method requires several assumptions. Every researcher is aware of the fact that the OLS method performs poorly if these assumptions are not fulfilled. In the last two centuries, various strategies were introduced to test whether the model assumptions are fulfilled or not. Besides that, various more general Regression Techniques are available which are based on less stringent conditions. Up to the middle of the twentieth century, violations of the model assumptions were treated independently from any common error source. But outlying observations within the data in particular can cause violations of model assumptions and thereby can have a huge impact on Regression results. The intention of this thesis is to examine technically the effects of outliers on OLS Regression and to present alternative Regression Techniques. Furthermore, this thesis should be a mathematically less demanding introduction to the field of Robust Statistics than is usually provided by mathematical statistics. The practical use of the methods considered here is always the crucial point within this work.
Chapter 2 recalls the definition of the OLS method and its required assumptions. The definition of Outliers and a first non-technical examination of their influence on OLS Regression are presented in chapter 3. The subsequent chapter introduces three approaches to the theory of Robustness: qualitative, infinitesimal and quantitative robustness. These concepts play an important role for the assessment as well as for the development of Robust Regression Techniques. They enable us to point out technically the poor performance of OLS Regression in the presence of outliers.
Chapters 5 through 7 deal with the application of the three Robustness concepts on estimators of location, scale and finally regression. As measures of location and scale can be seen as first steppingstones into Robust Estimation and Testing, in chapter 5 several of these estimates are presented in order to pave the way for Robust Regression Techniques. Furthermore, these sections provide valuable information for robust univariate and multivariate data analysis in general.
In chapter 6, six robust alternatives to OLS Regression are presented in detail and judged with regard to their Robustness properties as well as their efficiency properties. In particular, “on the top” improvements on high breakdown objective Regression estimators are studied in detail, as these methods are assumed to be the best performing Regression Techniques available. The high breakdown objective Regression estimators considered are the methods of Least Median of Squares and Least Trimmed Squares. The Regression Techniques introduced in chapter 6 are applied to some simulated data sets in chapter 7 using the SAS software. Within this chapter, various types of outlier contamination are simulated. The comparison of the Regression Techniques focuses on the proneness to outliers, the efficiency of the coefficient estimates and the computational demand of these methods. Furthermore, chapter 7 examines the availability of the introduced Regression Techniques in standard statistical software.
Taking into consideration in particular the results obtained in chapters 6 and 7, a recommendation for particular Regression Techniques will be given in the concluding chapter. We found the method of Reweighted Least Squares (RLS) to be the most recommendable Regression Technique among those considered here. It combines good Robustness and efficiency properties and is, besides that, available in some of the most frequently used statistical software packages such as SAS, S-PLUS etc.
2. The Classical Linear Ordinary Least Squares Regression
2.1. Introduction to OLS
A regression model is used to explore the relationship between a dependent (or explained) variable and one or more independent (or explanatory) variables. In a classical multiple linear regression model 1 we assume each observation of the dependent variable y to be generated by the following process: $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i$; where the $x_i$'s are the independent variables, the $\beta_i$'s are the regression coefficients which have to be estimated and $\varepsilon_i$ is a disturbance term.
To present the following properties of Regression Analysis it is worth introducing some notation:
$\mathbf{y} = (y_1, \dots, y_n)'$; $\mathbf{y}$ 2 is a column vector of n observations of the dependent variable y.
$\mathbf{X}$ is an $n \times (p+1)$ matrix containing p independent variables $x_i$ with n observations each and a first column of 1s. This first column ensures a constant term in the model.
$\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_p)'$; $\boldsymbol{\beta}$ is a column vector of the regression coefficients.
$\boldsymbol{\varepsilon} = (\varepsilon_1, \dots, \varepsilon_n)'$; $\boldsymbol{\varepsilon}$ is a column vector of the disturbances.
The classical multiple regression model can now be formulated in the following way: $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$.
1 For a definition see GREENE (2003) p. 10.
2 Boldface letters indicate vectors or matrices.
As the relation of $\mathbf{y}$ and $\mathbf{X}$ is unknown, it has to be estimated. We will refer to the following notation:
$\hat{\beta}_i$ is the estimate for the unknown regression coefficient $\beta_i$,
$\hat{y}_i$ is the estimate for $y_i$: $\hat{y}_i = \mathbf{x}_i'\hat{\boldsymbol{\beta}}$,
and $e_i$ the estimate for $\varepsilon_i$: $e_i = y_i - \hat{y}_i = y_i - \mathbf{x}_i'\hat{\boldsymbol{\beta}}$; $e_i$ is called the residual.
The crucial point is the estimation of the parameters. Although various approaches have been suggested, the most frequently used is the Least Squares fitting criterion 3 . The goal of any fitting criterion is to find a vector $\hat{\boldsymbol{\beta}}$ which brings the fitted line $\mathbf{X}\hat{\boldsymbol{\beta}}$ close to the real observations $\mathbf{y}$. The Least Squares method is popular due to its simple computation by Matrix Algebra and what Hampel et al. 4 call “its mathematical beauty”. Furthermore, there are other (good) reasons for choosing this particular method, which we will point out later on.
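By minimizing the sum of the squared residuals $\sum_{i=1}^{n} e_i^2$ 5 the coefficient vector $\hat{\boldsymbol{\beta}}$ can be obtained. This condition is in matrix notation: $\min(\mathbf{e}'\mathbf{e})$; recalling the definition of the residual this can be written as: $\mathbf{e}'\mathbf{e} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} - 2\mathbf{y}'\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta}$.
This sum is minimized with respect to the unknown coefficient vector $\boldsymbol{\beta}$:
$\frac{\partial(\mathbf{e}'\mathbf{e})}{\partial\boldsymbol{\beta}} = -2\mathbf{X}'\mathbf{y} + 2(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = 0 \Rightarrow \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$.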
This solution requires the assumption of a full rank of the $\mathbf{X}$ matrix, so that the $\mathbf{X}'\mathbf{X}$ matrix is invertible 6 . A useful tool for Outlier detection is the Hat Matrix, $\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$. It is called hat matrix because it transforms the vector of observed dependent variables into the vector of the predicted dependent variables, $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$.
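The closed-form solution and the hat matrix can be checked numerically. The following is a minimal sketch in Python with NumPy; the five data points are hypothetical and serve only to illustrate the matrix formulas, they are not taken from this thesis.

```python
import numpy as np

# Hypothetical data (not from the thesis): one regressor, five observations.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# Design matrix X with a first column of 1s for the constant term.
X = np.column_stack([np.ones_like(x), x])

# Normal equations: (X'X) beta_hat = X'y. Solving the linear system
# avoids forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X' maps the observed y onto the fitted values.
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y

# Residuals e = y - y_hat.
residuals = y - y_hat

# The diagonal of H holds one value per observation.
hat_diagonal = np.diag(H)
```

In this sketch `y_hat` coincides with `X @ beta_hat`, and the diagonal elements of `H` sum to the number of estimated coefficients (two here).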
3 GREENE (2003) p.20.
4 HAMPEL ET AL. (1986) p.307.
5 Minimizing the sum of the squared residuals can be expressed equivalently as minimizing the mean of the squared residuals; this makes it easier, for instance, to understand the MLS idea we will discuss in a following chapter.
6 The proof that the obtained solution is a minimum is omitted here, see e.g. GREENE (2003) p.21.
2.2. Properties of the Least Squares Estimates
The estimated coefficient vector $\hat{\boldsymbol{\beta}}$ is often called BLUE, the best linear unbiased estimator. Linearity is ensured if the model specifies a linear relationship between the dependent and independent variables. The term “unbiased” refers to the property of the estimator that its expectation equals the real value, $E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}$. $\hat{\boldsymbol{\beta}}_{OLS}$ is the best estimator 7 , because it is the minimum variance estimator, which is proven by the Gauss-Markov Theorem 8 . The covariance matrix of the Least Squares estimator is given by $Var(\hat{\boldsymbol{\beta}} \mid \mathbf{X}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$, where $\sigma^2$ is the common variance of the disturbances, $E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}' \mid \mathbf{X}) = \sigma^2\mathbf{I}$. As the disturbances are unknown, their variance is unknown too and has to be estimated as well. The disturbances are estimated by the residuals, thus their variance is estimated by the residual variance $\hat{\sigma}^2 = s^2$, which is computed from $\mathbf{e}'\mathbf{e}$ in place of the unobservable $\boldsymbol{\varepsilon}'\boldsymbol{\varepsilon}$. In chapter 5 we will introduce other measures of dispersion and their properties, especially with regard to Robustness. These properties of the estimator make the OLS method so powerful, particularly if it is taken into consideration that these properties do not need any distributional assumption 9 .
The assumption of a multivariate normal distribution of the disturbances is required only for
regression inference. However, in order to obtain exact statistical results 10 , i.e. answer the
posed questions, inference is necessary.
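As an illustration of the covariance formula, the following Python sketch estimates $\sigma^2$ from the residuals and plugs it into $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$. The data are hypothetical, and dividing the residual sum of squares by n minus the number of estimated coefficients is the usual unbiased convention, assumed here because the text does not spell out the divisor.

```python
import numpy as np

def ols_with_covariance(X, y):
    """Return OLS coefficients, residual variance s^2 and s^2 (X'X)^{-1}."""
    n, k = X.shape  # k counts all columns, including the constant
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    e = y - X @ beta_hat
    s2 = (e @ e) / (n - k)  # assumed divisor: degrees of freedom n - k
    return beta_hat, s2, s2 * xtx_inv

# Hypothetical data, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.8, 3.1, 4.9, 7.2, 8.8, 11.1])
X = np.column_stack([np.ones_like(x), x])

beta_hat, s2, cov_beta = ols_with_covariance(X, y)

# Standard errors of the coefficients: square roots of the diagonal.
std_errors = np.sqrt(np.diag(cov_beta))
```

The square roots of the diagonal elements are the standard errors that enter confidence intervals and t-tests for the individual coefficients.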
2.3. Problems of OLS
While applying OLS the user has to be aware of some sources of error which can affect the results substantially. Some of them already distort the results without taking into account the distributional assumption on the disturbances (or residuals respectively). Among these the most common ones are Multicollinearity and model misspecification. Multicollinearity (i.e. high correlations between two or more independent variables) can affect the coefficient estimates, as the influence of a single regressor 11 might be superimposed by others which are correlated with it. It has to be mentioned that the case of an exact linear relationship (i.e. perfect linear correlation) is excluded, as this yields a noninvertible $\mathbf{X}'\mathbf{X}$ matrix.
7 The “best estimator” property is also labelled with the term “most efficient” estimator.
8 GREENE (2003) p.47.
9 GREENE (2003) p. 50.
10 GREENE (2003) p.17.
11 Independent variables in a regression model are also called regressors and the dependent one regressand.
Model misspecification can occur by omitting a variable or by including an irrelevant variable. A wrong assumption on the general model structure can also be a source of misspecification, e.g. the linear fitting of data which actually stand in a nonlinear relationship.
As already mentioned, it is necessary to make assumptions on the disturbance distribution to ensure exact statistical results, i.e. to test hypotheses in the model and to give confidence intervals.
The disturbance terms should be independently and identically distributed (i.i.d.), and this distribution should be a normal one. Independence requires independence among the disturbances (nonautocorrelation) and independence from the regressor variables. The disturbances are distributed identically as they follow a normal distribution with a common mean ($E(\boldsymbol{\varepsilon}) = 0$) and a common variance (homoscedasticity). Violations of these assumptions can cause deviations in the underlying distribution, e.g. heteroscedasticity among the error terms becomes visible as “fat tails” in the underlying distribution 12 . The crucial point is that OLS often performs poorly if the assumptions are violated. Considering the definition of the coefficient estimate variance ($Var(\hat{\boldsymbol{\beta}} \mid \mathbf{X}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$), it is reasonable that fat tails in the disturbance distribution inflate this variance 13 , and thereby lower the statistical significance of any statement made about this estimate. As even slight deviations in the distribution, which might not be observable applying standard methods 14 (e.g. the Kolmogoroff - Smirnov - test 15 ), can yield large increases in the estimator’s variance 16 , this fact requires special attention.
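The inflation effect can be made concrete with a contaminated normal distribution. The sketch below assumes a hypothetical mixture in which a fraction `eps` of the disturbances comes from a normal distribution with a larger standard deviation; the mixture weight and the scales are illustrative choices, not values from the cited literature. Because the covariance matrix of the Least Squares estimator is proportional to $\sigma^2$, any inflation of the disturbance variance carries over to the coefficient variance by the same factor.

```python
def contaminated_variance(eps, sigma_main, sigma_contam):
    """Variance of the zero-mean mixture
    (1 - eps) * N(0, sigma_main^2) + eps * N(0, sigma_contam^2)."""
    return (1.0 - eps) * sigma_main**2 + eps * sigma_contam**2

# Hypothetical setting: standard normal disturbances, 10% of which are
# replaced by draws with a ten times larger standard deviation.
clean_var = contaminated_variance(0.0, 1.0, 10.0)
mixed_var = contaminated_variance(0.10, 1.0, 10.0)

# Factor by which sigma^2, and hence Var(beta_hat | X), is inflated.
inflation_factor = mixed_var / clean_var
```

With these assumed values the variance rises from 1 to about 10.9, although 90% of the disturbances still follow the standard normal distribution.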
One of the purposes of this work is to point out the poor performance of OLS in the presence of Outliers as a crucial error source in Regression Analysis 17 . Therefore, it is worth mentioning that many of the key error sources listed above can result from the occurrence of outlying observations 18 . These undesirable situations can arise from changes in single observations among the sample. But even Outlier-free data sets can imply potential problems. Outlier occurrence is only one possible reason for deviations from the model assumptions. In the next chapter the influence of outlying observations on main OLS results and their inference will be discussed in detail.
12 TUKEY (1960) p. 458.
13 As fat tails inflate the variance $\sigma^2$ of the disturbances.
14 E.g. WILCOX (1997) p.4.
15 HARTUNG ET AL. (2002) p.183.
16 Cf. WILCOX (1997) p.210.
17 Usually outlier problems are identical with distributional problems and vice versa. Cf. Huber (1996) p.3: “…“distributional robustness” and “outlier resistant” are interchangeable”
18 Cf. ROUSSEEUW ET AL. (1987) p.102.
3. Outliers and OLS
3.1. Outlier definition and common error sources
First, it is necessary to specify and to define some crucial terms, especially to define the term outlier: “…data which are far away from the bulk of the data, or more generally, from the pattern set by the majority of the data.” 19
Sources for outlying observations are, among other things, copying and transmission errors 20 , typing errors and model failures 21 . It even occurs in officially published time series that a variable is measured in different units, e.g. if another recording system is implemented 22 . Model failures include the case of samples drawn from different populations 23 . Hampel et al. 24 give in chapter 1.2c some examples of real data sets and the degree of outliers belonging to them; they conclude that a degree of outliers in a data set of 1% up to 10% is “the rule”. Particularly longer tailed error distributions are likely to occur 25 . If a sample contains observations from more than one population, the term contaminated distribution is used frequently. In a one dimensional case these two different populations might be characterized by different measures of location and (or) by different measures of dispersion 26 . Tukey (1960) discussed in his pioneering work the effect of contamination on the real distribution. It is necessary to keep in mind that usually not the entire contamination is observable among the sample. Tukey divides the contamination fraction among the sample into errors and blunders. Blunders can be easily uncovered by applying some outlier detection tool, but these tools might fail to detect the errors 27 among the sample. A simple deletion of the blunders would cover up 28 the real problem and thereby lead to fundamental errors in the obtained results and their interpretation. This gives an introductory insight into how carefully outlying observations (if they are uncovered) have to be handled. The entire range of reasons for the occurrence of
19 HAMPEL ET AL. (1986) p.25.
20 ROUSSEEUW ET AL. (1987) p.3.
21 HAMPEL ET AL. (1986) p.25. And particularly these model failures are likely to occur, cf. ROUSSEEUW ET AL. (1990a) p.637.
22 E.g. ROUSSEEUW ET AL. (1987) Table 2 (p. 26): Number of International Calls from Belgium.
23 An example is given in ROUSSEEUW ET AL. (1987) p.27.
24 HAMPEL ET AL. (1986). Cf. also Huber (1996) p. 2.
25 Cf. HAMPEL (2001) p.2.
26 Skewness and Kurtosis can play an important role as well, but are not considered here.
27 Hampel uses the term “hidden contamination” cf. HAMPEL (1974).
28 As “all distributions are normal in the middle” (Tukey (1960) p.457), tail cleaning leads to similar distributions.
outlying observations has to be considered while discussing Outliers and in particular outlier deletion in chapter 6: “Robust Regression Techniques”. As Regression Analysis involves at least two dimensions, the case becomes more complicated 29 . Outliers can occur in only one dimension or in more dimensions simultaneously. In the subsequent section we give a definition of outliers in Regression Analysis and reveal their potential impact on Regression results.
3.2. Outliers in Regression Analysis and their influence on OLS results
A point $(x_{i1}, \dots, x_{ip}, y_i)$ which deviates from the (in our case: linear) relation described by the majority of the data is called a Regression Outlier 30 . The term linear relation describes the relation between the dependent and independent variables. This deviation from the actual trend can be the result of deviations either in the dependent 31 or in the independent variables space.
If an x-value $x_i$ is located far away from the bulk of the other x-values, the observation $(x_i, y_i)$ is called a leverage point or “outlier in the x-direction” respectively 32 . This definition does not take into account the y-value of the observation. A leverage point is not necessarily something negative; a leverage observation can be quite beneficial. Therefore, we have to distinguish between good and bad leverage points. If a leverage point is a regression outlier, as defined above, it is called a bad leverage point. A bad leverage point has huge influence on the Least Squares estimates. If a leverage point does fit in the linear relation described by the other observations, it is very beneficial for the analysis. Such an observation is called a good leverage point. To present the benefit of good leverage points it is necessary to recall the covariance matrix of the Least Squares estimator $Var(\hat{\boldsymbol{\beta}} \mid \mathbf{X}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$: a higher dispersion among the independent variables reduces the variance of the LS estimator. This becomes more apparent if we reduce the regression model to a simple one. The variance of the least squares estimate of $\beta_1$ is then $Var(\hat{\beta}_1) = \sigma^2 / \sum_{i=1}^{n}(x_i - \bar{x})^2$ 33 . As the disturbance terms' variance has to be estimated, the residual variance ($s^2$) has to be applied then. Heavy
29 Cp. Huber (1973): “…moreover, outliers are much harder to spot in the regression than in the simple location case.”
30 Another term used for such a kind of observation is “influential observation”.
31 Provided that the independent variables are observational and not designed as fixed. Cf. ROUSSEEUW ET AL. (1990a) p.634.
32 ROUSSEEUW ET AL. (1987) p.6.
33 GREENE (2003) p.46.
tails (i.e. a higher kurtosis) 34 in the distribution of the independent variables inflate the denominator of the variance term, followed by a decreasing variance of the estimate. The results therefore become “more precise” 35 . One example of more precise results is narrower confidence intervals for the real values of the regression coefficients; such a confidence interval is defined as: $Prob(\hat{\beta}_i - t_{\alpha/2} \cdot \hat{\sigma}_{\hat{\beta}_i} \le \beta_i \le \hat{\beta}_i + t_{\alpha/2} \cdot \hat{\sigma}_{\hat{\beta}_i}) = 1 - \alpha$, where $t_{\alpha/2}$ is the appropriate critical value of the Student's distribution with (n-p) degrees of freedom and $(1 - \alpha)$ the level of confidence as requested. The range of the coefficient’s confidence interval decreases as the estimator’s variance decreases. Another important tool of regression inference is the frequently applied t-test. It tests if a coefficient is statistically different from zero on a certain level of confidence. To test the hypothesis, the test statistic $t = \hat{\beta}_i / \hat{\sigma}_{\hat{\beta}_i}$ is applied. If the absolute value of this test statistic is greater than $t_{\alpha/2}$, the hypothesis that the coefficient is equal to zero can be rejected. While the level of confidence and the coefficient’s estimate $\hat{\beta}_i$ remain constant, the test is more likely to reject the insignificance hypothesis if the variance of the estimator decreases.
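The effect of a leverage observation on precision can be sketched directly from the simple regression formula $\sigma^2 / \sum (x_i - \bar{x})^2$. The x-values below are hypothetical, and $\sigma^2$ is treated as known and set to 1 purely for illustration.

```python
import numpy as np

def slope_variance(x, sigma2=1.0):
    """Variance of the OLS slope in simple regression:
    sigma^2 / sum((x_i - mean(x))^2)."""
    x = np.asarray(x, dtype=float)
    return sigma2 / np.sum((x - x.mean()) ** 2)

# Hypothetical design: ten evenly spaced x-values 0, 1, ..., 9.
x_bulk = np.arange(10.0)

# Same design plus one observation far out in the x-direction.
x_with_leverage = np.append(x_bulk, 30.0)

var_bulk = slope_variance(x_bulk)
var_leverage = slope_variance(x_with_leverage)

# Ratio of the standard errors with and without the leverage observation.
se_ratio = np.sqrt(var_leverage / var_bulk)
```

If the added observation fits the linear relation described by the others, the standard error of the slope shrinks (in this sketch to roughly a third of its former value), which narrows the confidence interval and raises the t statistic for a given coefficient estimate.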
Considering the case that the leverage point is a regression outlier, i.e. a bad leverage point,
the beneficial effect of being a leverage observation is diminished by the bad influence on the
least squares estimates.
34 E.g. caused by leverage observations.
35 GREENE (2003) p.46.
For presentation purposes the examples consist of simple regression cases; the plot of the $(x_i, y_i)$ points is called a scatter plot.
Regarding the simple regression in Figure 1, a change of the x-value of observation one (3 instead of -0.7) transforms observation one into a bad Leverage Point. The new residual (if measured from the original regression line) is very large.
As the least squares method minimizes the average value of the squared residuals, this large residual is taken into consideration and has a strong influence on this average 36 . The new regression line is strongly influenced by this residual, and the new line of best fit is (very) different from the original one 37 . The new regression line tilts under the large influence of the new residual and thereby changes its original shape, as we can see in Figure 2. The slope of the regression line changed from 1.91 in the initial case to 0.14 in the bad Leverage Point case.
But not only leverage points can be regression outliers. If the x-value of an observation remains in the bulk of the other x-values but the y-value deviates in such a way that the observation $(x_i, y_i)$ is a regression outlier, the observation is called a “vertical” outlier or “outlier in the y direction” 38 . These outlying observations are influential on the least squares results as well, but with less potential impact. The corresponding new residual (again measured from the original regression line) is normally smaller than was the case for bad
36 That the arithmetic mean is susceptible to a single outlier will be discussed in more detail while presenting breakdown properties of location estimates.
37 The proneness to a single outlier is obviously caused by the arithmetic mean properties in this simple regression example.
38 ROUSSEEUW ET AL. (1987) p.3.
leverage points. Regarding again the simple regression example in Figure 1, a change of the y-value of observation one (9 instead of 3.5) transforms observation one into a vertical outlier. The influence on the slope of the regression line is smaller than was the case for the Leverage Point (it changed from 1.91 in the initial case to 0.7 in the case containing the vertical outlier).
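The different impact of the two outlier types can be reproduced with a small sketch. The data behind Figures 1 to 3 are not listed in the text, so the example below uses hypothetical points that lie exactly on the line y = 1 + 2x; the slopes it produces therefore differ from the values 1.91, 0.14 and 0.7 quoted above.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope

# Hypothetical clean data on the line y = 1 + 2x.
x = np.arange(10.0)
y = 1.0 + 2.0 * x
slope_clean = ols_slope(x, y)

# Bad leverage point: move the x-value of the first observation far
# away from the bulk while keeping its y-value.
x_lev = x.copy()
x_lev[0] = 30.0
slope_leverage = ols_slope(x_lev, y)

# Vertical outlier: shift the y-value of an observation whose x-value
# stays inside the bulk of the data.
y_vert = y.copy()
y_vert[4] += 11.0
slope_vertical = ols_slope(x, y_vert)
```

In this sketch a single bad leverage point pulls the slope from 2 to a slightly negative value, while the vertical outlier near the centre of the x-values changes it only marginally. How strongly a vertical outlier acts on the slope depends on how far its x-value lies from the mean of the x-values.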
Bad leverage points and vertical outliers form the group of regression outliers. Each observation with huge influence on regression results can be classified in one of these groups. However, there exists an interesting difference between the two types of outliers. The probability of remaining hidden among the other observations is higher for a leverage point than for a vertical outlier. Even standard techniques which apply the non robust least squares regression 39 are often able to detect vertical outliers 40 . The ability to detect leverage points is a crucial criterion for the assessment of outlier detection tools. Furthermore, the capability of resisting leverage points is an important point while looking for appropriate robust regression methods.
The difficulty of outlier identification increases with the number of dimensions. In a multiple regression case, identification becomes more complicated, and new problems occur besides the ones we already discussed 41 . Although outlier detection will be discussed in detail in chapter 6, it is obvious that both vertical outliers and bad leverage points
39 E.g. the residual plot of the least squares residuals.
40 ROUSSEEUW ET AL. (1987) p.3.
41 All definitions of outliers and of the problems caused by them which were made above are valid for the multivariate case as well.
can be easily detected by looking at the scatter plot. In the case of a limited number of observations and a two dimensional problem like in the considered examples, there is no need for mathematical tools for outlier detection. But as soon as the problem’s dimension exceeds two 42 , no scatter plot is available.
Standard (classical) methods for the identification of multivariate outliers such as the hat matrix or the Mahalanobis distance 43 suffer from the so called “masking effect” 44 . The outliers mask themselves behind non robust measures of location and measures of covariance. Up to here we introduced outliers in general and described their influence on Ordinary Least Squares Regression Analysis. In order to discuss problems in a more technical way, it is necessary to define and specify the term Robustness. This will enable us to take a mathematically appropriate look at outliers and at the problems of the least squares method when they occur.
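The masking effect can be illustrated in one dimension, where the Mahalanobis distance reduces to the absolute deviation from the mean divided by the standard deviation. The sketch below uses hypothetical data with two identical gross outliers and compares the classical distance with a version based on the median and the MAD, which are treated in chapter 5. The consistency factor 1.4826 for the MAD under normality and the cutoff of 2.5 are common conventions assumed here, not values taken from this text.

```python
import numpy as np

# Hypothetical sample: eight regular values and two gross outliers.
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0, 100.0])
cutoff = 2.5  # assumed flagging threshold

# Classical distance: the outliers inflate both the mean and the
# standard deviation, which shrinks their own distances.
classical = np.abs(data - data.mean()) / data.std(ddof=1)

# Robust distance: the median and the MAD are barely affected by
# the two outliers.
median = np.median(data)
mad = 1.4826 * np.median(np.abs(data - median))
robust = np.abs(data - median) / mad

flagged_classical = data[classical > cutoff]
flagged_robust = data[robust > cutoff]
```

With these data the classical distances of both outliers stay below the cutoff, so neither is flagged, whereas the robust distances identify both.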
42 I.e. the problem becomes multivariate.
43 See section 6.8.1 for details.
44 This is no surprise as the two approaches are related with each other. Cf. ROUSSEEUW ET AL. (1990a) p.635.
4. The concept of Robustness
4.1. Introduction to Robustness
Before decomposing the robustness concept into different approaches, it is worth giving a non-technical definition of the term “Robustness”. Huber 45 provides such a definition: “Robustness: signifies insensitivity against small deviations from the assumptions”. This can be brought into a statistical setting if we consider that statistical analysis often depends on certain assumptions. Particularly in the case of regression analysis, which is stressed here, we already stated the assumptions “normality”, “independence” and “homoscedasticity” among others. The frequent occurrence of outliers, or gross error sources respectively, transforms most of the assumptions into approximations 46 . “Robust” now refers to the ability of being resistant against these (natural) deviations from the underlying assumptions. Small changes, e.g. in a distribution, can be the result of two different situations: either a small change in many observations (because of rounding, grouping etc.), or gross changes in only a few 47 observations (due to typing, transmission, copying etc. errors) 48 .
In his unpublished 50 PhD thesis 51 of 1968, Hampel developed 49 the concepts of robustness which are the common approaches nowadays, namely the concepts of qualitative, quantitative and infinitesimal robustness.
4.2. Qualitative robustness
Roughly speaking, Qualitative Robustness is ensured if a small change in the distribution F (the underlying model) results only in a small change in the distribution of the estimate $T_n$. F is supposed to be the real underlying distribution and, accordingly, $F_n$ is the empirical distribution function drawn from a finite sample of i.i.d. random variables $X_1, \dots, X_n$. $T_n$ is an estimate based on these random variables, $T_n = T_n(X_1, \dots, X_n)$. Estimates can be derived from maximum likelihood methods, linear combinations of order statistics, rank tests or other
45 HUBER (1996) p.1.
46 Cp. HAMPEL ET AL. (1986) p.1.
47 Few in the sense, that only a small fraction of the data is affected.
48 HUBER (1964) p.6.
49 Or better refined and developed further. See HUBER (1972) for historical notes on the robustness discussion.
50 According to HAMPEL ET AL. (1986) p.40, however, most of its content was published in HAMPEL (1971) and HAMPEL (1974).
51 F.R. Hampel: “Contributions to the Theory of Robust Estimation”, University of California, Berkeley
methods 52 . If the estimator is a linear functional 53 , i.e. $T((1-s)\mu + s\nu) = (1-s)\,T(\mu) + s\,T(\nu)$ 54 for $0 \le s \le 1$, a simple definition of qualitative robustness can be given: the estimator has to be continuous 55 at F. As discussed above, a small change in the distribution $F_n$ can be the result either of small changes in many or all observations or of gross errors in a few observations. Accordingly, continuity is a necessary and sufficient condition to ensure that none of these slight changes in the empirical distribution leads to gross changes in the estimator. A simple but instructive example is given in Wilcox (1997) 56 : $f(x) = 0$ if $x \le 1$, but $f(x) = 10000$ if $x > 1$. If we now consider a situation where $x = 1$, then an infinitesimal increase in x would result in a large increase in $f(x)$. This function $f(x)$ is not continuous. Before giving a more appropriate definition of the “continuity” restriction, the term “slight changes” in a distribution has to be formalized. Differences between distributions (here the underlying true one and the empirical one obtained from a sample) can be measured by metrics. The most established and best known is the Kolmogorov distance, but the Prohorov metric is applied more frequently in the theoretical work of Hampel, Huber and others. We will define and use the Kolmogorov distance because of its relatively simple character. Let F and G be any two distributions; the Kolmogorov distance is defined as 57 : $D(F, G) = \sup_x |F(x) - G(x)|$, taken over all possible values of x. So it is the maximum difference between the two distribution functions, or the least upper bound if the maximum does not exist. This distance is at most one, and it is zero if the two distributions are identical.
T is defined to be continuous at F when $T(G_n) - T(F) \to 0$ if $D(G_n, F) \to 0$, where $\{G_n\}$ is any sequence of distribution functions (n = 1, 2, ...). This is of particular interest in view of the Kolmogorov theorem: $P[D(F_n, F) \to 0 \text{ as } n \to \infty] = 1$. If we now replace $G_n$ by $F_n$, an estimator that is continuous 58 at F should satisfy $T(F) = \lim_{n \to \infty} T(F_n)$ 59 . This continuity condition using the Kolmogorov distance (which defines a strong topology 60 ) is enough to ensure the consistency of the estimator shown above, but not enough to guarantee qualitative
52 HUBER (1996) p.6.
53 The terms “estimate” and “functional” are used interchangeably here. Strictly speaking, an estimate is a functional if $T_n(G_n) = T(G_n)$ for all n and $G_n$.
54 HUBER (1996) p.6.
55 WILCOX (1997) p.13.
56 WILCOX (1997) p.13.
57 WILCOX (1997) p.14.
58 This property is also called consistency. HUBER (1981) p.8.
59 HUBER (1981) p.8.
60 STAUDTE ET AL. (1990) p.65.
robustness in general. Continuity with respect to a weaker 61 topology 62 is required; this can be achieved by applying the Prohorov metric.
This does not matter here, as this short introduction aims not to present the complete mathematical concept but to work out the basic idea of qualitative robustness. The concept of qualitative robustness is, however, not the most popular one. Because of its mathematically demanding definition, it is not often discussed in the literature 63 . Qualitative robustness is a rather weak than a strong condition for robustness. Other concepts deal with similar requirements on estimates and might be more precise in their definition. One of the reasons is that qualitative robustness can be proven only at a certain distribution, i.e. it is not a global robustness concept (one which works for every distribution). Above all, the concept has had little impact on the robustness discussion because it cannot point out differences among qualitatively robust procedures 64 : it only provides a yes-or-no decision for a given estimate. Nevertheless, it conveys an idea of what robustness implies, and several of its requirements are also included in other robustness concepts. Qualitative robustness should be considered first, before looking at infinitesimal and quantitative properties 65 .
Examples of qualitatively robust estimates are given in the next chapter, where some estimates of location and scale are judged on their robustness with regard to the concepts introduced here.
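The role of the Kolmogorov distance can be made concrete with a small numerical sketch. The following Python code is an illustration only and is not part of the original study; the function names, the standard normal model, the sample sizes and the seed are all assumptions chosen for the example. It computes $D(F_n, F)$ for simulated samples and then replaces a single observation by a gross error: the empirical distribution moves by only 1/n in Kolmogorov distance, yet the arithmetic mean changes by a large amount, which is the kind of discontinuity that qualitative robustness rules out.

```python
import random
from bisect import bisect_right
from statistics import NormalDist, fmean


def kolmogorov_to_model(sample, model_cdf):
    """sup_x |F_n(x) - F(x)| for the empirical distribution of `sample`
    and a continuous model distribution function `model_cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        fx = model_cdf(x)
        # F_n jumps from i/n to (i+1)/n at x, so both sides are checked.
        d = max(d, abs((i + 1) / n - fx), abs(i / n - fx))
    return d


def kolmogorov_between_samples(a, b):
    """sup_x |A_n(x) - B_m(x)| for two empirical distribution functions.
    The supremum of a step-function difference is attained at a data point."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a_sorted, t) / len(a) - bisect_right(b_sorted, t) / len(b))
        for t in a_sorted + b_sorted
    )


rng = random.Random(0)          # fixed seed, assumption for reproducibility
model = NormalDist(0.0, 1.0)    # assumed true distribution F

# D(F_n, F) should tend to shrink as n grows.
for n in (100, 10_000):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(n, kolmogorov_to_model(sample, model.cdf))

# One gross error: a tiny change in distribution, a large change in the mean.
clean = [rng.gauss(0.0, 1.0) for _ in range(100)]
spoiled = [1e6] + clean[1:]
print("distance between samples:", kolmogorov_between_samples(clean, spoiled))
print("shift of the mean:", abs(fmean(spoiled) - fmean(clean)))
```

The value 1e6 is arbitrary; any larger gross error leaves the distance at 1/n while moving the mean further.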
4.3. Infinitesimal Robustness
The infinitesimal robustness concept deals with the “Influence Curve” and the “Influence Function”, which are closely related terms. The infinitesimal robustness approach connects the concepts of qualitative and quantitative robustness. It picks up the idea of continuous estimates and paves the way to a global judgement of robustness. The term Influence Curve goes back to the early work of Hampel (1968, 1974) and refers to the possibility of plotting this curve and of interpreting its geometric aspects 66 . As the problem was generalized to higher dimensions (where the curve can no longer be plotted), the term Influence Function became the more frequently used one.
61 i.e. a coarser topology, in which more sequences of distributions converge to F, so that continuity becomes a stronger requirement.
62 STAUDTE ET AL. (1990) p.65.
63 STAUDTE ET AL. (1990) p.66.
64 HAMPEL ET AL. (1986) p.41.
65 HAMPEL (1974) p.389.
66 Cf. HAMPEL ET AL. (1986) p.84, p.41.
Once again the estimator T is desired to be resistant against small changes in the underlying empirical distribution. But the idea of this particular robustness concept is somewhat different: the change in the empirical distribution is now the result of adding an additional observation x to the sample 67 . The influence function measures the (normed) influence on an estimate if an observation is added. To ensure a feasible formalization, it is necessary to recall the notion of contaminated distributions. An added observation is nothing other than a (small) fraction of the new distribution. Consider F to be the underlying true distribution and $\Delta_x$ the distribution in which the added value x occurs with probability one. The newly obtained distribution is $F_{x,\varepsilon} = (1-\varepsilon)F + \varepsilon\Delta_x$, where $\varepsilon$ is the fraction of the added observation in the new sample. In the new distribution $F_{x,\varepsilon}$, the distribution F occurs with probability $(1-\varepsilon)$ and the value x with probability $\varepsilon$. The observation x, or the fraction of observations distributed like x, is assumed to be “bad” 68 . The attribute “bad” refers to the fact that these observations are distributed differently from F, i.e. they are outliers. We can compute the relative influence on the estimate T of the value x, which occurs with probability $\varepsilon$: $\frac{T(F_{x,\varepsilon}) - T(F)}{\varepsilon}$ 69 .
If we take the limit of this relative influence, with $\varepsilon$ approaching zero from above, we obtain the Influence Function: $IF(x) = \lim_{\varepsilon \to 0^+} \frac{T(F_{x,\varepsilon}) - T(F)}{\varepsilon}$ 70 , provided that this limit exists as a real number 71 . A verbal explanation of the Influence Function can be given as follows: IF(x) is the relative influence of an observation x, which occurs with a probability close to zero (an infinitesimal contamination), on a functional T describing an underlying distribution F.
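The limit above can be approximated numerically by evaluating the relative influence for a small but positive contamination fraction. The following Python sketch is an illustration under stated assumptions, not part of the thesis: F is taken to be the standard normal distribution, and the closed forms for the mean and the median of the contaminated distribution $(1-\varepsilon)F + \varepsilon\Delta_x$ are derived here for this example only. All function names are hypothetical.

```python
from statistics import NormalDist

F = NormalDist(0.0, 1.0)  # assumed model distribution


def mean_contaminated(x, eps):
    """Mean of the mixture (1 - eps) * F + eps * Delta_x."""
    return (1.0 - eps) * F.mean + eps * x


def median_contaminated(x, eps):
    """Median of the mixture (1 - eps) * F + eps * Delta_x."""
    # If the point mass lies above the median, only the F part counts below it.
    upper = F.inv_cdf(0.5 / (1.0 - eps))
    # If the point mass lies below the median, its mass eps is already included.
    lower = F.inv_cdf((0.5 - eps) / (1.0 - eps))
    if x > upper:
        return upper
    if x < lower:
        return lower
    return x  # the point mass itself sits at the median


def relative_influence(functional, x, eps=1e-6):
    """(T(F_{x,eps}) - T(F)) / eps for a small positive eps."""
    return (functional(x, eps) - functional(x, 0.0)) / eps


for x in (-100.0, -3.0, 3.0, 100.0):
    print(x,
          relative_influence(mean_contaminated, x),
          relative_influence(median_contaminated, x))
```

Under these assumptions the approximated influence of the mean grows linearly with x, while that of the median stays at roughly the same magnitude however far away x lies.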
The functional T(F) is said to be infinitesimally robust if its corresponding Influence Function is bounded. In a sense, the Influence Function looks like a measurement of the slope of the functional. In the case of ordinary functions this slope is measured by the first derivative. Indeed, this is no coincidence, as any function f(x) is only robust against small changes in x if the corresponding first derivative is bounded, $|f'(x)| < B$ for some constant B. This obviously includes the restriction that f(x) is differentiable. Remembering the continuity restriction of the Qualitative Robustness
67 This can be easily extended to a few observations. Cf. HAMPEL ET AL. (1986) p.186.
68 STAUDTE ET AL. (1990) p.58.
69 WILCOX (1997) p.16.
70 HAMPEL (1974) p.383.
71 STAUDTE ET AL. (1990) p.58.
concept, this becomes an important fact. If a function contains jumps (i.e. is not qualitatively robust), it is mostly not robust in the infinitesimal sense either. In this respect, the Influence Function can be considered as the first derivative of the functional under study 72 . Some estimates do not have an influence function (in particular if their distribution is not asymptotically normal); in such cases we can approximate the IF by the so-called stylized Sensitivity Curve $SC_n$, a finite-sample version of the IF: $SC_n(x) = n \cdot \left[ T_{n+1}(x_1, \dots, x_n, x) - T_n(x_1, \dots, x_n) \right]$ 73 , with x ranging over an interval containing all points $x_i$ in the sample. If T is asymptotically normally distributed, the SC approaches the IF as n approaches infinity.
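A finite-sample curve of this kind is easy to compute. The following Python sketch is only an illustration: the scaling factor n and the form "estimate with x added minus estimate without x" are assumptions about the exact convention, and the sample, seed and function name are hypothetical.

```python
import random
from statistics import fmean, median


def sensitivity_curve(estimator, sample, x):
    """n * (T(x_1, ..., x_n, x) - T(x_1, ..., x_n)) for a list `sample`.
    The factor n is an assumed normalisation; some texts use n + 1."""
    n = len(sample)
    return n * (estimator(sample + [x]) - estimator(sample))


rng = random.Random(1)  # fixed seed, assumption for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]

# Evaluate the curve at a few added observations, including gross outliers.
for x in (-10.0, 0.0, 10.0, 1000.0):
    print(x,
          sensitivity_curve(fmean, sample, x),
          sensitivity_curve(median, sample, x))
```

For the mean the curve is linear in x and therefore unbounded, whereas for the median the value no longer changes once x lies beyond the central observations of the sample.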
Like the qualitative robustness concept, the infinitesimal robustness concept is not a global tool for judging robustness; it can only be evaluated at a specific distribution 74 . It is recommended to evaluate infinitesimal robustness at the theoretical underlying distribution 75 , which in most instances will be the normal distribution.
The Influence Function can be regarded as a useful tool for studying the estimator’s local robustness properties. What counts as “local” can only be a “wild guess” 76 : values of up to 0.25 or 0.5 for the ratio between the degree of contamination and the breakdown point 77 are recommended. Within this range the Influence Function (or the Influence Curve) is a good linear approximation of the real behaviour of the functional at F. Two key numbers can be derived from the Influence Function: first the asymptotic variance and second the gross-error-sensitivity. They summarize its most important features, i.e. they are summary values 78 . The first one is not important with regard to robustness, but will become one of the main tools for judging the goodness of an estimator. The asymptotic variance of the estimator is the expected square of the Influence Function if it is computed on the theoretical underlying distribution: $Var(T, F) = E_F\left[IF(x)^2\right]$ 79 , where $Var(T, F)$ is the asymptotic variance of the estimator T on the distribution F. For the Sensitivity Curve the
72 Cf. WILCOX (1997) p.15.
73 ROUSSEEUW ET AL. (1987) p.189.
74 But it can occur that a specific Influence Function is independent of any underlying distribution, e.g. the IF of the arithmetic mean.
75 HAMPEL (1974) p.386.
76 HAMPEL (1974) p.388.
77 The breakdown point concept will be explained in the subsequent section.
78 HAMPEL (1974) p.388.
79 STAUDTE ET AL. (1990) p.67.
variance can be approximated in a similar way by a finite-sample value $V_n$ 80 . If the estimator is asymptotically normally distributed, $V_n$ will approach the asymptotic variance as n approaches infinity. The other summary value of the Influence Function is the gross-error-sensitivity; it measures the worst impact (i.e. the maximum bias) of a certain amount of contamination on the estimator: $\gamma^* = \sup_x |IC(x; F, T)|$ 81 , where sup is again the symbol for the supremum, which is the maximum of the function or the least upper bound if the maximum does not exist. If an estimate is not infinitesimally robust, i.e. its Influence Function is unbounded, this value is not computable; it approaches infinity. But if estimators are infinitesimally robust, it is possible to compare them according to certain amounts of contamination on certain underlying distributions 82 .
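Both summary values can be evaluated numerically once an Influence Function is given. The following Python sketch is an illustration only. It assumes the standard normal distribution as F and uses the well-known closed-form Influence Functions of the arithmetic mean, $x - \mu$, and of the median, $\mathrm{sign}(x - m) / (2 f(m))$; these closed forms, the integration range and the grids are assumptions of the example and are not taken from the text above.

```python
import math
from statistics import NormalDist

F = NormalDist(0.0, 1.0)              # assumed model distribution
C_MEDIAN = 1.0 / (2.0 * F.pdf(0.0))   # height of the median's influence function


def if_mean(x):
    """Influence function of the arithmetic mean at F."""
    return x - F.mean


def if_median(x):
    """Influence function of the median at F (median of F is 0)."""
    return 0.0 if x == 0 else math.copysign(C_MEDIAN, x)


def asymptotic_variance(influence, lo=-10.0, hi=10.0, steps=20_000):
    """E_F[IF(x)^2], approximated with the midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += influence(x) ** 2 * F.pdf(x) * h
    return total


def gross_error_sensitivity(influence, grid):
    """max |IF(x)| over a finite grid; only a lower bound of the supremum."""
    return max(abs(influence(x)) for x in grid)


narrow = [i / 10.0 for i in range(-100, 101)]   # grid on [-10, 10]
wide = [float(i) for i in range(-1000, 1001)]   # grid on [-1000, 1000]

for name, fn in (("mean", if_mean), ("median", if_median)):
    print(name,
          asymptotic_variance(fn),
          gross_error_sensitivity(fn, narrow),
          gross_error_sensitivity(fn, wide))
```

Under these assumptions the mean has the smaller asymptotic variance, but its grid maximum keeps growing with the width of the grid, while the median has a larger variance and a grid maximum that does not depend on the grid.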
These two values are in conflict with each other; this is fundamental to almost the entire robustness discussion. Robustness and asymptotic variance contradict each other because lower upper bounds on the gross-error-sensitivity induce higher asymptotic variances 83 . If we recall that the estimator with the lowest variance is the most efficient one, we can formulate the statement as follows: robustness and efficiency contradict each other. This problem will be picked up again when discussing the optimal estimator in this section and when discussing Robust Regression Techniques. A similar value describing the Influence Function can be derived by adopting another strategy. Instead of “throwing in” an additional observation, we replace some observations by others. This becomes relevant if we take rounding or grouping of the observations into consideration 84 . This procedure is called “wiggling” of the observations. While the gross-error-sensitivity measures the worst effect of throwing in contamination, the newly introduced value measures the worst possible (again local) effect of wiggling the observations. $\lambda^*$ is called the local-shift-sensitivity: $\lambda^* = \sup_{x \neq y} |IC(y; T, F) - IC(x; T, F)| \, / \, |y - x|$ 85 , where y is an added observation and x a removed one. Because of the standardization, the definition of the local-shift-sensitivity refers to a very limited change. It is a less global summary value than the gross-error-sensitivity and is therefore not a very appropriate measure of robustness. For instance, the arithmetic mean, which is not robust in any other sense, turns out to be the most
80 ROUSSEEUW ET AL. (1987) p.189.
81 HUBER (1981) p.14.
82 Obviously, this is also possible by looking at the Influence Curves of the estimators, but not when dealing with higher dimensions.
83 HAMPEL (1974) p. 388.
84 HAMPEL (1974) p.389.
85 HAMPEL ET AL. (1986) p.88.
robust in terms of local-shift-sensitivity (due to its local smoothing of the data 86 ), while robust measures of location, e.g. the median, have infinite local-shift-sensitivity. This property of the median will become important when discussing the “inlier” problems of the LMS procedure in section 6.4. There are other useful concepts related to the Influence Function, which we only name without going into detail: rejection point, change-of-variance sensitivity and change-of-bias sensitivity 87 .
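The contrast between the mean and the median in terms of local-shift-sensitivity can be sketched numerically. The following Python code is an illustration under assumptions: it uses the closed-form Influence Functions of the mean and the median at the standard normal distribution (not given in the text above) and approximates the supremum by the largest difference quotient between neighbouring points of a grid. Function names and grid settings are hypothetical.

```python
import math

C_MEDIAN = math.sqrt(math.pi / 2.0)  # 1 / (2 * phi(0)) at the standard normal


def if_mean(x):
    """Influence function of the mean at the standard normal."""
    return x


def if_median(x):
    """Influence function of the median at the standard normal."""
    return 0.0 if x == 0 else math.copysign(C_MEDIAN, x)


def local_shift_sensitivity(influence, half_width, steps):
    """Largest |IF(y) - IF(x)| / |y - x| over neighbouring grid points
    on [-half_width, half_width]; a grid approximation of the supremum."""
    h = half_width / steps
    grid = [i * h for i in range(-steps, steps + 1)]
    return max(
        abs(influence(b) - influence(a)) / (b - a)
        for a, b in zip(grid, grid[1:])
    )


# Refining the grid leaves the mean's value unchanged but lets the
# median's value grow without bound, because of the jump at the median.
for steps in (10, 100, 1000):
    print(steps,
          local_shift_sensitivity(if_mean, 1.0, steps),
          local_shift_sensitivity(if_median, 1.0, steps))
```

The grid-based value for the median is only a lower bound that increases with every refinement, which is the numerical counterpart of an infinite local-shift-sensitivity.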
Besides the visualization in an Influence Curve, gross-error-sensitivity and asymptotic variance are the most important facts we can obtain from the discussion on infinitesimal robustness.
4.4. Quantitative robustness
The infinitesimal robustness approach deals with the approximate linear influence of a very small fraction of contamination. But as this fraction of contamination increases, a linear approximation might no longer be appropriate.
To obtain a tool for judging robustness that is less restricted to a local neighbourhood, it is necessary to turn to the quantitative robustness approach. It is based on the breakdown point idea. A rough definition is given by Hampel (2000) 88 : the breakdown point gives “the smallest amount of free 89 contamination which can carry the estimator over all bounds”, i.e. makes it break down. Beyond this border line the estimator is said to be totally unreliable 90 . The breakdown discussion is divided into an infinite-sample and a finite-sample view. As usual, we take an estimator T on a sample distribution $F_n$ (the finite sample contains n observations) as given. If we now replace m of the n original observations by arbitrary values, so that very bad outliers are possible, we obtain a sample with n-m good and m bad observations; we call this distribution $F_n'$. The crucial point is how the estimator reacts to this contamination. To study this in more detail we define the bias in the estimator as $\| T(F_n') - T(F_n) \|$ 91 , where this definition makes it possible to study estimators in higher dimensions, e.g. in regression analysis. The main focus will be on the maximum possible bias of the estimator if a fixed number m of observations is replaced by bad values. According
86 Cf. HAMPEL ET AL. (1986) p.22.
87 Cf. HAMPEL (2000) p.10.
88 and (of course) in nearly all other publications on robustness
89 “free” denotes generated by any distribution or even deterministic.
90 HAMPEL (1974) p. 388.
91 $\| \cdot \|$ indicates the Euclidean norm.
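The replacement experiment described in section 4.4 can be sketched in a few lines of Python. This is an illustration only: the function name, the sample and the single bad value 1e12 are assumptions, and using one fixed bad value gives just one possible contamination rather than the maximum over all of them.

```python
from statistics import fmean, median


def bias_after_replacement(estimator, sample, m, bad_value=1e12):
    """|T(F_n') - T(F_n)| when the first m observations of `sample`
    are replaced by `bad_value` (one particular contamination)."""
    contaminated = [bad_value] * m + list(sample[m:])
    return abs(estimator(contaminated) - estimator(sample))


sample = [float(i) for i in range(1, 12)]  # n = 11 good observations

# Increase the number of replaced observations step by step.
for m in range(0, 8):
    print(m,
          bias_after_replacement(fmean, sample, m),
          bias_after_replacement(median, sample, m))
```

In this sketch the mean is carried away by a single bad observation, whereas the median stays within the range of the good observations until more than half of the sample has been replaced.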
Cite this work:
Robert Finger, 2006, Robust Methods in Regression Analysis – Theory and Application, München, GRIN Verlag GmbH