A Study on Identification of High Leverage Points in Multiple Linear Regression
Outliers with respect to the predictor variables are called high leverage points. Observations that differ even slightly from the rest can lead to large differences in the results of a regression analysis. Detecting high leverage points is therefore essential, as they can have a large impact on the estimated values and can also lead to multicollinearity problems. In this situation, robust regression procedures can be very useful for dealing with the problems that arise from the existence of high leverage points. The aim of this study is to compare the performance of three methods for detecting high leverage points. In the first stage, two well-known data sets are considered: the artificial data set generated by Hawkins, Bradu and Kass in 1984 and the stack loss data of Brownlee in 1965. In the second stage, a simulation study is conducted in which data are generated under clean and contaminated conditions. The three measures considered are the leverage method (twice-the-mean rule), Generalized Potentials, and Diagnostic Robust Generalized Potentials (DRGP). The results indicate that DRGP is a more powerful method for detecting high leverage points than the other two methods, on both the two data sets and the simulated data.
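As a brief illustration of the simplest of the three measures, the twice-the-mean rule flags an observation when its leverage (the corresponding diagonal element of the hat matrix) exceeds 2p/n, where n is the number of observations and p is the number of columns of the design matrix, including the intercept. The following Python sketch is only a minimal example under stated assumptions: the function names `hat_values` and `twice_the_mean_flags` and the toy data are hypothetical and are not taken from the study, and the Generalized Potentials and DRGP measures are not implemented here.

```python
import numpy as np


def hat_values(X):
    """Return the diagonal of the hat matrix H = X (X'X)^{-1} X'.

    X is the n-by-p design matrix (intercept column included) and is
    assumed to have full column rank.
    """
    # With the reduced QR factorisation X = QR, H = QQ', so the i-th
    # leverage is the squared norm of the i-th row of Q.
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)


def twice_the_mean_flags(X):
    """Flag observations whose leverage exceeds twice the mean leverage."""
    n, p = X.shape
    h = hat_values(X)
    cutoff = 2.0 * p / n  # the mean leverage is p/n for full-rank X
    return h, cutoff, h > cutoff


# Hypothetical toy data: 20 observations on 2 predictors, with the first
# observation moved far from the bulk of the predictor values.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))
x[0] = [10.0, 10.0]
X = np.column_stack([np.ones(20), x])

h, cutoff, flags = twice_the_mean_flags(X)
print("cutoff:", cutoff)
print("flagged observations:", np.flatnonzero(flags))
```

In this toy example the displaced first observation is flagged. A single flagged point is the easy case; the sketch does not address situations with several high leverage points, which is where the comparison among the three measures becomes relevant.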