Thursday

Understanding, checking collinearity

 

When you build a model to predict something, you want to choose the ‘right’ predictors. There should be no redundancy among them: each predictor should explain some part of the variance of the dependent variable that the others don’t. In other words, the predictors should not overlap.

If you use two predictors that correlate strongly with each other, the second adds little explained variance beyond the first, and the coefficient estimates become unstable. This is the collinearity problem. In Stata, a collinearity diagnosis can be obtained by typing vif after a regression (estat vif in newer versions of Stata):


    Variable |       VIF       1/VIF 
-------------+----------------------
          d3 |      1.27    0.784681
          d2 |      1.25    0.800108
          d4 |      1.09    0.916960
-------------+----------------------
    Mean VIF |     1.20
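To see where the two columns come from: the VIF of predictor j is 1 / (1 - R²), where R² comes from regressing predictor j on all the other predictors, and 1/VIF (the tolerance) is simply 1 - R². The Python sketch below reproduces that calculation by hand using the equivalent shortcut that the VIFs are the diagonal of the inverse correlation matrix of the predictors. The function names and the toy data are made up for illustration; this is not how Stata computes it internally, only a minimal check of the arithmetic.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    # Standardize each column (fails on a constant column, whose SD is 0).
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    k = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination.

    Perfectly collinear predictors give a singular matrix, so this
    raises ZeroDivisionError in that case.
    """
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        # Partial pivoting: bring the largest entry in column c to the diagonal.
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Toy data (hypothetical): two predictors with correlation 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vifs = vif([x1, x2])
tolerances = [1 / v for v in vifs]
```

With only two predictors, R² is just the squared correlation between them, so both VIFs here equal 1 / (1 - 0.8²).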

As a rule of thumb, if any individual VIF is larger than 10, that indicates collinearity. (None in this example.)

If the mean VIF is substantially larger than 1, collinearity may exist. (Here it is 1.20, not far from 1.)
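The two rules of thumb are easy to apply mechanically. The short Python sketch below checks them against the VIF values from the table above; the function name diagnose and the default cutoff argument are illustrative choices, and how far above 1 counts as "substantially larger" remains a judgment call.

```python
# VIF values copied from the Stata output above.
vifs = {"d3": 1.27, "d2": 1.25, "d4": 1.09}


def diagnose(vifs, max_vif=10.0):
    """Return the predictors whose VIF exceeds max_vif, and the mean VIF."""
    flagged = [name for name, v in vifs.items() if v > max_vif]
    mean_vif = sum(vifs.values()) / len(vifs)
    return flagged, mean_vif


flagged, mean_vif = diagnose(vifs)
print("Predictors with VIF > 10:", flagged)
print("Mean VIF:", round(mean_vif, 2))
```

No predictor is flagged, and the mean matches the 1.20 that Stata reports.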

What to do?

Delete the variable with the highest VIF.

Use principal component analysis (PCA) in Stata to combine two or more of the correlated variables into a composite variable.
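To give a feel for what the composite variable is, here is a minimal Python sketch for the simplest case of two correlated predictors. For two standardized variables with correlation r, the first principal component is (z1 + z2) / √2 when r is positive (or (z1 - z2) / √2 when r is negative), and it carries a share (1 + |r|) / 2 of their total variance. The function names and data are made up for illustration, and the sketch standardizes with the population standard deviation, so the scores may differ from Stata's by a constant scale factor.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    m, s = mean(col), pstdev(col)
    return [(v - m) / s for v in col]


def first_component(x1, x2):
    """First principal component of two variables, from their correlation matrix.

    Returns the component scores and the share of total variance it explains.
    """
    z1, z2 = standardize(x1), standardize(x2)
    n = len(z1)
    r = sum(a * b for a, b in zip(z1, z2)) / n
    # The leading eigenvector is (1, 1)/sqrt(2) if r >= 0, else (1, -1)/sqrt(2).
    sign = 1.0 if r >= 0 else -1.0
    scores = [(a + sign * b) / sqrt(2) for a, b in zip(z1, z2)]
    share = (1 + abs(r)) / 2
    return scores, share


# Toy data (hypothetical): two predictors with correlation 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
scores, share = first_component(x1, x2)
print("Share of variance in the composite:", round(share, 2))
```

The scores list would then replace x1 and x2 as a single predictor in the regression.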