A social science blog: Checking for autocorrelation / serial correlation

Friday

Checking for autocorrelation / serial correlation

What is autocorrelation?
1) Suppose you have an equation that predicts the values of a dependent variable Y.
2) The predicted values (Y’i) are only ‘close’ to the real values (Yi).
3) The difference between a real value and its predicted value goes by different names: residual (what is left over) or error term (the part the equation cannot predict).
4) Two important assumptions concern the residuals/error terms: they should not correlate with each other, and their variance should be constant.
5) If the first assumption is violated, that is, the residuals (across different waves of data, or different groups) correlate with each other, we have autocorrelation. This is especially common in time series data, where it is called ‘serial correlation’.
6) If the second assumption is violated, we have heteroskedasticity.
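Points 3) to 6) can be written compactly. The notation below is an assumption for illustration (e_i for the residual of observation i, sigma^2 for the common variance) and does not appear in the original post:

```latex
e_i = Y_i - Y'_i, \qquad
\mathrm{Cov}(e_i, e_j) = 0 \ \text{for } i \neq j, \qquad
\mathrm{Var}(e_i) = \sigma^2 \ \text{for all } i
```

Autocorrelation means the middle condition fails; heteroskedasticity means the last one fails.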
How to check for them?
How to know if the first assumption is violated?
We can declare the data as panel data, then run the Wooldridge test in Stata as follows:

xtserial DEP (list of predictors)

If the p-value is smaller than 0.05, the assumption is violated. For example, with d1 as DEP and age as the only predictor:

xtserial d1 age

Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
F( 1, 1) = 0.059
Prob > F = 0.8486

--> The p-value (0.8486) is larger than 0.05, so H0 is not rejected: this indicates no first-order autocorrelation.
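Declaring the data as panel data is done with xtset before the test is run. A minimal sketch, assuming hypothetical variable names id (the panel unit) and wave (the time variable), which are not in the original post; xtserial is a user-written command, so it may need to be installed first:

```stata
* Hypothetical identifiers: id = panel unit, wave = survey wave
xtset id wave

* xtserial is user-written; if it is missing, locate it with: findit xtserial
xtserial d1 age
```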

Another way: declare the data as time series, then run

reg DEP (list of predictors)

dwstat

A Durbin-Watson statistic close to 2 suggests no first-order autocorrelation.
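As a sketch of the full sequence, assuming a single time series with one observation per period and a hypothetical time variable named year (not in the original post). In Stata 9 and later the Durbin-Watson test is documented as estat dwatson, which replaced the older dwstat:

```stata
* Hypothetical time variable: year
tsset year

regress d1 age

* Durbin-Watson d statistic (current name of the older dwstat command)
estat dwatson
```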

How to know if heteroskedasticity exists?

reg DEP (list of predictors)

estat hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of d3

chi2(1) = 0.73
Prob > chi2 = 0.3942

--> The p-value (0.3942) is larger than 0.05, so constant variance is not rejected: this indicates no heteroskedasticity.

What to do?
What to do if you find autocorrelation? Correct for it with a Cochrane-Orcutt regression (the data must be declared as time series first):

prais d1 age, corc

Iteration 0: rho = 0.0000
Iteration 1: rho = 0.0795
Iteration 2: rho = 0.0808
Iteration 3: rho = 0.0808
Iteration 4: rho = 0.0808

Cochrane-Orcutt AR(1) regression -- iterated estimates

      Source |       SS       df       MS              Number of obs =     127
-------------+------------------------------           F( 1,   125) =    4.85
       Model | 14.3667947     1 14.3667947           Prob > F        = 0.0295
    Residual | 370.582352   125 2.96465882           R-squared     = 0.0373
-------------+------------------------------           Adj R-squared = 0.0296
       Total | 384.949147   126 3.05515196           Root MSE      = 1.7218

------------------------------------------------------------------------------
          d1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0306289   .0139136     2.20   0.030     .0030922    .0581656
       _cons |   3.001655   .5507311     5.45   0.000     1.911689     4.09162
-------------+----------------------------------------------------------------
         rho |   .0808052
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    1.838574
Durbin-Watson statistic (transformed) 2.000578
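
The two Durbin-Watson values in this output can be read against the statistic's definition. A sketch in notation that is an assumption for illustration and not used in the original post (e_t for the residual at time t, T for the number of time points, rho for the first-order autocorrelation of the residuals):

```latex
d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2\,(1 - \rho)
```

With the estimated rho of about 0.0808, 2(1 - 0.0808) is roughly 1.84, in line with the original statistic of 1.838574. After the transformation the statistic is 2.000578, almost exactly the value of 2 expected when no first-order autocorrelation remains.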

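What to do if you find heteroskedasticity?
Heteroskedasticity distorts the standard errors of the estimates, and with them the t statistics and p-values. In Stata, we correct for this by requesting robust standard errors; the coefficients themselves are the same as in an ordinary regression.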
reg Y X, robust

reg d1 age, robust

Linear regression                                      Number of obs =     128
                                                       F( 1,   126) =    5.55
                                                       Prob > F       = 0.0201
                                                       R-squared     = 0.0409
                                                       Root MSE      =   1.722

------------------------------------------------------------------------------
             |               Robust
          d1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0322989   .0137162     2.35   0.020      .005155    .0594428
       _cons |   2.945031   .5509205     5.35   0.000     1.854776    4.035286
------------------------------------------------------------------------------
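
To see what the robust option changes, one can run the regression both ways and compare the two tables. A minimal sketch, assuming the same d1 and age variables; vce(robust) is the current spelling of the option and is equivalent to robust:

```stata
* Ordinary standard errors
regress d1 age

* Heteroskedasticity-robust (Huber-White) standard errors
regress d1 age, vce(robust)
```

The coefficients should match across the two runs; only the standard errors, t statistics, p-values and confidence intervals differ.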
