What is autocorrelation?
1) Suppose you have an equation that predicts the values of a dependent variable Y.
2) The predicted values (Y’i) are only ‘close’ to the real values (Yi).
3) The difference between the real and predicted values goes by different names: the residual (what is left over) or the error term (the part that the equation cannot predict).
4) There are two important assumptions about the residual/error term: the residuals should not correlate with each other, and their variance should be constant.
5) If the first assumption is violated, that is, if the residuals (between different waves of data, or different groups) correlate with each other, we have autocorrelation. This is particularly common in time series data, where it is called ‘serial correlation’.
6) If the second assumption is violated, we have heteroskedasticity.
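To make points 2) and 3) concrete, here is a minimal sketch of how residuals can be obtained in Stata. The data are simulated purely for illustration (the seed, sample size and coefficients are assumptions, not taken from the dataset used in this post), and d1_hat, e and gap are hypothetical variable names.

```stata
* Simulated data for illustration only; every number here is an assumption.
clear
set seed 12345
set obs 100
generate age = 20 + 40*runiform()
generate d1  = 3 + 0.03*age + rnormal()

regress d1 age
* Predicted values (Y'i) from the fitted equation
predict d1_hat, xb
* Residuals: real value minus predicted value
predict e, residuals

* The residual is the gap between the real and the predicted value.
generate gap = d1 - d1_hat
summarize e gap
```

The two summarized variables should match up to rounding, which is all that "residual" means here.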
How to check for them?
How to know if the first assumption is violated?
We can declare the data as panel data, then run the Wooldridge test in Stata as follows:
xtserial DEP (list of predictors)
If the p-value is smaller than 0.05, the assumption is violated. For example, with d1 as DEP and age as the only predictor:
xtserial d1 age
Wooldridge test for autocorrelation in panel data
H0: no first-order autocorrelation
F( 1, 1) = 0.059
Prob > F = 0.8486
--> The p-value (0.8486) is larger than 0.05, so we cannot reject H0: there is no evidence of first-order autocorrelation.
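The "set the data as panel data" step can be sketched as follows. This is only an illustration on simulated data: id (the panel identifier) and wave (the time index) are hypothetical variable names, and the numbers are assumptions. Note that xtserial is a user-written command, so it has to be installed before this runs (search xtserial should locate it).

```stata
* Simulated panel: 50 units observed over 5 waves (illustrative values only).
clear
set seed 12345
set obs 50
generate id = _n
expand 5
bysort id: generate wave = _n
generate age = 20 + 40*runiform()
generate d1  = 3 + 0.03*age + rnormal()

* Declare the panel structure: panel identifier first, time index second.
xtset id wave

* Wooldridge test; H0 is no first-order autocorrelation.
xtserial d1 age
```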
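Another way is to declare the data as time series, then run the Durbin-Watson test after the regression:
reg DEP (list of predictors)
dwstat
A Durbin-Watson statistic close to 2 indicates no first-order autocorrelation; values well below 2 point to positive autocorrelation, and values well above 2 to negative autocorrelation. (In recent versions of Stata, dwstat has been replaced by estat dwatson.)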
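The time-series route can be sketched like this, again on simulated data. The time variable t is a hypothetical name and the numbers are assumptions; estat dwatson is used because it is the current name of the Durbin-Watson command.

```stata
* Simulated series of 100 periods (illustrative values only).
clear
set seed 12345
set obs 100
generate t   = _n
generate age = 20 + 40*runiform()
generate d1  = 3 + 0.03*age + rnormal()

* Declare the data as time series, using t as the time variable.
tsset t

regress d1 age
* Durbin-Watson d statistic; the result is also stored in r(dw).
estat dwatson
display "Durbin-Watson d = " %6.4f r(dw)
```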
How to know if heteroskedasticity exists?
reg DEP (list of predictors)
estat hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of d3
chi2(1) = 0.73
Prob > chi2 = 0.3942
--> The p-value (0.3942) is larger than 0.05, so we cannot reject Ho: there is no evidence of heteroskedasticity.
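For contrast, here is a minimal sketch of what the test looks like when the constant-variance assumption is violated by construction. The data are simulated, and the choice to let the error spread grow with age (standard deviation 0.05*age) is an assumption made only for illustration.

```stata
* Simulated data where the error spread grows with age (illustrative only).
clear
set seed 12345
set obs 200
generate age = 20 + 40*runiform()
generate d1  = 3 + 0.03*age + rnormal(0, 0.05*age)

regress d1 age
* Breusch-Pagan / Cook-Weisberg test; Ho is constant variance.
estat hettest
* The p-value is stored in r(p); below 0.05 we reject constant variance.
display "Breusch-Pagan p-value = " %6.4f r(p)
```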
What to do?
What to do if you find autocorrelation?
Remove its effect with a Cochrane-Orcutt regression:
prais d1 age, corc
Iteration 0: rho = 0.0000
Iteration 1: rho = 0.0795
Iteration 2: rho = 0.0808
Iteration 3: rho = 0.0808
Iteration 4: rho = 0.0808
Cochrane-Orcutt AR(1) regression -- iterated estimates
      Source |       SS       df       MS              Number of obs =     127
-------------+------------------------------           F(  1,   125) =    4.85
       Model |  14.3667947     1  14.3667947           Prob > F      =  0.0295
    Residual |  370.582352   125  2.96465882           R-squared     =  0.0373
-------------+------------------------------           Adj R-squared =  0.0296
       Total |  384.949147   126  3.05515196           Root MSE      =  1.7218
------------------------------------------------------------------------------
          d1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0306289   .0139136     2.20   0.030     .0030922    .0581656
       _cons |   3.001655   .5507311     5.45   0.000     1.911689     4.09162
-------------+----------------------------------------------------------------
         rho |   .0808052
------------------------------------------------------------------------------
Durbin-Watson statistic (original)    1.838574
Durbin-Watson statistic (transformed) 2.000578
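To see the correction at work on data that really are autocorrelated, here is a minimal sketch with simulated AR(1) errors. The true rho of 0.6, the time variable t and the error variable u are all assumptions chosen for illustration.

```stata
* Simulated series with AR(1) errors (illustrative values only).
clear
set seed 12345
set obs 200
generate t = _n
tsset t
generate age = 20 + 40*runiform()

* Each error depends on the previous one with an assumed rho of 0.6.
generate u = rnormal()
replace u = 0.6*u[_n-1] + rnormal() in 2/l
generate d1 = 3 + 0.03*age + u

* Cochrane-Orcutt AR(1) regression; the estimated rho is stored in e(rho).
prais d1 age, corc
display "Estimated rho = " %6.4f e(rho)
```

Cochrane-Orcutt drops the first observation when it transforms the data, which is why the output above reports 127 observations while the ordinary regression further down reports 128.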
What to do if you find heteroskedasticity?
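Heteroskedasticity leaves the coefficient estimates unbiased but distorts their standard errors, and with them the t-tests and confidence intervals. In Stata, we correct for this by requesting heteroskedasticity-robust standard errors with the robust option:
reg Y X, robust
For example, with d1 and age:
reg d1 age, robust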
Linear regression                                      Number of obs =     128
                                                       F(  1,   126) =    5.55
                                                       Prob > F      =  0.0201
                                                       R-squared     =  0.0409
                                                       Root MSE      =   1.722
------------------------------------------------------------------------------
             |               Robust
          d1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0322989   .0137162     2.35   0.020      .005155    .0594428
       _cons |   2.945031   .5509205     5.35   0.000     1.854776    4.035286
------------------------------------------------------------------------------
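A quick way to see what the robust option changes is to run the same regression with and without it. This is a sketch on simulated heteroskedastic data; the numbers and the scalar names (b_ols, se_ols, b_rob, se_rob) are assumptions for illustration. The option vce(robust) is the longer spelling of robust.

```stata
* Simulated data where the error spread grows with age (illustrative only).
clear
set seed 12345
set obs 200
generate age = 20 + 40*runiform()
generate d1  = 3 + 0.03*age + rnormal(0, 0.05*age)

* Ordinary standard errors
regress d1 age
scalar b_ols  = _b[age]
scalar se_ols = _se[age]

* Heteroskedasticity-robust standard errors
regress d1 age, vce(robust)
scalar b_rob  = _b[age]
scalar se_rob = _se[age]

* The coefficient is identical; only the standard error changes.
display "coef: " b_ols " vs " b_rob
display "se:   " se_ols " vs " se_rob
```

The point of the comparison is that the robust option does not change the fitted line at all; it only replaces the standard errors used for the t-tests and confidence intervals.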