Sunday

Gologit2 for ordinal outcomes


If you have an ordinal outcome variable (e.g. 'at risk' drinking and 'alcohol dependent' on an alcohol scale),

If you want to argue that the probabilities of levels 1, 2, 3, …, n of the outcome are cumulative (for example, people with a higher chance of being in the 'at risk' category also have a higher chance of being in the 'dependence' category),

If you are trying to adjust for problems caused by the nature of a survey (the sample does not represent the population due to non-response, or needs a finite population correction),

If you want to test the proportional odds assumption for your ordinal variable,


 

  1. Install gologit2 in Stata (ssc install gologit2)
  2. Key in:


    svy, subpop(groupvar): gologit2 depvar indepvars, autofit
If the test of the proportional odds assumption is insignificant, your model is OK.

If it is not, use separate standard logistic regressions for each cutpoint, or try multinomial logistic regression.
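Put together, the recipe above might look like the sketch below. The dataset, variable names (drink_cat, age, sex, income), and survey design (psu, wt, adults) are all placeholders, not part of the original post:

```stata
* Install the user-written command once
ssc install gologit2

* Declare the survey design first (psu, weight, and subpop names are assumptions)
svyset psu [pweight = wt]

* drink_cat: ordinal outcome (0 = none, 1 = at risk, 2 = dependent)
* autofit relaxes the parallel-lines constraint only where it is violated
svy, subpop(adults): gologit2 drink_cat age i.sex income, autofit
```

If autofit reports that no variable violates the proportional odds assumption, a plain ologit would have been defensible as well.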

Saturday

Understanding interaction in logistic regression

If we have
y = x + z + xz

How to interpret the interaction term xz:

The coefficient on xz tells you how the effect of z depends on x: for each one-unit increase in x, a one-unit change in z yields an additional change of (coefficient of xz) in y, or in the log odds of y in the case of logistic regression.
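As a sketch in Stata, factor-variable notation builds the interaction for you; y, x, and z are placeholder names:

```stata
* c.x##c.z expands to x, z, and the product x*z
logit y c.x##c.z

* The coefficient on c.x#c.z is the change in the effect of z on the
* log odds of y per one-unit increase in x. margins shows the effect
* of z at a few chosen values of x:
margins, dydx(z) at(x = (0 1 2))
```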









Wednesday

Checking the parallel regression assumption in ologit

Context: If your dependent variable is ordered, like 'low', 'middle', 'high' levels of income, you may want to use ordinal logistic regression. An important assumption of this technique is that the coefficients are the same across the levels of the outcome (the parallel regression, or proportional odds, assumption). We need to check for that. If the assumption is not violated, report the ordinal logistic regression coefficients. If it is violated, consider multinomial logistic regression instead, which does not require the coefficients to be equal across categories.

How to do it in Stata?

The syntax is:

ologit y x1 x2 ... xn
brant, detail

Significant results mean the assumption is violated. Note that this test is sensitive to large sample sizes. Recode the violating variable and re-run.
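One common way to act on a violation is to collapse categories of the offending predictor. This sketch uses hypothetical variables (income_cat, educ):

```stata
ologit income_cat age i.sex i.educ
brant, detail

* Suppose educ violates the parallel regression assumption:
* collapse its five levels into three and re-run
recode educ (1 2 = 1) (3 = 2) (4 5 = 3), generate(educ3)
ologit income_cat age i.sex i.educ3
brant, detail
```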

If brant still does not work, install omodel and run this:



omodel logit y x1 x2 x3


Significant results mean the assumption is violated! In other words, the coefficients do vary significantly between categories of the response variable.

Sample size is also a consideration: an ologit model needs more observations than an OLS model.


If both tips do not work, go for gologit2 (see the Gologit2 post above).



Understanding logit regression output

Look at this page to see how to interpret odds ratios.

Diagnostic tests to run with your regression

  • Detecting Unusual and Influential Data
    • predict -- used to create predicted values, residuals, and measures of influence.
    • rvpplot -- graphs a residual-versus-predictor plot.
    • rvfplot -- graphs a residual-versus-fitted plot.
    • lvr2plot -- graphs a leverage-versus-squared-residual plot.
    • dfbeta -- calculates DFBETAs for all the independent variables in the linear model.
    • avplot -- graphs an added-variable plot, a.k.a. partial regression plot.
  • Tests for Normality of Residuals
    • kdensity -- produces a kernel density plot with a normal distribution overlaid.
    • pnorm -- graphs a standardized normal probability (P-P) plot.
    • qnorm -- plots the quantiles of varname against the quantiles of a normal distribution.
    • iqr -- resistant normality check and outlier identification.
    • swilk -- performs the Shapiro-Wilk W test for normality.
  • Tests for Heteroscedasticity
    • rvfplot -- graphs a residual-versus-fitted plot.
    • hettest -- performs the Cook-Weisberg test for heteroscedasticity.
    • whitetst -- computes the White general test for heteroscedasticity.
  • Tests for Multicollinearity
    • vif -- calculates the variance inflation factor for the independent variables in the linear model.
    • collin -- calculates the variance inflation factor and other multicollinearity diagnostics.
  • Tests for Non-Linearity
    • acprplot -- graphs an augmented component-plus-residual plot.
    • cprplot -- graphs a component-plus-residual plot, a.k.a. residual plot.
  • Tests for Model Specification
    • linktest -- performs a link test for model specification.
    • ovtest -- performs the regression specification error test (RESET) for omitted variables.
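A typical pass through these diagnostics, using the auto dataset that ships with Stata, might look like this:

```stata
sysuse auto, clear
regress price mpg weight

rvfplot                  // residual-versus-fitted plot
predict r, resid
kdensity r, normal       // residual density with normal overlay
swilk r                  // Shapiro-Wilk test for normality
hettest                  // Cook-Weisberg test for heteroscedasticity
vif                      // variance inflation factors
ovtest                   // RESET test for omitted variables
```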

Check for multicollinearity

Also called collinearity (when two predictors in the model are a near-perfect linear combination of one another).

Multicollinearity: more than two predictors involved!

Method 1:

In Stata, after a regression, type vif for the variance inflation factors:

vif

VIF values greater than 10 need further investigation.

1/VIF (called the tolerance value) smaller than 0.1 also needs further investigation. In other words, the variable could be seen as a linear combination of the other independent variables.

Example of case where multicollinearity exists:

vif = 43

1/vif=0.02

Method 2: Use the user-written command collin on the predictors

Still, look at VIF and tolerance. Also look at the 'condition number' at the bottom of the output. A condition number above 15 suggests possible collinearity, and above 30 indicates a serious problem.
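To see what a collinearity problem looks like, here is a sketch that manufactures one in the auto dataset (weight2 is an artificial near-copy of weight):

```stata
sysuse auto, clear
generate weight2 = weight + runiform()   // almost identical to weight

regress price mpg weight weight2
vif            // weight and weight2 will show very large VIFs

* collin is user-written (findit collin); it also reports the condition number:
* collin mpg weight weight2
```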

How to correct for heteroscedasticity in stata

Heteroscedasticity = the situation in which the variance of the residuals is not homogeneous: it changes across groups or waves of data.

The assumption that the variance of the residuals is homogeneous is often made in regression analysis. However, this assumption needs to be checked. If you are not sure about it, you can use the option 'robust' or 'hc3' after 'reg' in Stata (in newer Stata syntax, vce(robust) or vce(hc3)).

How to do it?
1) large sample: use the robust option in reg (Stata)
reg DEPVAR ListOfPredictors, robust
2) small sample: use the hc3 option in reg (Stata)
reg DEPVAR ListOfPredictors, hc3
Read here: http://www.sociology.ohio-state.edu/ptv/faq/heteroscedasticity.htm
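Detect-then-correct, sketched on the auto dataset (the vce() syntax is the modern equivalent of the robust/hc3 options):

```stata
sysuse auto, clear
regress price mpg weight
hettest                            // small p-value suggests heteroscedasticity

regress price mpg weight, vce(robust)   // Huber-White robust SEs
regress price mpg weight, vce(hc3)      // HC3 variant
```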

Tuesday

Using predict in Stata after regression

1) predicted values of y (y is the dependent variable); no option needed

predict name

2) residuals

predict name, resid

3) standardized residuals

predict name, rstandard

4) studentized or jackknifed residuals

predict name, rstudent

5) leverage

predict name, leverage   (or: predict name, hat)

6) standard error of the residual

predict name, stdr

7) Cook's D

predict name, cooksd

8) standard error of predicted individual y

predict name, stdf

9) standard error of predicted mean y

predict name, stdp
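Several of these predict calls chained after one regression, again on the auto dataset:

```stata
sysuse auto, clear
regress price mpg weight

predict yhat                 // fitted values (the default)
predict res, resid           // residuals
predict rstd, rstandard      // standardized residuals
predict rstu, rstudent       // studentized residuals
predict h, leverage          // leverage (hat values)
predict d, cooksd            // Cook's D
```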


Learn basic regression with Stata here:

http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter1/statareg1.htm