Monday

Basic nonparametric tests in Stata

Many variables do not follow a normal distribution. Qualitative (categorical) and discrete variables cannot, and many continuous variables are skewed. What to do? Consider the following cases:
1) One continuous variable to be compared across two, three or more groups, as in a one-way ANOVA?
If alcohol is a continuous index that follows a normal distribution, and you would like to compare it across four groups (coded 1 to 4 in the variable ex), run a one-way ANOVA followed by pairwise comparisons:
anova alcohol i.ex
pwcompare i.ex
--------------------------------------------------------------
             |                                 Unadjusted
             |   Contrast   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
          ex |
     2 vs 1  |    7.84005   .6146453       6.63063     9.04947
     3 vs 1  |   1.063768   .7986802     -.5077717    2.635308
     4 vs 1  |   11.31003   .7748954      9.785287    12.83477
     3 vs 2  |  -6.776282   .8104726     -8.371025   -5.181539
     4 vs 2  |   3.469976   .7870442      1.921332     5.01862
     4 vs 3  |   10.24626   .9378378      8.400902    12.09161
--------------------------------------------------------------
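Each contrast is a difference between two group means, so the six rows are not independent: every pair can be rebuilt from the three comparisons against group 1. The Python sketch below is only an aid for reading the table, not part of the Stata workflow. It uses the numbers copied from the pwcompare output, checks that consistency, and lists the pairs whose unadjusted 95% interval excludes zero.

```python
# Contrasts and unadjusted 95% CIs copied from the pwcompare table (group 1 is the base).
table = {
    ("2", "1"): (7.84005, 6.63063, 9.04947),
    ("3", "1"): (1.063768, -0.5077717, 2.635308),
    ("4", "1"): (11.31003, 9.785287, 12.83477),
    ("3", "2"): (-6.776282, -8.371025, -5.181539),
    ("4", "2"): (3.469976, 1.921332, 5.01862),
    ("4", "3"): (10.24626, 8.400902, 12.09161),
}

# Mean difference of each group from group 1 (group 1 itself is 0).
vs1 = {"1": 0.0}
for g in ("2", "3", "4"):
    vs1[g] = table[(g, "1")][0]


def implied(a, b):
    """Contrast 'a vs b' implied by the differences from group 1."""
    return vs1[a] - vs1[b]


def excludes_zero(lo, hi):
    """An unadjusted 95% CI that excludes 0 corresponds to p < 0.05 for that pair."""
    return lo > 0 or hi < 0


significant = []
for (a, b), (contrast, lo, hi) in table.items():
    # Each printed contrast should match the implied difference, up to rounding.
    assert abs(contrast - implied(a, b)) < 1e-4
    if excludes_zero(lo, hi):
        significant.append(f"{a} vs {b}")

print(significant)  # ['2 vs 1', '4 vs 1', '3 vs 2', '4 vs 2', '4 vs 3']
```

These intervals are unadjusted; with six comparisons, a multiple-comparison adjustment would be more conservative.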
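Pairs whose unadjusted 95% confidence interval excludes zero differ significantly: here every pair except 3 vs 1. But what if alcohol, as an index, clearly does not follow a normal distribution? Then we should work with ranks instead and use the Kruskal-Wallis test: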
kwallis2 alcohol, by(ex)
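(where alcohol is a continuous variable that is not normally distributed). Note that kwallis2 is a user-written command. Official Stata's kwallis runs the same Kruskal-Wallis test but does not print the multiple comparisons shown below.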
One-way analysis of variance by ranks (Kruskal-Wallis Test)
ex       Obs   RankSum  RankMean
--------------------------------
  1      115   9896.00     86.05
  2      104  22142.00    212.90
  3       45   4775.00    106.11
  4       49  12328.00    251.59
Chi-squared (uncorrected for ties) =   178.123 with    3 d.f. (p = 0.00010)
Chi-squared (corrected for ties)   =   191.758 with    3 d.f. (p = 0.00010)
Multiple comparisons between groups
-----------------------------------
(Adjusted p-value for significance is 0.004167)
Ho: alcohol(ex==1) = alcohol(ex==2)
    RankMeans difference =    126.85  Critical value =     32.31
    Prob = 0.000000 (S)
Ho: alcohol(ex==1) = alcohol(ex==3)
    RankMeans difference =     20.06  Critical value =     41.98
    Prob = 0.103737 (NS)
Ho: alcohol(ex==1) = alcohol(ex==4)
    RankMeans difference =    165.54  Critical value =     40.73
    Prob = 0.000000 (S)
Ho: alcohol(ex==2) = alcohol(ex==3)
    RankMeans difference =    106.79  Critical value =     42.60
    Prob = 0.000000 (S)
Ho: alcohol(ex==2) = alcohol(ex==4)
    RankMeans difference =     38.69  Critical value =     41.37
    Prob = 0.006809 (NS)
Ho: alcohol(ex==3) = alcohol(ex==4)
    RankMeans difference =    145.48  Critical value =     49.30
    Prob = 0.000000 (S)
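The kwallis2 output can be reproduced from the group sizes and rank sums alone. The minimal Python sketch below (standard library only) recomputes the Kruskal-Wallis statistic uncorrected for ties, followed by the pairwise comparisons. The comparison rule is an assumption inferred from the printed numbers, which are consistent with a Dunn-type z test on rank means. That test uses a standard error of sqrt(N(N+1)/12 * (1/n_i + 1/n_j)), a one-sided normal p-value, and a Bonferroni threshold of 0.05/(k(k-1)) = 0.05/12 = 0.004167 for k = 4 groups. The tie-corrected chi-squared (191.758) needs tie counts that the output does not show, so it is not reproduced.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

# Group sizes and rank sums copied from the kwallis2 output.
n = {1: 115, 2: 104, 3: 45, 4: 49}
rank_sum = {1: 9896.0, 2: 22142.0, 3: 4775.0, 4: 12328.0}

N = sum(n.values())  # 313 observations
k = len(n)           # 4 groups

# Ranks 1..N always add up to N(N+1)/2.
assert sum(rank_sum.values()) == N * (N + 1) / 2

rank_mean = {g: rank_sum[g] / n[g] for g in n}

# Kruskal-Wallis H, uncorrected for ties, on k - 1 = 3 d.f.
H = 12 / (N * (N + 1)) * sum(rank_sum[g] ** 2 / n[g] for g in n) - 3 * (N + 1)
print(f"H = {H:.3f} with {k - 1} d.f.")

# Pairwise comparisons of rank means (assumed Dunn-type rule, ties ignored).
level = 0.05 / (k * (k - 1))               # 0.004167, the printed threshold
z_crit = NormalDist().inv_cdf(1 - level)   # about 2.638

results = {}
for i, j in combinations(sorted(n), 2):
    diff = abs(rank_mean[i] - rank_mean[j])
    se = sqrt(N * (N + 1) / 12 * (1 / n[i] + 1 / n[j]))
    p_one_sided = 1 - NormalDist().cdf(diff / se)
    results[(i, j)] = (diff, z_crit * se, p_one_sided)
    verdict = "S" if p_one_sided < level else "NS"
    print(f"{i} vs {j}: diff = {diff:.2f}, critical = {z_crit * se:.2f}, "
          f"p = {p_one_sided:.6f} ({verdict})")
```

This rule explains why 2 vs 4 is "NS" even though its p-value (0.0068) is below 0.05. With six comparisons, the threshold is 0.004167, and the rank-mean difference (38.69) is smaller than the critical value (41.37).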

2) Compare two groups?
Use the two-sample Wilcoxon rank-sum (Mann-Whitney) test, here comparing bin between females and males:
ranksum2 bin, by(sex)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test
         sex |      obs    rank sum    expected
-------------+---------------------------------
      Female |       39      2518.5      3607.5
        Male |      145     14501.5     13412.5
-------------+---------------------------------
    combined |      184       17020       17020
unadjusted variance    87181.25
adjustment for ties   -25532.48
                     ----------
adjusted variance      61648.77
Ho: bin(sex==Female) = bin(sex==Male)
             z =  -4.386
    Prob > |z| =   0.0000
U/mn         .69257294
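The quantities in the ranksum2 output follow from the textbook formulas for the rank-sum test. The Python sketch below recomputes them from the printed rank sums. The tie adjustment (-25532.48) is copied rather than recomputed, because it depends on tie counts that are not shown. The sketch also shows that U/mn is the Male group's U divided by n1*n2. This is an estimate of the probability that a randomly chosen man has a higher bin value than a randomly chosen woman, with ties counted as one half.

```python
from math import sqrt

# Group sizes and rank sums copied from the ranksum2 output.
n1, R1 = 39, 2518.5       # Female
n2, R2 = 145, 14501.5     # Male
N = n1 + n2               # 184

# Under Ho, each group's expected rank sum is n * (N + 1) / 2.
expected1 = n1 * (N + 1) / 2          # 3607.5
expected2 = n2 * (N + 1) / 2          # 13412.5

# Variance of R1 without ties; the tie adjustment is copied from the output.
var_unadjusted = n1 * n2 * (N + 1) / 12       # 87181.25
var_adjusted = var_unadjusted - 25532.48      # 61648.77

z = (R1 - expected1) / sqrt(var_adjusted)

# Mann-Whitney U for each group; the two always add up to n1 * n2.
U1 = R1 - n1 * (n1 + 1) / 2
U2 = R2 - n2 * (n2 + 1) / 2
print(f"z = {z:.3f}, U/mn (Male) = {U2 / (n1 * n2):.8f}")
```

The Female group's U1/(n1*n2) is the complement, about 0.307.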
3) Association between two categorical variables?
spearman bin sex, stats(rho p)
Number of obs =     184
Spearman's rho =       0.3242
Test of Ho: bin and sex are independent
    Prob > |t| =       0.0000
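The Prob > |t| line refers to the usual t statistic for testing rho = 0, t = rho * sqrt((n-2)/(1-rho^2)) on n-2 degrees of freedom. The Python sketch below recomputes t from the rounded rho. The standard library has no t distribution, so the p-value here is a normal approximation, which is reasonable with 182 d.f. Assume bin is a 0/1 outcome, as its use in logistic suggests. Then rho is also the phi coefficient of the 2 x 2 table of bin by sex, and n * rho^2 gives Pearson's chi-squared statistic for that table.

```python
from math import sqrt
from statistics import NormalDist

rho = 0.3242   # Spearman's rho from the output (rounded to 4 decimals)
n = 184        # number of observations

# t statistic for Ho: rho = 0, referred to a t distribution with n - 2 d.f.
t = rho * sqrt((n - 2) / (1 - rho ** 2))

# Normal approximation to the two-sided t p-value (close to exact with 182 d.f.).
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

# Assuming both variables are binary: rho equals phi, and n * phi^2 is
# Pearson's chi-squared statistic for the 2 x 2 table (approximate, rho is rounded).
chi2 = n * rho ** 2
print(f"t = {t:.2f}, approx. p = {p_approx:.6f}, chi2 = {chi2:.2f}")
```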
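4) Regression with several predictors? With a binary outcome such as bin, use logistic regression (enter ethnic as i.ethnic if it has more than two categories, so that Stata treats it as categorical):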
logistic bin sex ethnic