Many variables are not normally distributed. Some are ‘qualitative’, ‘discrete’ or ‘categorical’ variables; others are continuous but skewed. What should you do? Consider the following cases:
1) One continuous variable to be compared across two, three or more groups, and you want to do something like ANOVA?
If alcohol is an index, that is, a continuous variable following a normal distribution, and you would like to compare alcohol across 4 groups (numbered from 1 to 4 in the variable ‘ex’):
anova alcohol i.ex
pwcompare i.ex
--------------------------------------------------------------
             |                            Unadjusted
             |   Contrast   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
          ex |
      2 vs 1 |    7.84005   .6146453      6.63063      9.04947
      3 vs 1 |   1.063768   .7986802    -.5077717     2.635308
      4 vs 1 |   11.31003   .7748954     9.785287     12.83477
      3 vs 2 |  -6.776282   .8104726    -8.371025    -5.181539
      4 vs 2 |   3.469976   .7870442     1.921332      5.01862
      4 vs 3 |   10.24626   .9378378     8.400902     12.09161
--------------------------------------------------------------
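A contrast is significant at the 5% level when its 95% confidence interval excludes zero (these rows were highlighted in yellow in the original output); here that is every pair except 3 vs 1. Note that these intervals are unadjusted for multiple comparisons. But what if alcohol, as an index, clearly does not follow a normal distribution? Then we should think about ranks.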
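Two follow-ups are worth sketching here: getting p-values that are adjusted for the six comparisons, and checking whether alcohol really looks normal before switching to ranks. The lines below are only a sketch; they assume the anova above has just been run and that alcohol and ex are in the dataset in memory.

```stata
* Same pairwise contrasts as above, now with p-values and
* Bonferroni-adjusted confidence intervals (run right after anova)
pwcompare ex, effects mcompare(bonferroni)

* Informal checks of normality: histogram with a normal curve, and a Q-Q plot
histogram alcohol, normal
qnorm alcohol

* Formal check: Shapiro-Wilk test (a small p-value suggests non-normality)
swilk alcohol
```

If these checks point away from normality, a rank-based test is the safer choice.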
kwallis2 alcohol, by(ex)
(where alcohol is a continuous variable, not normally distributed)
One-way analysis of variance by ranks (Kruskal-Wallis Test)
  ex    Obs    RankSum    RankMean
--------------------------------
   1    115    9896.00       86.05
   2    104   22142.00      212.90
   3     45    4775.00      106.11
   4     49   12328.00      251.59
Chi-squared (uncorrected for ties) = 178.123 with 3 d.f. (p = 0.00010)
Chi-squared (corrected for ties) = 191.758 with 3 d.f. (p = 0.00010)
Multiple comparisons between groups
-----------------------------------
(Adjusted p-value for significance is 0.004167)
Ho: alcohol(ex==1) = alcohol(ex==2)
RankMeans difference = 126.85 Critical value = 32.31
Prob = 0.000000 (S)
Ho: alcohol(ex==1) = alcohol(ex==3)
RankMeans difference = 20.06 Critical value = 41.98
Prob = 0.103737 (NS)
Ho: alcohol(ex==1) = alcohol(ex==4)
RankMeans difference = 165.54 Critical value = 40.73
Prob = 0.000000 (S)
Ho: alcohol(ex==2) = alcohol(ex==3)
RankMeans difference = 106.79 Critical value = 42.60
Prob = 0.000000 (S)
Ho: alcohol(ex==2) = alcohol(ex==4)
RankMeans difference = 38.69 Critical value = 41.37
Prob = 0.006809 (NS)
Ho: alcohol(ex==3) = alcohol(ex==4)
RankMeans difference = 145.48 Critical value = 49.30
Prob = 0.000000 (S)
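To see where the adjusted p-value and the critical values come from, the arithmetic can be redone with display. This is a sketch that assumes kwallis2 uses the usual large-sample formula (a nominal 0.05 split over k(k-1) one-sided comparisons); the results should come out close to the 0.004167 and 32.31 printed above. A pair is marked (S) only when its rank-mean difference exceeds the critical value, which is why 2 vs 4 (38.69 against 41.37) is (NS).

```stata
* Adjusted significance level: 0.05 divided by k(k-1) = 4*3
display 0.05/(4*3)

* Critical value for groups 1 and 2 (n = 115 and 104, N = 313 in total):
* z quantile times the standard error of the rank-mean difference
display invnormal(1 - 0.05/(4*3)) * sqrt(313*314/12 * (1/115 + 1/104))

* Official Stata command for the overall Kruskal-Wallis test
* (it does not print the pairwise comparisons)
kwallis alcohol, by(ex)
```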
2) Compare two groups?
Use the two-sample Wilcoxon rank-sum (Mann-Whitney) test:
ranksum2 bin, by(sex)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test
         sex |      obs    rank sum    expected
-------------+---------------------------------
      Female |       39      2518.5      3607.5
        Male |      145     14501.5     13412.5
-------------+---------------------------------
    combined |      184       17020       17020
unadjusted variance    87181.25
adjustment for ties   -25532.48
                     ----------
adjusted variance      61648.77
Ho: bin(sex==Female) = bin(sex==Male)
z = -4.386
Prob > |z| = 0.0000
U/mn .69257294
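The numbers in this output can be retraced by hand, which helps in reading them. The sketch below uses only figures printed above. Reading U/mn as the Mann-Whitney U for males divided by the number of female-male pairs is an assumption, made because it reproduces the printed value; on that reading, a randomly chosen male has the higher value of bin in roughly 69% of pairs, with ties counted as one half.

```stata
* Expected rank sum for the 39 females under Ho: n1*(N+1)/2
display 39*(184+1)/2

* Unadjusted variance: n1*n2*(N+1)/12
display 39*145*(184+1)/12

* z statistic, using the tie-adjusted variance from the output
display (2518.5 - 3607.5)/sqrt(61648.77)

* U/mn: U for males (rank sum minus n2*(n2+1)/2) over the 39*145 pairs
display (14501.5 - 145*(145+1)/2)/(39*145)

* Official Stata equivalent; porder adds the estimated probability that
* a value from the first group exceeds one from the second
ranksum bin, by(sex) porder
```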
3) Association between two categorical variables?
spearman bin sex, stats(rho p)
Number of obs = 184
Spearman's rho = 0.3242
Test of Ho: bin and sex are independent
Prob > |t| = 0.0000
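When both variables are truly categorical, a cross-tabulation is a common alternative to a rank correlation. The line below is a sketch, assuming bin and sex are both coded as a small number of categories.

```stata
* Cross-tabulation with Pearson chi-squared, Fisher's exact test,
* and Cramer's V as a measure of association
tabulate bin sex, chi2 exact V
```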
4) Regression with many variables?
logistic bin sex ethnic
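If ethnic has more than two unordered categories, entering it as a plain number treats the codes as a scale. A hedged variant with factor-variable notation follows; it assumes bin is coded 0/1 and that sex and ethnic are numeric (possibly with value labels).

```stata
* i. makes Stata treat the variable as categorical and build indicators;
* logistic reports odds ratios
logistic bin i.sex i.ethnic

* Joint test of all ethnic indicators after the fit
testparm i.ethnic
```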