Wednesday

Power Analysis for multiple logistic regression

I – Short answer: use powerlog in Stata
II – Long answer:
powerlog, p1(.25) p2(.35) rsq(.4)
To understand these options, think of three things: a ‘dependent variable’ (Y), a ‘key predictor’, and all the other predictors, which are of less interest (for example, confounders).
p1 and p2 concern the key predictor and the dependent variable (Y) only:
  • p1 is your guess of the probability that Y=1 when the key predictor is at its mean
  • p2 is your guess of the probability that Y=1 when the key predictor is at its mean plus one standard deviation (key predictor = mean + 1 SD)
Where do the ‘guesses’ come from? From a literature review or a pilot test, you make an ‘educated’ guess. The guessed difference between p1 and p2 should be large enough to matter in practice if it is true.
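rsq concerns only the key predictor and all other predictors: it is the squared multiple correlation between the key predictor and the other predictors. Ideally the key predictor would not be explained by the other predictors at all, so rsq would be 0. If you expect some level of correlation between the key predictor and other predictors, a common rule of thumb for the size of a correlation is given below. Because rsq is a squared correlation, square your guess before passing it to rsq (a correlation of 0.5 corresponds to rsq = 0.25):
  • strong correlation: 0.5 to 1
  • moderate correlation: 0.3 to 0.49
  • weak correlation: 0.1 to 0.29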
Ideally, you should remove predictors that correlate strongly with others, because they are redundant, unless you have good reasons to keep them. A model is efficient if it has fewer predictors but explains more of the variance of Y.
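To get a feel for how much a correlated key predictor costs in sample size, here is a small sketch in Python (not Stata). It assumes there is a single other predictor with correlation r to the key predictor, so that rsq is r squared, and it assumes the usual variance inflation adjustment, where the required n is multiplied by 1/(1 - rsq). The function name inflation_factor is made up for illustration.

```python
def inflation_factor(r):
    """Return (rsq, multiplier) for a correlation r between the key predictor
    and one other predictor. Assumes the n / (1 - rsq) adjustment."""
    rsq = r ** 2
    return rsq, 1.0 / (1.0 - rsq)

# Weak, moderate, and strong correlations from the rule of thumb above
for r in (0.1, 0.3, 0.5, 0.7):
    rsq, multiplier = inflation_factor(r)
    print(f"r = {r:.1f}  rsq = {rsq:.2f}  sample size multiplier = {multiplier:.2f}")
```

Under these assumptions, a correlation of 0.5 means rsq = 0.25 and roughly a third more cases than with an uncorrelated key predictor.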
In the case above, p1 is set at 0.25 and p2 at 0.35. That means when the key predictor takes its mean value, the probability that Y=1 (e.g. getting promoted, getting cancer, being an alcohol abuser) is 0.25, and when it is one standard deviation above its mean, the probability is 0.35. Because p2 is larger than p1, the expected association between Y and the key predictor is positive: when the key predictor increases, the chance of getting Y=1 increases.
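The two guesses can also be summarised as an odds ratio for a one standard deviation increase in the key predictor. A minimal sketch in Python (not Stata), where the helper name odds is made up for illustration:

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

p1, p2 = 0.25, 0.35                 # guesses at the mean and at mean + 1 SD
odds_ratio = odds(p2) / odds(p1)    # odds ratio per 1 SD of the key predictor
print(odds_ratio)
```

The result, about 1.615, agrees with the "odds ratio" that powerlog prints in its header line. A value above 1 corresponds to a positive association.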
In Stata we get a required sample size of 290 for power=.80:
powerlog, p1(.25) p2(.35) rsq(.4)
Logistic regression power analysis
One-tailed test: alpha=.05  p1=.25  p2=.35  rsq=.4  odds ratio=1.615384615384615

power    n
0.60    173
0.65    196
0.70    223
0.75    253
0.80    290
0.85    335
0.90    397
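
For readers who want to see where such numbers can come from, the sketch below (Python, not Stata) uses the sample size approximation usually attributed to Hsieh (1989) for a logistic regression with one continuous key predictor, inflated by 1/(1 - rsq). That powerlog uses exactly this formula is an assumption, not something stated in its output. The function name hsieh_n and the rounding to the nearest integer are choices made for this illustration.

```python
import math
from statistics import NormalDist

def hsieh_n(p1, p2, rsq, alpha=0.05, power=0.80):
    """Approximate sample size, one-tailed alpha (assumed Hsieh 1989 formula)."""
    z = NormalDist().inv_cdf
    # Log odds ratio for a 1 SD increase in the key predictor
    theta = math.log((p2 / (1 - p2)) / (p1 / (1 - p1)))
    t2 = theta ** 2
    delta = (1 + (1 + t2) * math.exp(5 * t2 / 4)) / (1 + math.exp(-t2 / 4))
    n = ((z(1 - alpha) + z(power) * math.exp(-t2 / 4)) ** 2
         * (1 + 2 * p1 * delta) / (p1 * t2))
    # Inflate for correlation between the key predictor and the others
    return round(n / (1 - rsq))

for pw in (0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90):
    print(f"{pw:.2f}  {hsieh_n(0.25, 0.35, 0.4, power=pw)}")
```

With p1 = .25, p2 = .35 and rsq = .4, this should give the same n column as the powerlog table, for example 290 at power = .80.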

Note that this is a power analysis for a one-tailed test, which means the whole 5% chance of a ‘wrong guess’ sits in one tail of the curve:
[Graph: one-tailed test, 5% in one tail]
For a two-tailed test, the 5% is split in two, so each tail has a 2.5% chance that the guess goes wrong:
[Graph: two-tailed test, 2.5% in each tail]
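
The difference between the two pictures comes down to the critical value of the standard normal distribution. A minimal sketch in Python (not Stata), assuming a normal approximation for the test statistic:

```python
from statistics import NormalDist

z = NormalDist().inv_cdf
alpha = 0.05

z_one_tailed = z(1 - alpha)        # all 5% in one tail
z_two_tailed = z(1 - alpha / 2)    # 2.5% in each tail
z_one_tailed_025 = z(1 - 0.025)    # one-tailed test at alpha = .025

print(round(z_one_tailed, 3), round(z_two_tailed, 3), round(z_one_tailed_025, 3))
```

A two-tailed test at alpha = .05 and a one-tailed test at alpha = .025 share the same critical value (about 1.96), which is larger than the one-tailed critical value at alpha = .05 (about 1.645). A larger critical value means a larger required sample size for the same power.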

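In Stata, we can get the two-tailed sample size by passing alpha(0.025), as below. The output is still labelled as a one-tailed test, but a one-tailed test at alpha = .025 uses the same critical value as a two-tailed test at alpha = .05. This time we get 369 for power=.80, two-tailed.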
. powerlog, p1(.25) p2(.35) rsq(.4) alpha(0.025)
Logistic regression power analysis
One-tailed test: alpha=.025  p1=.25  p2=.35  rsq=.4  odds ratio=1.615384615384615

power          n
0.60         235
0.65         263
0.70         293
0.75         328
0.80         369
0.85         420
0.90         489