Computer Software for Power Calculations

Robin High
Statistical Programmer and Consultant
robinh@uoregon.edu

The material in this article is based on power analysis concepts first presented in an introductory Computing News piece I wrote a few years ago. The important concepts haven't changed, and you can read them at http://cc.uoregon.edu/cnews/summer2000/statpower.html

Essential Points to Review

When planning a study or an experiment, each research question is framed as a null hypothesis (H0: no difference) and an alternative hypothesis (HA: an effect exists). For example, in a design with two groups whose subjects are selected independently, the hypotheses for comparing the population means of a continuous response variable are:

H0: the two population means are equal
HA: the two population means are different

The typical choice to test these hypotheses is the two-sample T-test. Power is defined as "the probability that the significance test will reject the null hypothesis for a specified value of the alternative hypothesis." Given this background, you would begin a power analysis by planning the study's objectives, then specifying a statistical model and an appropriate test statistic. Your input for power analyses would include the following essential components:

  1. Significance level (the probability of a Type I error). Common choices are α=.05 or .01.
  2. Desired power to detect a difference, expressed as 1-β, where β is the probability of a Type II error. Power=0.80 or 0.90 are common choices.
  3. Effect size the researcher determines to be a meaningful difference to detect. Effect size depends on the design and population parameters; the most common ones are summarized in Cohen (1988, 1992) and Cortina and Nouri (2000). The values of these parameters are often determined from the researcher's experience or from data in existing studies.
  4. Sample size: the number of subjects to be studied.

These diverse components are not independent: in fact, the specification of any three of them automatically determines the fourth. The usual objective of a power analysis is to calculate the sample size (4) required to satisfy values given for (1)-(3). It can also be utilized in studies with limited resources where the maximum total sample size (4) is known. In this situation power analysis becomes a helpful tool to determine if sufficient power exists (2) for specified values of (1), (3), and (4). As a result, the researcher can evaluate whether the study is worth pursuing.

Power Calculations in SAS 9.1

This article briefly introduces PROC POWER and PROC GLMPOWER, two new procedures in SAS 9.1 that are specifically designed to compute power for a variety of statistical designs to assist you with study planning.

PROC POWER calculates power for the most common statistical design problems, including one and two-sample T-tests, correlations, and proportions, as well as regression and one-way ANOVAs, among others.

For example, suppose you want to compute the total sample size required to test the equality of population means from two independent groups (A and B) with a two-sample T-test. The statistical design assumes equal group sizes (they can also be unequal). The response variable y is normally distributed in each group with means µA and µB, respectively, and has a common standard deviation (σ). The following hypotheses for the difference between these population means are specified:

H0: µA - µB = 0
HA: µA - µB ≠ 0

HA specifies a two-sided test, since deviations from 0 in either direction would be important to detect. For a power analysis you can also specify one-sided tests, where ≠ is replaced with < or >.

What total sample size is required so that the probability of obtaining a t statistic at least as extreme as the critical value is α=.05 under H0 and .9 (the power) under a specified effect size (one of the values covered by HA)? To make this calculation, invoke the procedure with the PROC POWER statement, followed by a statement that names the statistical test and supplies the relevant inputs:

PROC POWER;
  TWOSAMPLEMEANS
    TEST = diff            /* difference in means */
    ALPHA = .05            /* significance level */
    SIDES = 2              /* 2-sided test */
    MeanDiff = 2           /* µA - µB */
    STDDEV = 4             /* standard deviation in each group */
    GROUPWeights = (1 1)   /* equal group sizes */
    NTotal = .             /* NTotal = nA + nB */
    POWER = .9             /* desired power */
  ;
RUN;

The effect size for the difference in two population means follows from two options on the TWOSAMPLEMEANS statement: the mean difference (MeanDiff = 2) divided by the standard deviation (STDDEV = 4):

Effect Size = (µA-µB)/σ = 2/4 = 0.5

By the conventions in Cohen (1988), 0.5 is a medium effect size.

Notice how all four of the required components of a power analysis are included among the options for the TWOSAMPLEMEANS statement with one of them set to 'missing' (in SAS the period is the usual missing value entered for numbers). Thus, by replacing any one of the numbers specified above with a period and entering relevant values for the other options, you can solve for the missing item.
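As a sketch of solving for a different component, the call below fixes the total sample size and leaves POWER missing, which matches the limited-resources scenario described earlier. The NTotal value of 100 is a hypothetical budget, not a figure from this article; the other inputs repeat the example above.

```sas
PROC POWER;
  TWOSAMPLEMEANS
    TEST     = diff
    ALPHA    = .05
    SIDES    = 2
    MEANDIFF = 2
    STDDEV   = 4
    NTOTAL   = 100   /* hypothetical: at most 100 subjects are available */
    POWER    = .     /* missing, so PROC POWER solves for power */
  ;
RUN;
```

The output then reports the computed power instead of a sample size.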

In the example given above, when the missing item is the total sample size (NTotal= . ) the following output gives the total sample size:

Actual Power   NTotal
0.903          172

The total sample size required to meet the specified inputs is NTotal=172, which implies 86 subjects are needed in each group. The actual power printed on the output is 0.903, slightly higher than the specified power=.90, because each group needs a whole number of subjects and the two groups must be the same size. The unrounded total needed for power=.90 is NTotal = 170.063, or about 85.03 per group; rounding each group up to 86 gives NTotal=172, which slightly increases the actual power of the study.
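To see the unrounded sample size behind this result, one option is the NFRACTIONAL keyword on the TWOSAMPLEMEANS statement. The sketch below repeats the inputs of the example; the layout of the resulting output may differ between SAS releases, so treat it as something to try rather than a guaranteed listing.

```sas
PROC POWER;
  TWOSAMPLEMEANS
    TEST     = diff
    ALPHA    = .05
    SIDES    = 2
    MEANDIFF = 2
    STDDEV   = 4
    NFRACTIONAL      /* allow fractional sample sizes in input and output */
    NTOTAL   = .     /* missing, so PROC POWER solves for the total N */
    POWER    = .9
  ;
RUN;
```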

The POWER procedure accepts multiple values for each option. For example, you can enter ALPHA = .05 .01 to compute power under two choices of alpha. You can also enter multiple values of NTotal, either as individual numbers (50 100 150 200) or in abbreviated notation (50 to 200 by 10), to see how power changes with increasing sample sizes. Although PROC POWER can produce plots, the most flexible approach is to place the output into a SAS dataset with the Output Delivery System (ODS). These results can then be plotted with PROC GPLOT to produce a smooth curve for each level of alpha across the sample sizes. The sample sizes at which power reaches 0.8 and 0.9 can then be read directly from the plot. Examples of how this process works are available at http://www.uoregon.edu/~robinh/130_power.html
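A minimal sketch of that workflow follows. The dataset name pwr is hypothetical, and the variable names Power, NTotal, and Alpha in the captured table are assumptions; the PROC CONTENTS step is there so you can confirm them before plotting.

```sas
/* Capture the PROC POWER results table in a dataset (pwr is a hypothetical name) */
ODS OUTPUT Output=pwr;
PROC POWER;
  TWOSAMPLEMEANS
    TEST     = diff
    ALPHA    = .05 .01            /* two significance levels */
    MEANDIFF = 2
    STDDEV   = 4
    NTOTAL   = 50 to 200 by 10    /* range of total sample sizes */
    POWER    = .                  /* solve for power at each combination */
  ;
RUN;

/* Check the variable names in the captured table (assumed: Power, NTotal, Alpha) */
PROC CONTENTS DATA=pwr;
RUN;

/* One joined line per alpha level; colors are set so each SYMBOL is used once */
SYMBOL1 INTERPOL=JOIN VALUE=NONE COLOR=BLUE LINE=1;
SYMBOL2 INTERPOL=JOIN VALUE=NONE COLOR=RED  LINE=2;
PROC GPLOT DATA=pwr;
  PLOT Power*NTotal=Alpha / VREF=.8 .9;   /* reference lines at power 0.8 and 0.9 */
RUN;
QUIT;
```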

The same two-group comparison can also be run through PROC GLMPOWER, which introduces how to compute power for ANOVAs:

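/* exemplary dataset: one row per group with its assumed mean */
DATA anv;
  INPUT group mean @@;
  cards;
1 10 2 12
;
PROC GLMPOWER DATA=anv;
  CLASS group;
  MODEL mean = group;
  POWER
    Alpha = .05    /* significance level */
    STDDEV = 4     /* common standard deviation */
    NTotal = .     /* missing, so solved for */
    POWER = .9     /* desired power */
  ;
RUN;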

Error DF   Actual Power   NTotal
170        0.903          172

The syntax for PROC GLMPOWER looks much like a combination of PROC GLM and PROC POWER. It computes power for multifactor designs that include main effects and interactions, as well as for specific contrasts among the levels of the factors of interest. One major difference, as the example shows, is that the effect size (defined by the differences among the means) is computed from an 'exemplary' dataset, together with the common cell standard deviation and the equal group sizes assumed here. Since effect size computations for ANOVAs depend on the means across all levels of the categorical variables, entering them into a dataset is more efficient than entering them into the procedure itself.
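To illustrate how the exemplary dataset extends beyond two groups, here is a sketch with a third group and a single contrast. The third mean (13), the dataset name anv3, and the contrast label are hypothetical additions, not values from this article.

```sas
/* Exemplary dataset: one row per group with its assumed population mean */
DATA anv3;
  INPUT group mean @@;
  DATALINES;
1 10 2 12 3 13
;
PROC GLMPOWER DATA=anv3;
  CLASS group;
  MODEL mean = group;
  CONTRAST 'group 1 vs 3' group 1 0 -1;   /* hypothetical contrast of interest */
  POWER
    ALPHA  = .05
    STDDEV = 4      /* common standard deviation within groups */
    NTOTAL = .      /* solve for total N */
    POWER  = .9
  ;
RUN;
```

The output should list a required sample size for the overall group effect and for the contrast, so the two can be compared when planning.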

GLMPOWER can assist you with power calculations for more complex designs such as ANCOVA, which include specification of variance reduction due to one or more covariates in the model. Calculations for even more complicated designs such as repeated measures or multi-level models are not yet available as supported features in SAS, although approaches for them are available with other software or through simulation techniques (see Chapter 12 of Littell, et al.).
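As a sketch of the ANCOVA case, the POWER statement in PROC GLMPOWER accepts NCOVARIATES= and CORRXY= to describe the covariates. The correlations listed here (0, .3, .5) are hypothetical scenarios chosen only to show how a stronger covariate changes the required sample size; the means and standard deviation repeat the two-group example.

```sas
DATA anv;
  INPUT group mean @@;
  DATALINES;
1 10 2 12
;
PROC GLMPOWER DATA=anv;
  CLASS group;
  MODEL mean = group;
  POWER
    ALPHA       = .05
    STDDEV      = 4          /* error SD before adjusting for the covariate */
    NCOVARIATES = 1          /* one covariate in the model */
    CORRXY      = 0 .3 .5    /* hypothetical covariate-response correlations */
    NTOTAL      = .          /* solve for total N under each scenario */
    POWER       = .9
  ;
RUN;
```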

Comments

The specification of an appropriate effect size is usually the most difficult input for a power analysis. A difference between two means of interest is usually simple to define, yet the standard deviation (σ) may be difficult to estimate; it gets even trickier for other designs.

For example, when computing power to test correlations and proportions, you need to apply transformations such as Fisher's Z and the arcsine, respectively, to compute effect sizes. The narrow range of possible values for these two parameters makes differences between the raw values inappropriate as effect sizes.
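The two transformations can be sketched in a short DATA step. The correlations (0.5 and 0.3) and proportions (0.5 and 0.3) are hypothetical inputs; q and h are the effect size indices Cohen (1988) defines for differences between correlations and between proportions.

```sas
DATA _NULL_;
  /* hypothetical correlations to compare */
  r1 = 0.5;
  r2 = 0.3;
  z1 = 0.5 * LOG((1 + r1) / (1 - r1));   /* Fisher's Z of r1 */
  z2 = 0.5 * LOG((1 + r2) / (1 - r2));   /* Fisher's Z of r2 */
  q  = z1 - z2;                          /* Cohen's q */

  /* hypothetical proportions to compare */
  p1 = 0.5;
  p2 = 0.3;
  h  = 2 * ARSIN(SQRT(p1)) - 2 * ARSIN(SQRT(p2));   /* Cohen's h */

  PUT q= h=;   /* write both effect sizes to the SAS log */
RUN;
```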

The effect size for linear regression is a transformation of R-square. R-square is the square of the correlation r, which equals:

r = β * σx / σy

This formula implies that effect size for linear regression rests on three components: the value of the linear regression coefficient, β, and the two sources of variation, σx and σy. This means you can increase the detectable effect size by enlarging the variation in the predictor variable x (i.e., through the experimental design), and by minimizing measurement error variation in the response variable, y.
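A small worked sketch makes the point concrete. It assumes a simple linear regression in which the variance of y is β²σx² plus a residual variance, and it uses Cohen's f² = R²/(1 - R²) as the transformation of R-square; the slope, residual standard deviation, and the two values of σx are all hypothetical.

```sas
DATA effsize;
  beta    = 0.5;   /* hypothetical regression slope */
  sigma_e = 3;     /* hypothetical residual (measurement error) SD */
  DO sigma_x = 2, 4;   /* two hypothetical spreads of the predictor */
    sigma_y = SQRT(beta**2 * sigma_x**2 + sigma_e**2);   /* implied SD of y */
    r   = beta * sigma_x / sigma_y;                      /* correlation */
    rsq = r**2;                                          /* R-square */
    f2  = rsq / (1 - rsq);                               /* Cohen's f-squared */
    OUTPUT;
  END;
RUN;

PROC PRINT DATA=effsize NOOBS;
RUN;
```

Under these assumptions, doubling σx from 2 to 4 raises R-square from 0.10 to about 0.31; lowering sigma_e has a similar effect.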

With these power procedures you are no longer confined to the three levels of effect size (i.e., small=.2, medium=.5, and large=.8) that Cohen has made so popular (see Lenth, 2001). The tables in his publications are presented in terms of his specified effect sizes, not necessarily the actual values you need. With SAS you can now calculate power for any effect size, large or small, based on the values of the parameters of interest. (Examples of how to compute power to replicate Cohen's tables are available at http://www.uoregon.edu/~robinh/130_power.html.) It is then a simple task to enhance the tables and graphs based on your chosen inputs.

Power as computed by SAS is prospective, that is, it is an 'a priori' concept. Power analysis should be directed towards planning a study, not doing a post-mortem review of the results. Variations in the parameters of interest should be explored under different scenarios before data are collected. None of the SAS procedures, including POWER and GLMPOWER, provide retrospective (post hoc) power calculations. These computations have been shown to produce misleading and biased conclusions (even though they are routinely output by SPSS procedures and requested by some journals). See Hoenig and Heisey (2001) for why this reasoning is fallacious.

Allow Modern Computing Technology to Increase Power!

The primary goal of statistical power calculations is to provide insight into how many subjects are needed for a specific design and research objective. Recent advances in computing technology have made more powerful analytical techniques readily available, yet many researchers appear to be stuck in the 1970s and 80s in the way they apply statistics. Although it's necessary to know how to analyze data with the basic designs, the current trends in statistical computing indicate the importance of collaboration between researchers and statisticians from planning through analysis.

For example, with repeated measures data (data collected over time or under multiple conditions from each subject), statistical software can now work directly with the within-subject covariance matrix. This is much more realistic than the checks for the "sphericity" condition (including the out-of-date test by Mauchly from 1940) which are still routinely taught. Also, analyzing subject means collected from repeated trials is usually not necessary or even desirable!

Although statistical analysis should never be expected to rescue data from a bad design or other miscues, a wealth of modern study planning and data analysis techniques are currently available that can help you assess which statistical model is most appropriate to your study.

References

  1. Cohen, Jacob (1988), Statistical Power Analysis for the Behavioral Sciences, 2nd ed., Hillsdale, N.J.: L. Erlbaum Associates.
  2. Cohen, Jacob (1992), "A Power Primer," Psychological Bulletin, Vol. 112, No. 1, 155-159.
  3. Cortina, Jose and Nouri, Hossein (2000), "Effect Size for ANOVA Designs," Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-129. Thousand Oaks, CA: Sage.
  4. Hoenig, John M. and Heisey, Dennis M. (2001), "The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis," The American Statistician, 55, 19-24.
  5. Lenth, R. V. (2001), "Some Practical Guidelines for Effective Sample Size Determination," The American Statistician, 55, 187-193.

Spring 2006 Computing News | Computing Center Home Page