RPower: an R package to estimate power and determine the sample size for replication studies of genome-wide association studies

About RPower

Replication studies are a commonly used way to filter out false positives in genome-wide association studies (GWAS). If an association is confirmed in a replication study, it can be regarded as a true positive with high confidence. To design a replication study, traditional approaches calculate power by treating the replication study as another independent primary study. These approaches ignore the information provided by the primary study. Moreover, they require a minimum detectable effect size to be specified, which may be subjective. One might instead replace the minimum effect size with the observed effect sizes in the power calculation. However, only associations that were significant in the primary study are carried forward, and their observed effect sizes tend to overestimate the true effects (the "winner's curse"). Plugging them in therefore overestimates power, and the resulting replication study will be underpowered.
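The effect of the winner's curse on a naive power calculation can be illustrated with a small simulation. This is a sketch with made-up settings (number of SNPs, effect sizes, standard errors, significance thresholds) and it does not use RPower. It plugs the observed log(OR)s of the genome-wide significant SNPs into a standard two-sided power formula and compares the result with the power at their true effects.

```r
# Illustrative winner's-curse simulation; every setting below is made up.
set.seed(1)
m     <- 100000                                # number of SNPs
se    <- 0.03                                  # SE of log(OR) in the primary study
mu    <- c(rep(0.15, 200), rep(0, m - 200))    # true log(OR): 200 causal SNPs
muhat <- rnorm(m, mean = mu, sd = se)          # observed log(OR)
hit   <- abs(muhat / se) > qnorm(1 - 5e-8 / 2) # primary positive associations

# Two-sided replication power for effect b, replication SE se2, level 0.05
se2 <- 0.06
zR  <- qnorm(1 - 0.05 / 2)
rep_power <- function(b) pnorm(abs(b) / se2 - zR) + pnorm(-abs(b) / se2 - zR)

naive_power <- mean(rep_power(muhat[hit]))  # plug in observed (inflated) effects
true_power  <- mean(rep_power(mu[hit]))     # power at the true effects
c(hits = sum(hit), naive = naive_power, true = true_power)
```

A SNP only passes the threshold when its observed |log(OR)| exceeds the cut-off, so the plugged-in effects are biased upward and the naive power exceeds the true power.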

Here we provide RPower, an R package to estimate power and determine the sample size for replication studies of genome-wide association studies. Its power estimation method is based on empirical Bayes, which reduces the bias caused by the winner's curse in the primary study. Details of the method are given in the related publication below.


Related Publication
W. Jiang and W. Yu
"Power Estimation and Sample Size Determination for Replication Studies of Genome-Wide Association Studies",
accepted at the Fourteenth Asia Pacific Bioinformatics Conference (APBC 2016).

Where to download RPower

The R package is available at:
Windows: RPower_1.0.zip
Linux:   RPower_1.0.tar.gz

The manual is available at: RPower-manual.pdf


Environment configuration

The package can be installed directly in the R environment with the following command:

Windows: install.packages("RPower_1.0.zip", repos = NULL)
Linux:   install.packages("RPower_1.0.tar.gz", repos = NULL)


Use the following command to load the package in the R environment:

library("RPower")

How to use it?

The main functions of RPower are repPowerEB, repSampleSize and repSampleSize2. The package also provides a simple helper function, SEest.

1. To estimate the power of a replication study in GWAS, first obtain the observed log odds ratios (log(OR)) and their standard errors for every genotyped SNP. An example set of summary statistics (gwasSmryEx) is included in the package; load it with data(gwasSmryEx).
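If your summary statistics report odds ratios with 95% confidence intervals instead of log(OR) and standard errors, the standard errors can be recovered from the interval width on the log scale. The sketch below uses made-up numbers (not gwasSmryEx, whose column layout is not assumed here) and also flags the primary positive associations. It assumes that zalpha2 is the two-sided critical z-value of the primary study at the genome-wide level 5e-8; check help(repPowerEB) for the package's exact definition.

```r
# Made-up summary statistics for four SNPs (not taken from gwasSmryEx).
OR    <- c(1.25, 0.95, 1.35, 1.02)   # odds ratios
lower <- c(1.17, 0.88, 1.22, 0.96)   # 95% CI lower bounds
upper <- c(1.34, 1.03, 1.49, 1.08)   # 95% CI upper bounds

MUhat <- log(OR)                                         # observed log(OR)
SE    <- (log(upper) - log(lower)) / (2 * qnorm(0.975))  # SE from the CI width
z     <- MUhat / SE                                      # Wald z-scores

zalpha2  <- qnorm(1 - 5e-8 / 2)   # assumed two-sided genome-wide threshold
positive <- abs(z) > zalpha2      # primary positive associations
positive
```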

2. Next, anticipate the standard errors of log(OR) in the replication study. They can be calculated by Woolf's method with the function SEest:

SEest(n0,n1,fU,fA)

For details about the function, use help(SEest) in the R environment.
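For reference, Woolf's method gives the standard error of a log odds ratio as the square root of the summed reciprocals of the four cell counts of a 2 x 2 table. The minimal sketch below applies it to an allelic table, assuming n0 and n1 are the numbers of controls and cases, fU and fA are the risk-allele frequencies in unaffected and affected individuals, and each person contributes two alleles. SEest may differ in these details; woolf_se is a hypothetical name, not a package function.

```r
# Hypothetical re-implementation of Woolf's SE for an allelic log(OR).
woolf_se <- function(n0, n1, fU, fA) {
  case_risk  <- 2 * n1 * fA         # risk alleles among cases
  case_other <- 2 * n1 * (1 - fA)   # non-risk alleles among cases
  ctrl_risk  <- 2 * n0 * fU         # risk alleles among controls
  ctrl_other <- 2 * n0 * (1 - fU)   # non-risk alleles among controls
  sqrt(1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other)
}

woolf_se(n0 = 2000, n1 = 2000, fU = 0.30, fA = 0.35)   # about 0.048
```

Doubling both sample sizes divides the standard error by sqrt(2), so at a fixed control-to-case ratio the standard error scales as 1/sqrt(N).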

3. Use repPowerEB to estimate the replication-study power of each primary positive association (i.e. each association identified in the primary study). The results also include credible intervals for these powers and the average power.

repPowerEB(MUhat, SE, SE2, zalpha2, zalphaR, boot = 100, num = 100, output = TRUE, dir = 'output', info = TRUE)

For details about the function, use help(repPowerEB) in the R environment.
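The idea behind empirical Bayes power estimation can be sketched with the simplest possible prior. The function below is not repPowerEB's algorithm (see the publication for that) and does not compute bootstrap credible intervals. It assumes the true log(OR)s follow a N(0, tau^2) prior whose variance is estimated from all SNPs by the method of moments, shrinks each primary positive toward zero, and averages the two-sided replication power over the posterior (predictive power). zalpha2 and zalphaR are assumed to be the two-sided critical z-values of the primary and replication studies; eb_rep_power is a hypothetical name.

```r
# Normal-prior empirical Bayes sketch; NOT the method implemented in repPowerEB.
eb_rep_power <- function(MUhat, SE, SE2, zalpha2, zalphaR) {
  SE   <- rep_len(SE, length(MUhat))        # allow a single common SE
  SE2  <- rep_len(SE2, length(MUhat))       # allow a single replication SE
  tau2 <- max(0, mean(MUhat^2 - SE^2))      # method-of-moments prior variance
  hit  <- abs(MUhat / SE) > zalpha2         # primary positive associations
  shrink    <- tau2 / (tau2 + SE[hit]^2)
  post_mean <- shrink * MUhat[hit]          # shrunken effect estimate
  post_var  <- shrink * SE[hit]^2           # posterior variance
  # Replication estimate ~ N(post_mean, post_var + SE2^2) under the posterior
  s   <- sqrt(post_var + SE2[hit]^2)
  thr <- zalphaR * SE2[hit]                 # rejection cut-off on the log(OR) scale
  power <- pnorm((-thr - post_mean) / s) + pnorm((post_mean - thr) / s)
  list(index = which(hit), post_mean = post_mean, power = power,
       avg_power = mean(power))
}

# Usage on simulated data drawn from the assumed prior (made-up settings)
set.seed(2)
m     <- 20000
mu    <- rnorm(m, 0, 0.06)
SE    <- rep(0.03, m)
MUhat <- rnorm(m, mu, SE)
res <- eb_rep_power(MUhat, SE, SE2 = 0.04,
                    zalpha2 = qnorm(1 - 5e-8 / 2), zalphaR = qnorm(1 - 0.05 / 2))
res$avg_power
```

Because the prior is estimated from all SNPs rather than only the selected ones, the shrinkage counteracts the winner's curse. A single normal prior is crude for real GWAS, where most SNPs have no effect, which is why the simulated data here are drawn from that prior.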

4. To determine the replication sample size needed to reach a given power, use repSampleSize or repSampleSize2.

repSampleSize(power, n, MUhat, SE, zalpha2, zalphaR)

repSampleSize2(power, CCR2, MUhat, SE, fU, fA, zalpha2, zalphaR)

repSampleSize is used when the control-to-case ratio (CCR) of the replication study is the same as in the primary study. Otherwise, for a replication study designed with a specific CCR, use repSampleSize2.

For details about these functions, use help(repSampleSize) or help(repSampleSize2) in the R environment.
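Both situations can be sketched with closed-form solutions for a single SNP. These hypothetical functions are not the package's implementation. They take an assumed bias-reduced effect b (for example, a shrunken estimate rather than the raw MUhat, which would reintroduce the winner's curse), ignore the negligible rejection tail in the wrong direction, and use the standard requirement |b| / SE2 = zalphaR + qnorm(power). With the same CCR, SE is assumed to scale as 1/sqrt(N), with n the primary study size; with a new ratio CCR2 = controls/cases, Woolf's allelic standard error (two alleles per person, fU and fA the risk-allele frequencies in controls and cases) is used.

```r
# Hypothetical sample-size sketches for a single SNP (not the RPower API).
# b: assumed bias-reduced log(OR); zalphaR: two-sided replication critical value.
required_se <- function(power, b, zalphaR) abs(b) / (zalphaR + qnorm(power))

# Same control:case ratio as the primary study (size n, standard error SE):
# SE2 = SE * sqrt(n / n2), hence n2 = n * (SE / SE2)^2.
size_same_ccr <- function(power, n, b, SE, zalphaR) {
  ceiling(n * (SE / required_se(power, b, zalphaR))^2)
}

# New ratio CCR2 = controls / cases, using Woolf's allelic SE:
# SE2^2 = K / n1 with K = (1 / (fA (1 - fA)) + 1 / (CCR2 fU (1 - fU))) / 2.
size_given_ccr <- function(power, CCR2, b, fU, fA, zalphaR) {
  K  <- 0.5 * (1 / (fA * (1 - fA)) + 1 / (CCR2 * fU * (1 - fU)))
  n1 <- ceiling(K / required_se(power, b, zalphaR)^2)   # number of cases
  c(cases = n1, controls = ceiling(CCR2 * n1))
}

zR <- qnorm(1 - 0.05 / 2)
size_same_ccr(0.8, n = 4000, b = 0.1, SE = 0.05, zalphaR = zR)
size_given_ccr(0.8, CCR2 = 2, b = 0.1, fU = 0.30, fA = 0.35, zalphaR = zR)
```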