
BOOST: A fast approach to detecting gene-gene interactions in genome-wide case-control studies

About BOOST

BOolean Operation based Screening and Testing (BOOST) is a method for detecting gene-gene interactions. It allows all pairwise interactions in genome-wide case-control studies to be examined remarkably fast. Interaction analyses were carried out on seven data sets from the Wellcome Trust Case Control Consortium. Each analysis took less than 60 hours to completely evaluate all pairs of roughly 360,000 SNPs on a standard 3.0 GHz desktop with 4 GB of memory running Windows XP.


What is BOOST?

The definition of gene-gene interactions used in BOOST

BOOST uses the widely accepted definition of gene-gene interactions, which is based on logistic regression models. The logistic regression model with only main effects (i.e., the main effect model) has the following form:

Equation1

The logistic regression model with both main effect terms and interaction terms (i.e., the full model) has the following form:

Equation2

Let LM and LF be the maximum log-likelihoods of the main effect model and the full model, respectively. Following the likelihood ratio test, the interaction effect is measured by the difference in deviance between the two models, i.e., 2(LF - LM).
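For a single pair of SNPs, the two models and the interaction measure can be written out as in the sketch below. The notation is an assumption made for illustration and may differ from the paper's: Y is the disease status (1 = case, 2 = control), X_p and X_q are the two SNPs, and i, j in {1, 2, 3} index their genotypes.

```latex
\begin{align}
\text{Main effect model:}\quad
  & \log\frac{P(Y=1 \mid X_p=i,\, X_q=j)}{P(Y=2 \mid X_p=i,\, X_q=j)}
    = \beta_0 + \beta_i^{X_p} + \beta_j^{X_q} \\
\text{Full model:}\quad
  & \log\frac{P(Y=1 \mid X_p=i,\, X_q=j)}{P(Y=2 \mid X_p=i,\, X_q=j)}
    = \beta_0 + \beta_i^{X_p} + \beta_j^{X_q} + \beta_{ij}^{X_p X_q} \\
\text{Interaction measure:}\quad
  & 2\,(L_F - L_M)
\end{align}
```

With three genotypes per SNP, the full model has (3 - 1)(3 - 1) = 4 more free parameters than the main effect model, so under the null hypothesis of no interaction the statistic is compared against a chi-square distribution with 4 degrees of freedom.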

Boolean operations

Based on the equivalence between a logistic regression model and its corresponding log-linear model, we construct our test statistic using log-linear models. The advantage of doing so is that we only need to fill in the contingency tables. We design a Boolean representation of genotype data, which improves CPU efficiency because it involves only Boolean values and allows fast logic (bitwise) operations to be used to obtain the contingency tables.
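The idea can be illustrated with a minimal Python sketch. It is not the BOOST source code: the function names, the 0/1/2 genotype coding and the use of Python integers as bit strings are all assumptions made for this example. Each SNP is stored as one bit string per (disease status, genotype) combination, and each cell of the contingency table is then a bitwise AND followed by a bit count.

```python
# Hypothetical helpers for illustration only; genotypes are coded 0, 1, 2.

def encode_snp(genotypes, is_case):
    """Return bits[status][g]: an int whose bit k is set when sample k has
    disease status `status` (1 = case, 0 = control) and genotype g."""
    bits = [[0, 0, 0], [0, 0, 0]]
    for k, (g, case) in enumerate(zip(genotypes, is_case)):
        bits[1 if case else 0][g] |= 1 << k
    return bits


def contingency_table(bits_a, bits_b):
    """Build the 2 x 3 x 3 table: table[status][i][j] counts the samples with
    genotype i at SNP A and genotype j at SNP B, using AND plus a bit count."""
    return [[[bin(bits_a[s][i] & bits_b[s][j]).count("1") for j in range(3)]
             for i in range(3)]
            for s in range(2)]


# Six samples: the first three are cases, the last three are controls.
snp_a = [0, 1, 2, 0, 1, 2]
snp_b = [0, 0, 1, 1, 2, 2]
is_case = [True, True, True, False, False, False]

table = contingency_table(encode_snp(snp_a, is_case),
                          encode_snp(snp_b, is_case))
```

Because a SNP is encoded once and reused for every pair it takes part in, the per-pair cost in this sketch is 18 AND operations and 18 bit counts, independent of how the genotypes were originally stored.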

Screening and Testing

In the space of log-linear models, the homogeneous association model is the equivalent form of the main effect logistic regression model, and the saturated model matches the full logistic regression model. Since no closed-form solution exists for maximum likelihood estimation of the homogeneous association model, we propose a two-stage (screening and testing) method. In the screening stage, we use a non-iterative method to approximate the likelihood ratio test statistic for all pairs of SNPs and select those passing a specified threshold. Most non-significant interactions are filtered out, while the survival of significant interactions is guaranteed because of the tight approximation bound. In the testing stage, we employ the classical likelihood ratio test to measure the interaction effects of the selected SNP pairs.
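The testing stage can be sketched in Python as follows. This is an illustration under stated assumptions, not the BOOST implementation: the text above does not name the iterative fitting procedure, so iterative proportional fitting (a standard way to fit the homogeneous association model) is assumed here, and all function names are hypothetical. The non-iterative screening approximation is not shown.

```python
import math


def _rescale(fit, cells, target):
    """Scale the fitted counts in `cells` so that they sum to `target`."""
    current = sum(fit[s][i][j] for s, i, j in cells)
    if current == 0:
        return 0.0
    factor = target / current
    for s, i, j in cells:
        fit[s][i][j] *= factor
    return abs(factor - 1.0)


def fit_homogeneous_association(table, max_iter=500, tol=1e-10):
    """Fit the homogeneous association model to a 2 x 3 x 3 table
    (status x SNP A x SNP B) by iterative proportional fitting."""
    fit = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
    S, G = range(2), range(3)
    groups = (
        [[(s, i, j) for j in G] for s in S for i in G]    # status x SNP A margin
        + [[(s, i, j) for i in G] for s in S for j in G]  # status x SNP B margin
        + [[(s, i, j) for s in S] for i in G for j in G]  # SNP A x SNP B margin
    )
    for _ in range(max_iter):
        worst = 0.0
        for cells in groups:
            target = sum(table[s][i][j] for s, i, j in cells)
            worst = max(worst, _rescale(fit, cells, target))
        if worst < tol:  # all two-way margins already match
            break
    return fit


def interaction_statistic(table):
    """Likelihood ratio statistic of the saturated model against the
    homogeneous association model: 2 * sum(n * log(n / fitted))."""
    fit = fit_homogeneous_association(table)
    return 2.0 * sum(
        table[s][i][j] * math.log(table[s][i][j] / fit[s][i][j])
        for s in range(2) for i in range(3) for j in range(3)
        if table[s][i][j] > 0
    )


# A table with independent factors has no interaction.
no_interaction = [[[r * a * b for b in (1, 1, 2)] for a in (10, 20, 30)]
                  for r in (1, 2)]

# Inflating one cell among the cases (status index 1) creates an interaction.
with_interaction = [[[10] * 3 for _ in range(3)] for _ in range(2)]
with_interaction[1][2][2] = 40

stat_null = interaction_statistic(no_interaction)
stat_alt = interaction_statistic(with_interaction)
```

In this sketch `stat_null` is zero up to rounding error, while `stat_alt` is clearly positive. In a two-stage design, only the SNP pairs that pass the screening threshold would reach this more expensive iterative step.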


Where to download BOOST and how to use it?

The source code and the Win32 and Win x64 executable files of BOOST are available here.

You may also download the files from a mirror site at http://snpboost.sourceforge.net.

For any enquiry, please contact YANG Can at macyang@ust.hk or Xiang Wan at xiangwan5@gmail.com.


Simulated data sets used in BOOST

Case 1 -- Disease loci with main effects: here.

Case 2 -- Disease loci without main effects: please see the reference Velez, D.R., White, B.C., Motsinger, A.A., Bush, W.S., Ritchie, M.D., Williams, S.M., and Moore, J.H. (2007). A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction. Genet. Epidemiol. 31, 306–315.

Update Log
30 - 12 - 2010 -- Simulation data sets for BOOST are released.
17 - 09 - 2010 -- Source code of BOOSTw32 and its executable version are added to BOOST.zip.
10 - 09 - 2010 -- BOOST web site is ready with BOOST's source and Win x64 executable.

Related Publication
X. Wan*, C. Yang*, Q. Yang, H. Xue, X. Fan, N. Tang and W. Yu. *Joint first author.
"BOOST: a fast approach to detecting gene-gene interactions in genome-wide case-control studies",
The American Journal of Human Genetics. 2010. [pdf] [suppl]

GBOOST

About GBOOST

GBOOST is a GPU implementation of BOOST based on the CUDA technology by Nvidia. For the same interaction analysis that BOOST performs on a data set from the Wellcome Trust Case Control Consortium, GBOOST completes the analysis in 1.34 hours on a standard 2.3 GHz desktop with 4 GB of memory and a standard Nvidia GeForce GTX 285 display card.


Where to download GBOOST and how to use it?

The source and Win x64 executable files of GBOOST are available here.

The source and executable files of GBOOST_GUI are available here.

For any enquiry, please contact YUNG Ling Sing at tim.yung@alumni.ust.hk.


Recommended hardware, platform and software version for GBOOST

CPU : Intel CPU 2.13GHz or above

Main Memory : 2 GB or above

Display Card : Nvidia GTX 285

Operating System : Windows Vista x64

CUDA Driver : 2.3

Please refer to http://www.nvidia.com/object/cuda_gpus.html for a list of CUDA Enabled GPUs.


Update Log
21 - 08 - 2013 -- Remove dependencies on Windows types and add a makefile for compiling on Linux.
21 - 03 - 2011 -- Minor bug fix. Handle the problem of skipping all SNP pairs starting with i index 0.
21 - 02 - 2011 -- Using Hamming weight algorithm instead of look-up table algorithm for bit string counting.
03 - 12 - 2010 -- First release of GBOOST with Win x64 executable and source files.
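
The two bit counting approaches named in the 21 - 02 - 2011 entry can be contrasted with a short Python sketch. It is an illustration only, not the GBOOST kernel code: the 64-bit word size, the 8-bit table and the function names are assumptions made for this example.

```python
M1 = 0x5555555555555555      # binary 0101...: selects every other bit
M2 = 0x3333333333333333      # binary 0011...: selects 2-bit pairs
M4 = 0x0F0F0F0F0F0F0F0F      # binary 00001111...: selects 4-bit nibbles
H01 = 0x0101010101010101     # multiplying by this sums all bytes into the top byte
MASK64 = 0xFFFFFFFFFFFFFFFF


def popcount64(x):
    """Count the set bits of a 64-bit word with the parallel Hamming weight
    method: partial sums are accumulated inside the word itself."""
    x -= (x >> 1) & M1                   # 2-bit sums
    x = (x & M2) + ((x >> 2) & M2)       # 4-bit sums
    x = (x + (x >> 4)) & M4              # 8-bit sums
    return ((x * H01) & MASK64) >> 56    # add the eight byte sums


# Look-up table alternative: one table access per byte of the word.
TABLE8 = [bin(b).count("1") for b in range(256)]


def popcount64_lookup(x):
    """Count the set bits of a 64-bit word with an 8-bit look-up table."""
    return sum(TABLE8[(x >> shift) & 0xFF] for shift in range(0, 64, 8))
```

The Hamming weight version uses only arithmetic and logic operations on the word, so it needs no table accesses, which is the usual motivation for preferring it on a GPU.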

Related Publication
L. S. Yung, C. Yang, X. Wan, and W. Yu
"GBOOST : A GPU-based tool for detecting gene-gene interactions in genome-wide case control studies",
Bioinformatics, 27:1309-1310, 2011. [link]

PBOOST

About PBOOST

PBOOST is a GPU based tool for parallel permutation tests in genome-wide association studies.
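
For readers unfamiliar with permutation tests, the general idea is sketched below in Python for a single SNP. This is a generic, serial illustration and not PBOOST's algorithm: the test statistic (absolute difference in mean genotype between cases and controls), the function names and the parameters are all assumptions made for this example.

```python
import random


def mean_difference(genotypes, is_case):
    """Absolute difference in mean genotype between cases and controls."""
    cases = [g for g, c in zip(genotypes, is_case) if c]
    controls = [g for g, c in zip(genotypes, is_case) if not c]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


def permutation_p_value(genotypes, is_case, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling the case/control labels and counting
    how often the permuted statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = mean_difference(genotypes, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if mean_difference(genotypes, labels) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


is_case = [True] * 10 + [False] * 10
p_associated = permutation_p_value([2] * 10 + [0] * 10, is_case)
p_unrelated = permutation_p_value([1] * 20, is_case)
```

Each permutation is independent of the others, which is what makes this kind of test a natural fit for parallel execution.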


Where to download PBOOST and how to use it?

The source and the demo data files are available here.


Environment configuration
PBOOST can be compiled and run in Windows and Linux environments; a 64-bit environment is preferred. Please refer to http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux for the supported Linux distributions. The NVIDIA CUDA Toolkit is required and is available at http://developer.nvidia.com/cuda-downloads. CUDA Toolkit 5.5 or above is recommended.
Compiling
On Windows, we use Visual Studio 2010 to compile and build the executable file. Add the 4 files gpuMain.cpp, perm_kernel_MT32.cu, permUtility.cpp and permUtility.h to a Visual Studio project and build it.
On Linux, we use gcc to compile the project:
1. Put the 4 source files and the 'makefile' file in the same directory.
2. Change to that directory and type 'make' on the command line.
3. Make the resulting executable runnable: chmod +x ./PERMUTE

Related Publication
G. Yang, W. Jiang, Q. Yang, and W. Yu
"PBOOST : A GPU based tool for parallel permutation tests in genome-wide association studies.",
Bioinformatics, 31:1460-1462, 2015. [link]

GBOOST 2.0

About GBOOST 2.0

GBOOST 2.0 is a GPU implementation of an advanced BOOST method (with covariate adjustment) based on the CUDA technology by Nvidia.


Where to download GBOOST 2.0 and how to use it?

The source and the demo data files are available here.


Recommended hardware, platform and software version for GBOOST 2.0

CPU : Intel(R) Core(TM) i5-4570 CPU 3.20 GHz

Main Memory : 4 GB or above

Display Card : Nvidia GTX 580

Operating System : Windows 7 x64

CUDA Driver : 7.5 or above

Please refer to http://www.nvidia.com/object/cuda_gpus.html for a list of CUDA Enabled GPUs.