A parallel computing-based gene-gene interaction detection method with covariate adjustment

  • Meng Wang

Student thesis: Master's thesis

Abstract

In genome-wide association studies (GWAS), detecting interactions between pairs of single nucleotide polymorphisms (SNPs) and phenotypes is important for revealing the relationship between genotypes and genetic diseases. The most commonly used measure of interaction is the departure from a linear model that describes the statistical relationship between genotypes and phenotypes. Recently, a Boolean operation-based screening and testing (BOOST) method was proposed to detect interactions with log-linear models. Because interaction detection can be carried out in parallel across SNP pairs, a GPU-based implementation of the BOOST method, named GBOOST, was made available for acceleration. Neither BOOST nor GBOOST takes covariates into consideration in its models, which may lead to inaccurate or even wrong interaction results under some circumstances. In this thesis, two covariate-adjusted interaction detection tools, BOOST 2.0 and GBOOST 2.0, are presented. BOOST 2.0 is a CPU multi-threaded version of the covariate-adjusted method, and GBOOST 2.0 is a GPU-based implementation. We introduce the log-linear models used in the method and the solutions that maximize their log-likelihood. The CPU multi-threaded and GPU implementations are then described. BOOST 2.0 and GBOOST 2.0 are both divided into four modules: data loading, screening, testing and results mapping. In the data loading step, genetic data are transformed into a Boolean representation so that we can take advantage of fast bitwise operations. Two fast approximate models are used in the screening step to filter out SNP pairs with low interaction values. The screening step is the most computationally intensive part, since it exhaustively calculates interaction values for all SNP pairs. In the testing step, we apply an iterative algorithm to calculate interaction values for the small portion of SNP pairs that passed the screening step.
Finally, we map the significantly interacting SNP pairs back to their positions on the corresponding chromosomes. A performance comparison of BOOST 2.0/GBOOST 2.0 with BOOST/GBOOST is presented using simulated data. We also demonstrate discoveries made on real data with BOOST 2.0 and GBOOST 2.0.
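
The Boolean representation mentioned for the data loading step can be illustrated with a minimal sketch. This is not the thesis code: the function names (`encode_snp`, `contingency_table`), the toy data, and the use of Python integers as bit vectors are all assumptions made for illustration. The idea shown is that each SNP is stored as three bitmasks, one per genotype (0, 1, 2), so the joint genotype counts for a SNP pair come from a bitwise AND followed by a population count rather than a loop over samples.

```python
def encode_snp(genotypes):
    """Return three bitmasks, one per genotype value 0, 1, 2.

    Bit i of masks[g] is set when sample i carries genotype g.
    (Hypothetical helper; missing genotypes are not handled here.)
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def contingency_table(snp_a, snp_b, case_mask):
    """Build the 3 x 3 x 2 table of joint genotype counts.

    table[a][b][0] counts controls and table[a][b][1] counts cases
    with genotype a at the first SNP and genotype b at the second.
    """
    table = [[[0, 0] for _ in range(3)] for _ in range(3)]
    for a in range(3):
        for b in range(3):
            joint = snp_a[a] & snp_b[b]          # samples with genotypes (a, b)
            cases = popcount(joint & case_mask)  # of those, the cases
            table[a][b][1] = cases
            table[a][b][0] = popcount(joint) - cases
    return table


# Toy data: six samples, two SNPs, phenotype 1 = case, 0 = control.
genotypes = {
    "snp1": [0, 1, 2, 1, 0, 2],
    "snp2": [1, 1, 0, 2, 0, 2],
}
phenotype = [1, 0, 1, 1, 0, 0]

case_mask = sum(1 << i for i, p in enumerate(phenotype) if p == 1)
encoded = {name: encode_snp(g) for name, g in genotypes.items()}
table = contingency_table(encoded["snp1"], encoded["snp2"], case_mask)
```

In this sketch the per-pair cost is nine AND operations plus popcounts on the packed bits, independent of how the samples are ordered. Every SNP pair can be processed independently of the others, which is what makes the exhaustive screening step amenable to multi-threaded and GPU execution.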
Date of Award: 2017
Original language: English
Awarding Institution
  • The Hong Kong University of Science and Technology
