The Second IASTED International Conference on
Computational Bioscience
CompBio 2011

July 11 – 13, 2011
Cambridge, United Kingdom


Exploiting Genetic Variation to Infer Causal Relationships between Phenotypic Biomarkers and Outcomes

Prof. Paul McKeigue
The University of Edinburgh Medical School, United Kingdom


In principle, genotypic variation in observational studies can be exploited to learn about causal relationships between phenotypic variables such as biomarkers and health outcomes. This can be applied to discover intermediate variables that may be targets for therapeutic or preventive interventions, or to validate phenotypic biomarkers as possible surrogate end-points in clinical trials. The “Mendelian randomization” approach is an application of the “instrumental variable” argument first developed in econometrics and later formalized in graphical theories of causality. Given a phenotypic biomarker that predicts the outcome under study, the investigator attempts to identify one or more genetic “instruments”: genes that perturb the biomarker and for which the assumption of no pleiotropic effects (no effects on outcome that are not mediated through the biomarker) is plausible. The causal effect of the biomarker is inferred from the relationship of outcome to the predicted level of the biomarker given genotype. Full exploitation of this approach requires statistical methods that can combine data from multiple sources, including case-control, cohort and cross-sectional studies. This is possible using Bayesian approaches with Markov chain Monte Carlo simulation to sample the posterior distribution. The application of this approach in general is severely limited by the requirement to assume no pleiotropy, an assumption that can be justified only where the function of the gene is well understood.
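The instrumental variable argument above can be illustrated with a small simulation. The following is a minimal sketch, not the Bayesian method described in this talk: it assumes a single genetic instrument, linear effects, an unobserved confounder and no pleiotropy, and every coefficient and function name (`simulate`, `naive_slope`, `iv_estimate`) is a hypothetical choice made for illustration. With one instrument, the ratio of covariances used here is equivalent to regressing the outcome on the level of the biomarker predicted from genotype.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (n - 1)


def simulate(n, true_effect, seed=0):
    """Simulate genotype, biomarker and outcome with an unobserved confounder.

    All coefficients below are arbitrary illustrative values (assumptions).
    """
    rng = random.Random(seed)
    genotype, biomarker, outcome = [], [], []
    for _ in range(n):
        g = sum(rng.random() < 0.3 for _ in range(2))  # allele count: 0, 1 or 2
        u = rng.gauss(0, 1)                            # unobserved confounder
        x = 0.5 * g + u + rng.gauss(0, 1)              # genotype perturbs the biomarker
        y = true_effect * x + u + rng.gauss(0, 1)      # no direct g -> y path (no pleiotropy)
        genotype.append(g)
        biomarker.append(x)
        outcome.append(y)
    return genotype, biomarker, outcome


def naive_slope(biomarker, outcome):
    """Ordinary regression slope of outcome on biomarker (biased by confounding)."""
    return covariance(biomarker, outcome) / covariance(biomarker, biomarker)


def iv_estimate(genotype, biomarker, outcome):
    """Ratio estimator: genotype-outcome covariance over genotype-biomarker covariance."""
    return covariance(genotype, outcome) / covariance(genotype, biomarker)


g, x, y = simulate(n=20000, true_effect=0.3, seed=1)
print("naive regression slope:", round(naive_slope(x, y), 3))
print("instrumental variable estimate:", round(iv_estimate(g, x, y), 3))
```

In this simulated setting the naive slope absorbs the confounder, while the instrumental variable estimate should lie close to the simulated effect. If a direct genotype-to-outcome term were added to `simulate`, the ratio estimator would no longer recover the effect, which is the pleiotropy limitation noted above.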
A somewhat different approach to exploiting genotypic variation in observational studies originates in systems biology, and has been denoted “reverse-engineering of genotype-phenotype relationships”, or “integrative genomics” by Schadt and colleagues. In this approach high-dimensional phenotypic biomarker data (such as gene expression arrays) are examined and all biomarkers are tested for evidence of causal effects on the outcomes or traits of interest using genome-wide genotypic data. This approach does not require the assumption of no pleiotropy. In a Bayesian framework, with some prior assumptions about the size of effects, it is possible to infer causality even where the requirements of classical graph theory (such as no pleiotropy) for inference of causality do not hold. This is based on comparing the marginal likelihood of different models, or on approximations to this comparison in which model “fit” is penalized by “complexity”. In Schadt's Likelihood-based Causality Model Selection (LCMS) test, genotype-biomarker-outcome triads are examined one at a time and the support for a causal model is compared with that for alternative models using the Akaike information criterion or the Bayesian information criterion. A serious limitation of this approach is that the genetic and phenotypic variables are not modelled jointly, which does not fully exploit the data and leads to inconsistent results. We have developed an approach that can be directly applied to high-dimensional genotypic and phenotypic data, in which all possible causal, latent confounding and pleiotropic effects are evaluated simultaneously with multiple outcomes, after an initial feature selection step to reduce the dimensionality of the dataset. This builds on sparse linear modelling methods developed recently in other areas of machine learning. In this Sparse Instrumental Variable (SPIV) approach, a sparse prior distribution on effects is specified and effects not supported by the model are pruned by automatic relevance determination.
Formal hypothesis testing to compare alternative hypotheses for the role of a biomarker can then be undertaken as a final step. This can be combined with methods for inferring networks in the presence of latent variables. This approach has wide application in systems biology, epidemiology and drug development.
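The triad-by-triad comparison mentioned above, in which fit is penalized by complexity, can be sketched as follows. This is a simplified illustration under stated assumptions, not Schadt's implementation and not the SPIV method: it assumes linear Gaussian models, three hypothetical candidate structures (`causal`, `reverse`, `pleiotropic`) and parameter counts chosen by counting one intercept, the slopes and one variance per regression. The genotype distribution is common to all three models and is omitted from the likelihood.

```python
import math
import random


def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def gaussian_loglik(response, predictors):
    """Maximised Gaussian log-likelihood of a regression with intercept
    on one or two predictors."""
    n = len(response)
    yc = centered(response)
    pc = [centered(p) for p in predictors]
    if len(pc) == 1:
        rss = dot(yc, yc) - dot(pc[0], yc) ** 2 / dot(pc[0], pc[0])
    else:
        a, b = pc
        saa, sbb, sab = dot(a, a), dot(b, b), dot(a, b)
        say, sby = dot(a, yc), dot(b, yc)
        det = saa * sbb - sab ** 2
        beta_a = (sbb * say - sab * sby) / det
        beta_b = (saa * sby - sab * say) / det
        rss = dot(yc, yc) - beta_a * say - beta_b * sby
    sigma2 = rss / n  # maximum likelihood estimate of the residual variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def score_models(g, x, y):
    """AIC and BIC for three candidate structures of one triad
    (g: genotype, x: biomarker, y: outcome)."""
    n = len(g)
    candidates = {
        # g -> x -> y
        "causal": (gaussian_loglik(x, [g]) + gaussian_loglik(y, [x]), 6),
        # g -> y -> x
        "reverse": (gaussian_loglik(y, [g]) + gaussian_loglik(x, [y]), 6),
        # g -> x, with y depending on both g and x
        "pleiotropic": (gaussian_loglik(x, [g]) + gaussian_loglik(y, [g, x]), 7),
    }
    return {
        name: {"loglik": ll, "aic": 2 * k - 2 * ll, "bic": k * math.log(n) - 2 * ll}
        for name, (ll, k) in candidates.items()
    }


def simulate_causal(n, seed=0):
    """Simulate a triad in which the biomarker truly mediates the genetic effect."""
    rng = random.Random(seed)
    g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
    x = [0.5 * gi + rng.gauss(0, 1) for gi in g]
    y = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    return g, x, y


g, x, y = simulate_causal(5000, seed=1)
for name, s in score_models(g, x, y).items():
    print(name, "AIC:", round(s["aic"], 1), "BIC:", round(s["bic"], 1))
```

Lower AIC or BIC indicates stronger support. Because each triad is scored in isolation, this sketch shares the limitation described above: the variables are not modelled jointly across triads.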

Biography of the Keynote Speaker

Paul McKeigue has been Professor of Genetic Epidemiology and Statistical Genetics at the University of Edinburgh since 2007. He previously held tenured professorial posts at the London School of Hygiene & Tropical Medicine and University College Dublin.
His research interests have spanned a wide range of fields, from the epidemiology of diabetes and cardiovascular disease in ethnic minorities to genetic epidemiology. More recently he has focused on the development of Bayesian statistical methods in genetic epidemiology and related fields, including methods that exploit genetic variation to infer causal relationships between phenotypic biomarkers and outcomes.