Multi PheWAS viewer

What is a PheWAS?

A phenome-wide association study (PheWAS) is an association study between phenotypes (e.g. diseases, clinical features) and genotypes. In this example, phenotypes are represented as groups of diagnosis codes from the International Classification of Diseases (ICD), and genotypes are represented by Single Nucleotide Polymorphisms (SNPs), which are locations in the genome where mutations of a single DNA base are frequent. A PheWAS consists of selecting a group of patients with a specific allele of a SNP and a group of control patients with the other allele, then testing each phenotype in turn to find systematic associations between that SNP and the phenotypes.

Why a multi-PheWAS viewer?

The usual way of viewing the results of a PheWAS is a Manhattan plot (see figure). Each point represents a phenotype, positioned horizontally according to its category (e.g. cardiovascular, pulmonary). The vertical axis shows the strength of the association between the phenotype and the SNP analyzed: the higher the point, the more significant the association.

This kind of visualization has two major limitations:

  • It cannot conveniently show the results of more than one PheWAS analysis on the same figure. One Manhattan plot corresponds to the analysis of one SNP against all the diagnoses, and overlaying the results of several SNPs on a single plot quickly becomes unreadable.
  • It does not display the effect size of the association. The strength of the association is distinct from the effect size: an association can be strong while the effect of the SNP on the disease is small, or vice versa. In other words, we can be very confident that the SNP has an effect on a specific disease and still know that this effect is small.
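The difference between strength of association and effect size can be illustrated with a small numeric sketch. It assumes a plain 2x2 table (carriers vs. non-carriers, with vs. without the disease) and a Wald test on the log odds ratio. Both the test and the counts are invented for illustration and are not taken from the PheWAS catalog.

```python
import math
from statistics import NormalDist

def odds_ratio_and_p(a, b, c, d):
    """2x2 table (hypothetical layout):
    a = carriers with the disease,     b = carriers without it,
    c = non-carriers with the disease, d = non-carriers without it.
    Returns (odds ratio, two-sided Wald p-value)."""
    odds_ratio = (a * d) / (b * c)
    # Woolf standard error of ln(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = math.log(odds_ratio) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return odds_ratio, p_value

# Same effect size (OR = 1.2), very different sample sizes.
small = odds_ratio_and_p(12, 100, 10, 100)
large = odds_ratio_and_p(1200, 10000, 1000, 10000)

print(small)  # small study: same OR, weak evidence
print(large)  # large study: same OR, strong evidence
```

Both tables give the same odds ratio, but only the larger one yields a small p-value. A Manhattan plot would place the two points at very different heights and give no hint that the effect size is the same.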

This project aims to create a tool that visualizes multiple PheWAS (e.g. multiple SNPs) at the same time. The tool also displays the effect size alongside the strength of association.

PheWAS Schema

The different views

The size of effect map

Each cell represents the effect of one SNP mutation on one phenotype. If the color is red, the risk is increased; if the color is blue, the risk is decreased. The opacity varies with the size of the effect. The size of effect map shows all the tested SNPs in columns (with a limit of 50 SNPs for now) and all the significant phenotypes in rows, giving a rapid overview of all the SNPs and all the phenotypes at once. Rows and columns can be ordered alphabetically or by similarity (hierarchical clustering).
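The color rule above can be sketched as a small mapping from an odds ratio to a cell style. This is an illustrative sketch, not the viewer's actual code: the function name `cell_style`, the linear scaling on |ln(OR)| and the cap at an odds ratio of 4 are all assumptions.

```python
import math

def cell_style(odds_ratio, max_abs_log_or=math.log(4.0)):
    """Map an odds ratio to (color, opacity).

    Hypothetical rule: red when the risk is increased (OR > 1),
    blue when it is decreased (OR < 1); opacity grows linearly
    with |ln(OR)| and is capped at 1.0.
    """
    log_or = math.log(odds_ratio)
    if log_or > 0:
        color = "red"
    elif log_or < 0:
        color = "blue"
    else:
        color = "none"  # OR == 1: no effect, fully transparent
    opacity = min(abs(log_or) / max_abs_log_or, 1.0)
    return color, opacity

print(cell_style(2.0))   # increased risk, about half opacity
print(cell_style(0.5))   # decreased risk, about half opacity
print(cell_style(10.0))  # increased risk, opacity capped at 1.0
```

Working on the log scale keeps the map symmetric: an odds ratio of 2 and an odds ratio of 0.5 get the same opacity with opposite colors.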

The in-line forest-plot

When a row or a cell of the size of effect map is activated, an in-line forest plot appears, revealing the details of the effect sizes for the selected row. This view shows the effect sizes for one phenotype across all SNPs at once. The effect size is computed as an odds ratio between the two groups of patients. The confidence interval of each effect size is also displayed to give an idea of the precision of the estimated effect.
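The Method section notes that the confidence intervals were estimated under a Gaussian assumption, because the source data report odds ratios without intervals. One common way to do this is sketched below. It assumes that a two-sided p-value is available for each odds ratio and that ln(OR) is approximately normal. The function name and the choice of inputs are assumptions, not the author's R code.

```python
import math
from statistics import NormalDist

def ci_from_or_and_p(odds_ratio, p_value, level=0.95):
    """Approximate confidence interval for an odds ratio, given its
    two-sided p-value and assuming ln(OR) is normally distributed.

    Undefined when odds_ratio == 1 or p_value == 1 (no information
    about the standard error can be recovered).
    """
    norm = NormalDist()
    log_or = math.log(odds_ratio)
    # |z| implied by the p-value (lower tail is numerically safer
    # than 1 - p/2 for very small p-values).
    z_obs = -norm.inv_cdf(p_value / 2)
    se = abs(log_or) / z_obs            # standard error of ln(OR)
    z_crit = norm.inv_cdf(0.5 + level / 2)
    return (math.exp(log_or - z_crit * se),
            math.exp(log_or + z_crit * se))

low, high = ci_from_or_and_p(2.0, 0.05)
print(low, high)
```

With an odds ratio of 2 and p = 0.05, the 95% interval just touches 1 on the lower side, which is what a borderline-significant result should look like on a forest plot.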

The strength of association map

This view shows all the phenotypes for a given SNP. It is based on the classical Manhattan plot, but flipped so that the strength of association is read along the horizontal axis. Each circle represents a phenotype (clinical feature). Its horizontal position reflects the strength of the association between the selected SNP and that phenotype. Every phenotype to the right of the dashed line is statistically associated with the selected SNP (after Bonferroni correction for multiple testing).
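The position of the dashed line can be sketched as follows. The sketch assumes a family-wise significance level of 0.05 and an axis scaled as -log10(p), as in a usual Manhattan plot. Neither value is stated in this document, and the helper names are hypothetical.

```python
import math

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold after Bonferroni correction."""
    return alpha / n_tests

def strength(p_value):
    """Association strength on the usual Manhattan scale (assumed)."""
    return -math.log10(p_value)

def is_significant(p_value, n_tests, alpha=0.05):
    """True if the phenotype falls on the significant side of the line."""
    return p_value < bonferroni_threshold(n_tests, alpha)

n_phenotypes = 1000  # illustrative number of phenotypes tested for one SNP
line_position = strength(bonferroni_threshold(n_phenotypes))
print(line_position)
print(is_significant(1e-6, n_phenotypes))
print(is_significant(1e-4, n_phenotypes))
```

The more phenotypes are tested for a SNP, the further right the dashed line moves.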

The co-association network

Each node represents a phenotype (or, respectively, a SNP). The color encodes the category of the node (clinical category for a phenotype, gene for a SNP). A link is drawn between two nodes if they are associated in the same analysis; for example, two phenotypes associated with the same SNP share a link. The weight of the link depends on the number of co-associations across the different analyses.
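For the phenotype network, the link weights described above could be computed along these lines. This is a minimal sketch: the input layout (one set of significant phenotypes per SNP), the function name and the example identifiers are all invented for illustration.

```python
from collections import Counter
from itertools import combinations

def co_association_weights(significant):
    """significant: dict mapping a SNP id to the set of phenotypes
    significantly associated with it.
    Returns a Counter mapping each (phenotype_a, phenotype_b) pair,
    in alphabetical order, to the number of SNPs associated with both."""
    weights = Counter()
    for phenotypes in significant.values():
        for pair in combinations(sorted(phenotypes), 2):
            weights[pair] += 1
    return weights

# Made-up example data.
example = {
    "snp_A": {"gout", "type 2 diabetes", "hypertension"},
    "snp_B": {"gout", "hypertension"},
    "snp_C": {"asthma"},
}
weights = co_association_weights(example)
for pair, weight in sorted(weights.items()):
    print(pair, weight)
```

A phenotype that is the only significant hit of its SNP (here "asthma") ends up as an isolated node with no link. The SNP network can be built the same way by swapping the roles of SNPs and phenotypes.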

Data

Data source

The PheWAS catalog: Denny JC, Bastarache L, Ritchie MD et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol. 2013 Dec;31(12):1102-10.
Available from: http://phewas.mc.vanderbilt.edu/

Pre-processing

Software

Data pre-processing was performed with the R statistical software, version 3.1.2. The following packages were used:

  • vegan_2.2-1 (vegdist for the binomial distance)
  • reshape2_1.4.1 (data wrangling)
  • PheWAS_0.9.6 (for the phenotype categories)
  • rjson_0.2.15 (for the export in json)

Method

We selected the top 35 SNPs (i.e. the SNPs with the highest number of significant associations). We computed the distance between the features using the binomial distance. Then, we performed a hierarchical clustering to reorder the SNPs and phenotypes.
The odds ratios were reported without confidence intervals, so we estimated the confidence intervals under the assumption of a Gaussian distribution.
For the network view, a co-association was defined as follows: two phenotypes were linked if they both had a significant association (after correction for multiple testing) with the same SNP. The number of SNPs with such a co-association defined the weight of the link.
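The distance used in the clustering step can be sketched in a few lines. The sketch follows the binomial dissimilarity formula documented for vegan's `vegdist` and is written in Python rather than R so that it is self-contained. This document does not say which matrix the distance was applied to, so the binary significance indicators in the example are an assumption.

```python
import math

def binomial_distance(x, y):
    """Binomial dissimilarity between two equal-length vectors of
    non-negative values, following the formula documented for
    vegan's vegdist(method = "binomial"):
        sum_i [ x_i*log(x_i/n_i) + y_i*log(y_i/n_i) - n_i*log(1/2) ] / n_i
    with n_i = x_i + y_i and 0*log(0) taken as 0."""
    total = 0.0
    for xi, yi in zip(x, y):
        n = xi + yi
        if n == 0:
            continue  # both values are zero: no contribution
        term = -n * math.log(0.5)
        if xi > 0:
            term += xi * math.log(xi / n)
        if yi > 0:
            term += yi * math.log(yi / n)
        total += term / n
    return total

# Assumed input: 1 if the phenotype is significant for the SNP, else 0.
snp_1 = [1, 0, 1, 0]
snp_2 = [1, 1, 0, 0]
print(binomial_distance(snp_1, snp_2))
```

For binary vectors, each mismatching position contributes ln(2) and each matching position contributes nothing, so the distance is proportional to the number of phenotypes on which the two SNPs disagree. The resulting pairwise distance matrix would then be passed to a hierarchical clustering routine to obtain the row and column order.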

Link to the process book (process_book.pdf)