A phenome-wide association study (PheWAS) is an association study between phenotypes (e.g. diseases, clinical features) and genotypes. In this example, phenotypes are represented as groups of diagnosis codes from the International Classification of Diseases (ICD), and genotypes are represented by single-nucleotide polymorphisms (SNPs), which are locations in the genome where mutations of a single DNA base are frequent. A PheWAS consists of selecting a group of patients with a specific allele of a SNP and a group of control patients with the other allele, then testing each phenotype in turn to find systematic associations between that SNP and any of the phenotypes.
The usual way of viewing the results of a PheWAS is a Manhattan plot (see figure). Each point represents a phenotype, positioned horizontally according to its category (e.g. cardiovascular, pulmonary). The vertical axis shows the strength of the association between the phenotype and the SNP analyzed: the higher the point, the more significant the association.
This kind of visualization has 2 major limitations:
This project aims to create a tool that visualizes multiple PheWAS (e.g. multiple SNPs) at the same time. The tool also displays the effect size along with the strength of association.
Each cell represents the effect of one SNP mutation on one phenotype. If the cell is red, the risk is increased; if it is blue, the risk is decreased. The opacity varies with the size of the effect. The effect size map shows all the tested SNPs in columns (with a limit of 50 SNPs for now) and all the significant phenotypes in rows, giving a rapid overview of all the SNPs and all the phenotypes at once. Rows and columns can be ordered alphabetically or by similarity (hierarchical clustering).
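The color encoding described above can be sketched as follows. This is only an illustration, not the tool's actual rendering code: the function name `cell_color`, the use of the log odds ratio as the effect measure, and the `max_abs_log_or` cap are all assumptions made for the example.

```python
import math

def cell_color(odds_ratio, max_abs_log_or=2.0):
    """Map an odds ratio to an (r, g, b, alpha) tuple.

    Red means increased risk (OR > 1), blue means decreased risk (OR < 1).
    The opacity scale (capped at max_abs_log_or) is an assumed parameter.
    """
    log_or = math.log(odds_ratio)
    # Opacity grows linearly with |log OR| and saturates at 1.0.
    alpha = min(abs(log_or) / max_abs_log_or, 1.0)
    if log_or > 0:
        rgb = (255, 0, 0)      # increased risk
    elif log_or < 0:
        rgb = (0, 0, 255)      # decreased risk
    else:
        rgb = (255, 255, 255)  # no effect
    return (*rgb, alpha)

risk_cell = cell_color(2.0)
protective_cell = cell_color(0.5)
```

Working on the log scale makes an odds ratio of 2 and an odds ratio of 0.5 equally opaque, which matches the idea that they are effects of the same size in opposite directions.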
When a row or a cell of the effect size map is activated, an inline forest plot appears, revealing the details of the effect sizes for the selected row. This view shows the effect sizes for one phenotype across all SNPs at once. The effect size is computed as an odds ratio between the two groups of patients. The confidence interval of each effect size is also displayed to give an idea of the precision of the estimate.
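As a reminder of what one row of the forest plot encodes, here is a minimal sketch of an odds ratio with a Wald-type confidence interval. It assumes the four counts of the 2x2 table are available; the function name and the example counts are made up for illustration.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: patients with the allele and with the phenotype
    b: patients with the allele and without the phenotype
    c: controls (other allele) with the phenotype
    d: controls (other allele) without the phenotype
    z: normal quantile for the interval (1.96 gives roughly 95%)
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Made-up counts, for illustration only.
or_value, ci_low, ci_high = odds_ratio_with_ci(20, 80, 10, 90)
```

An interval that excludes 1 suggests the phenotype is more (or less) frequent in patients carrying the allele; a wide interval signals an imprecise estimate.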
This view shows all the phenotypes for a given SNP. It is based on the classical Manhattan plot, but rotated so that the strength of association runs along the horizontal axis. Each circle represents a phenotype. The horizontal position reflects the strength of the association between the selected SNP and this phenotype. Every phenotype to the right of the dashed line is significantly associated with the selected SNP (after Bonferroni correction for multiple testing).
Each node represents a phenotype (resp. a SNP). The color represents the category of the node (clinical category for a phenotype, gene for a SNP). A link is drawn between two nodes if they are associated in the same analysis. For example, two phenotypes associated with the same SNP share a link. The weight of the link depends on the number of co-associations across the different analyses.
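The position of the dashed line can be illustrated with a short sketch of the Bonferroni correction. It assumes the horizontal axis uses the conventional -log10(p) scale of Manhattan plots; the phenotype names, p-values and 0.05 significance level are made up.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """P-value threshold after Bonferroni correction for n_tests tests."""
    return alpha / n_tests

def association_strength(p_value):
    """Conventional Manhattan-plot coordinate: -log10 of the p-value."""
    return -math.log10(p_value)

# Made-up p-values for one SNP, for illustration only.
p_values = {"phenotype_A": 1e-8, "phenotype_B": 0.003, "phenotype_C": 0.2}

threshold = bonferroni_threshold(0.05, len(p_values))
dashed_line_x = association_strength(threshold)

# A phenotype is significant when it lies to the right of the dashed line.
significant = [name for name, p in p_values.items() if p < threshold]
```

With three tests the threshold is 0.05 / 3, so `phenotype_A` and `phenotype_B` fall to the right of the line and `phenotype_C` does not.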
The PheWAS catalog: Denny JC, Bastarache L, Ritchie MD et al. Systematic comparison of phenome-wide
association study of electronic medical record data and genome-wide
association study data. Nat Biotechnol. 2013 Dec;31(12):1102-10.
Available from: http://phewas.mc.vanderbilt.edu/
Data pre-processing was performed using R statistical software version 3.1.2. The following packages were used:
We selected the top 35 SNPs (i.e. the SNPs with the highest number of significant associations). We computed
the pairwise distance between the features using the binomial distance. Then, we performed a hierarchical clustering
to reorder the SNPs and phenotypes.
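The reordering step can be sketched in a few lines. This is not the original R pre-processing. As a stand-in, it uses the Jaccard distance on binary significance profiles (instead of the binomial distance mentioned above) and a naive average-linkage clustering; the function names and the example matrix are hypothetical.

```python
def jaccard_distance(u, v):
    """Jaccard distance between two binary vectors (stand-in distance)."""
    both = sum(1 for x, y in zip(u, v) if x and y)
    either = sum(1 for x, y in zip(u, v) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either

def cluster_order(rows):
    """Order row indices by naive average-linkage agglomerative clustering."""
    if not rows:
        return []
    clusters = [[i] for i in range(len(rows))]

    def average_distance(c1, c2):
        total = sum(jaccard_distance(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters[0]

# Made-up matrix: one row per phenotype, one column per SNP,
# 1 = significant association.
profiles = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
order = cluster_order(profiles)
```

Rows with similar profiles end up next to each other, so blocks of related phenotypes become visible in the map. The same function applied to the transposed matrix would reorder the SNP columns.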
The odds ratios were not reported with their confidence intervals, so we estimated the confidence intervals under the assumption of a Gaussian distribution.
For the network view, a co-association was defined as follows: two phenotypes were linked if they had a significant association (after correction for multiple testing) with the same SNP. The number of SNPs with such a co-association defined the weight of the link.
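One common way to recover a confidence interval under a Gaussian assumption, when only the odds ratio and its two-sided p-value are known, is sketched below. Whether this matches the exact procedure used here is an assumption; the function name is made up.

```python
import math
from statistics import NormalDist

def ci_from_p_value(odds_ratio, p_value, level=0.95):
    """Approximate CI for an odds ratio from its two-sided p-value.

    Assumes log(OR) is normally distributed, so that
    |log(OR)| / SE equals the z-score implied by the p-value.
    Not defined for p_value == 1 or odds_ratio == 1.
    """
    std_normal = NormalDist()
    # z-score implied by the two-sided p-value (lower tail avoids rounding to 1).
    z_observed = -std_normal.inv_cdf(p_value / 2)
    log_or = math.log(odds_ratio)
    se_log_or = abs(log_or) / z_observed
    # Critical value for the requested confidence level.
    z_critical = std_normal.inv_cdf(1 - (1 - level) / 2)
    lower = math.exp(log_or - z_critical * se_log_or)
    upper = math.exp(log_or + z_critical * se_log_or)
    return lower, upper

# Sanity check: a p-value of exactly 0.05 puts one bound of the 95% CI at 1.
low, high = ci_from_p_value(2.0, 0.05)
```

The smaller the p-value for a given odds ratio, the narrower the reconstructed interval.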
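The link weights defined above can be computed with a short sketch. The input layout (a mapping from SNP to its significantly associated phenotypes) and all names are assumptions for the example.

```python
from collections import Counter
from itertools import combinations

def co_association_weights(significant_by_snp):
    """Count, for each pair of phenotypes, the SNPs they are both linked to.

    significant_by_snp: dict mapping a SNP id to the phenotypes that are
    significantly associated with it (after multiple-testing correction).
    """
    weights = Counter()
    for phenotypes in significant_by_snp.values():
        # Sorting gives each pair a single canonical (a, b) key.
        for pair in combinations(sorted(set(phenotypes)), 2):
            weights[pair] += 1
    return weights

# Made-up associations, for illustration only.
associations = {
    "snp_1": ["A", "B", "C"],
    "snp_2": ["A", "B"],
    "snp_3": ["C"],
}
link_weights = co_association_weights(associations)
```

Here phenotypes `A` and `B` share two SNPs, so their link has weight 2, while `snp_3` contributes no link because it has a single associated phenotype.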