Powerful large-scale GWAS not only discover loci associated with various traits but also provide data for downstream analyses that lead to further aetiological insights. One such group of downstream analyses involves polygenic risk scores. A polygenic risk score (PRS) is a weighted sum of risk alleles across the genome, which acts as a proxy for an individual’s genetic propensity for a phenotype. The weight is usually the effect size estimated from a GWAS on the phenotype. PRS were first used to test whether an outcome had a polygenic basis, especially when the corresponding GWAS yielded no significant results. More recently, PRS have been used for a wide range of applications, most commonly so far to evaluate evidence for shared genetic aetiology between different phenotypes. In this thesis, I evaluate the power of various PRS analyses using UK Biobank data; I develop two novel shrinkage methods for increasing the power of PRS analyses, which may have applications beyond GWAS and PRS studies; and finally I develop a set of methods for extending the PRS approach to individual-level gene-set analyses.

The thesis begins by developing a method that we call “Permutation Shrinkage”, which shrinks GWAS effect size estimates to bring them closer to the true effect sizes. The motivation is to improve the PRS prediction model, whose power depends heavily on the accuracy of the GWAS effect size estimates on which it is based. The method estimates the ‘noise’ in the observed effect size estimates from a null distribution of effect sizes generated by permuting the raw phenotype data, and then subtracts these estimated null effects from the observed estimates. Permutation Shrinkage was tested in UK Biobank data. The corrected GWAS led to an average 35% increase in PRS R² across a range of traits tested.
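To make the two ideas above concrete, the following is a minimal sketch in Python with NumPy. The function `polygenic_risk_score` follows the weighted-sum definition directly. The function `permutation_shrink` is only one possible reading of the description and should not be taken as the algorithm developed in the thesis: the abstract does not say how null effects are paired with observed effects, so matching them by rank of absolute magnitude, flooring the result at zero, and using simple per-SNP least-squares slopes are all assumptions made for illustration. All function and parameter names are hypothetical.

```python
import numpy as np


def polygenic_risk_score(genotypes, weights):
    """Return one score per individual: sum over SNPs of allele count * weight."""
    return np.asarray(genotypes, dtype=float) @ np.asarray(weights, dtype=float)


def marginal_effects(genotypes, phenotype):
    """Per-SNP least-squares slope of phenotype on allele count, one SNP at a time.

    Assumes every SNP varies in the sample (no monomorphic columns).
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    g = g - g.mean(axis=0)
    y = y - y.mean()
    return (g.T @ y) / (g ** 2).sum(axis=0)


def permutation_shrink(genotypes, phenotype, n_perm=100, seed=0):
    """Illustrative shrinkage: subtract rank-matched null magnitudes.

    ASSUMPTION: pairing observed and null effects by rank is a choice made
    for this sketch, not a detail stated in the source text.
    """
    rng = np.random.default_rng(seed)
    observed = marginal_effects(genotypes, phenotype)
    # Shuffling the phenotype breaks any genotype-phenotype link, so the
    # re-estimated effects reflect noise only. Sort magnitudes per permutation.
    null_sorted = np.array([
        np.sort(np.abs(marginal_effects(genotypes, rng.permutation(phenotype))))
        for _ in range(n_perm)
    ])
    expected_null = null_sorted.mean(axis=0)  # mean null magnitude at each rank
    ranks = np.argsort(np.argsort(np.abs(observed)))  # rank of each observed effect
    # Subtract the null magnitude at the same rank; never flip the sign.
    shrunk_magnitude = np.maximum(np.abs(observed) - expected_null[ranks], 0.0)
    return np.sign(observed) * shrunk_magnitude
```

In this sketch the shrunken effects would replace the raw GWAS estimates as the weights passed to `polygenic_risk_score`, ideally on individuals who were not used to estimate the effects.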
In the next chapter, I extend the method to an order statistic method (“Order Statistics Shrinkage”) that can be applied to summary statistic data, an important extension because most GWAS data are available only as summary statistics. I compare this new shrinkage method with several well-established shrinkage methods, such as Ridge and LASSO regression, and tailor the new method to GWAS data. Order Statistics Shrinkage performed similarly to Permutation Shrinkage in the tests.

In the final work chapter of my thesis, I extend the conventional PRS analysis method to a group of gene-set analysis methods, which we collectively call 'PRSet'. We add PRSet to the PRSice suite of software packages. PRSet calculates gene-set PRS to study aetiology at the gene-set or pathway level. Gene-set analyses can be either self-contained (testing general association) or competitive (testing enrichment compared to other gene-sets). The performance of PRSet is compared with MAGMA, a leading gene-set analysis method.
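The idea of a gene-set PRS can be sketched as the same weighted sum restricted to the SNPs assigned to a gene set. The snippet below is an illustration only and is not PRSet's implementation: the function name, its parameters, and the assumption that a SNP-to-gene-set mapping is already available as a collection of SNP identifiers are all hypothetical.

```python
import numpy as np


def gene_set_prs(genotypes, weights, snp_ids, gene_set_snps):
    """Weighted sum of allele counts using only SNPs that belong to the gene set.

    genotypes: individuals x SNPs array of risk-allele counts
    weights: per-SNP effect sizes, in the same order as snp_ids
    gene_set_snps: identifiers of SNPs mapped to the gene set (assumed given)
    """
    in_set = np.isin(np.asarray(snp_ids), list(gene_set_snps))
    g = np.asarray(genotypes, dtype=float)[:, in_set]
    w = np.asarray(weights, dtype=float)[in_set]
    return g @ w
```

A score of this kind could then be tested for association with the phenotype on its own (the self-contained setting) or compared against scores built from other gene sets (the competitive setting).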
Developing methods to improve and broaden polygenic risk score analyses
Ruan, Y. (Author). 1 Jun 2019
Student thesis: Doctoral Thesis › Doctor of Philosophy