IMPROVED ALLELE SPECIFIC EXPRESSION (ASE) DETECTION AND THE IMPLICATIONS FOR UNDERSTANDING REGULATORY AND DISEASE GENETICS

Student thesis: Doctoral Thesis, Doctor of Philosophy

Abstract

Gene expression plays a crucial role in phenotypic changes and disease. Its regulation is a complex process that involves genetics, environmental signals, epigenetics and proteins. Studying the interplay of these processes is therefore key to better understanding the role of genetic variation and gene expression changes in important biological processes.

In genomic studies, gene expression changes are often examined in population-level data. However, these analyses are usually limited to the effects of common variants on gene expression, because rare variants occur too infrequently in the study population to provide adequate statistical power. This is a particular problem in small datasets focussed on rare disease, or when trying to understand the full range of genetic features that may modulate transcription. Allele specific expression (ASE) offers a way to overcome these issues: by comparing the expression of the two alleles within the same individual, it can be used to study the regulation of gene expression in smaller sample sizes and can potentially capture the impact of rare variants. ASE can also be combined with other population-based genetic studies to improve the overall signal.
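At its simplest, ASE at a single heterozygous site is assessed by counting the reads that support each allele and asking whether the split departs from the 50:50 ratio expected when both alleles are expressed equally. The sketch below is a minimal illustration of that idea, assuming a plain binomial null with no correction for overdispersion or alignment bias (many ASE tools use a beta-binomial model instead). The function names are hypothetical, and this is not PAC's implementation.

```python
from math import comb


def allelic_ratio(ref_count: int, alt_count: int) -> float:
    """Fraction of reads carrying the reference allele at a heterozygous site."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads overlap this site")
    return ref_count / total


def binomial_ase_pvalue(ref_count: int, alt_count: int) -> float:
    """Exact two-sided binomial test of H0: reads come 50:50 from each allele."""
    n = ref_count + alt_count
    k = min(ref_count, alt_count)
    # Under p = 0.5 the binomial distribution is symmetric,
    # so the two-sided p-value is twice the lower tail (capped at 1).
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Example: 70 reference reads vs 30 alternate reads at one heterozygous SNP
print(allelic_ratio(70, 30))
print(binomial_ase_pvalue(70, 30))
```

Because the test compares two alleles within one sample, it does not need many individuals to carry the variant, which is why ASE can be informative for rare variants.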

However, ASE analysis suffers from a series of computational biases associated with short-read RNA-seq data and is particularly sensitive to sequence alignment errors driven by reads that overlap heterozygous variants. In this thesis, I have developed a Personalised ASE Caller (PAC) pipeline that improves the alignment of reads overlapping heterozygous variants and reduces biases when quantifying allelic ratios. I have packaged the pipeline as a streamlined tool using Nextflow and Docker and have made it available on my GitHub page for use by the scientific community.
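One well-known source of the alignment bias described above is reference mapping bias: when reads are aligned to the reference genome, a read carrying the non-reference allele already has one mismatch at the heterozygous site, so under a mismatch limit it is discarded more often than a read carrying the reference allele, and the observed ratio drifts towards the reference. The toy model below illustrates this under simplifying assumptions (fixed read length, independent sequencing errors, a hard mismatch cap, an error-free SNP base). The `personalised` flag models alignment to a genome that already contains the sample's alternate allele; this is a hypothetical illustration of the general idea, not a description of how PAC works.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); returns 0.0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def expected_ref_ratio(read_len: int, error_rate: float, max_mismatches: int,
                       true_ref_fraction: float = 0.5,
                       personalised: bool = False) -> float:
    """Expected observed reference-allele fraction after mismatch filtering."""
    n_other = read_len - 1  # positions other than the SNP (assumed error-free)
    keep_ref = prob_at_most(max_mismatches, n_other, error_rate)
    # Against the reference genome, an alt-allele read spends one mismatch
    # on the SNP itself; against a personalised genome it does not.
    alt_budget = max_mismatches if personalised else max_mismatches - 1
    keep_alt = prob_at_most(alt_budget, n_other, error_rate)
    ref_mass = true_ref_fraction * keep_ref
    alt_mass = (1 - true_ref_fraction) * keep_alt
    return ref_mass / (ref_mass + alt_mass)


# 100 bp reads, 1% error rate, at most one mismatch allowed
print(round(expected_ref_ratio(100, 0.01, 1), 3))
print(expected_ref_ratio(100, 0.01, 1, personalised=True))
```

With these illustrative settings a truly balanced site is observed at roughly two-thirds reference reads against the reference genome, while removing the extra mismatch restores the expected 0.5.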

I validated the performance of PAC against other commonly used methods, showing that it significantly improves allelic quantification. I then show that PAC can identify ground-truth signals in simulated data and recapitulates population-level signals better than other methods. Finally, I demonstrate that PAC has utility in a disease context and that better allelic quantification has downstream consequences for interpreting biological data.

Date of Award: 1 May 2023
Original language: English
Awarding Institution: King's College London
Supervisors: Tim Hubbard & Alan Hodgkinson
