Abstract
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease caused by degeneration of both upper and lower motor neurons, with death from respiratory failure typically occurring 2-3 years after onset. More than 40 genes are definitively linked to ALS, with a further 125 involved either in pathogenic processes or as modifiers of phenotype. However, the complex architecture of ALS has hindered further genetic variant discovery and the study of how known and non-canonical variants interact with transcriptomic factors, such as alterations in gene expression. Although genome-wide association studies (GWAS) have identified a number of pertinent risk loci in recent years, they have notable limitations. Single nucleotide polymorphism (SNP)-based microarray genotyping approaches can only explain 8.5-20% of the estimated 60% heritability of ALS as only common variants are tagged, leaving the remaining ~40% to be explained by rare and low-frequency SNPs, other classes of variation such as structural variants, transposable elements, post-transcriptional and splicing alterations, and changes in gene expression. Consequently, more research groups and dedicated ALS consortia, such as Project MinE and TargetALS, are opting for next-generation sequencing (NGS) technologies such as whole genome sequencing (WGS) and RNA-sequencing (RNAseq), which can detect variation missed by microarray technologies and can therefore be used to better understand the complexity of ALS. NGS can be utilised for precision medicine approaches, which involve harnessing big data and bioinformatics or machine learning frameworks to identify biologically homogeneous groups of individuals.
It is therefore important to adopt NGS-based precision medicine strategies in ALS research to stratify people with ALS into biologically relevant subgroups to better inform biomarker discovery or future clinical trial design.
The overall focus of this thesis was to employ several bioinformatics and machine learning techniques which can exploit the potential of WGS and RNAseq to delineate the complex heterogeneity of ALS and provide new molecular insights into disease for the advancement of precision medicine. Chapter 4 reports an update to a previously published NGS bioinformatics pipeline developed by our group; the updated pipeline demonstrates enhanced detection, annotation and prioritisation of structural variants, including transposable elements and tandem repeats. In Chapter 5, I confirm that rare missense variants and in-frame deletions in the tail domain of the neurofilament heavy chain gene (NEFH) increase ALS risk, through a meta-analysis of previous reports from the literature, and variant screening and rare burden analysis of SNPs, indels and structural variants from the Project MinE WGS cohort. This is in agreement with previous reports using smaller sample sizes. Causal associations are also found for intronic SNPs and indels in the rod domain, and a protective effect is identified for a large 113 base-pair deletion in the tail domain, both identified in Project MinE through rare variant burden analysis. However, these associations do not have large effect sizes and require further experimental follow-up studies to assess their functional impact before determining whether their inclusion in genetic screening panels is warranted.
In Chapter 6, I investigate the molecular architecture of ALS by conducting a case-control differential gene expression analysis of post-mortem motor cortex RNAseq data from the KCL BrainBank (a subsidiary of Project MinE) and TargetALS, which identifies genes in the neuropeptide signalling pathway as differentially expressed and enriched in both datasets. Analysis of neuropeptide-related genes and their receptors finds that higher expression of TACR1 and NPBWR1 is associated with longer disease duration and lower age of onset, respectively, suggesting that neuropeptides could be used as diagnostic and prognostic biomarkers. Finally, in Chapter 7, I identify and characterise three molecular subtypes of ALS which reflect predominant molecular mechanisms of pathogenesis, using unsupervised hierarchical clustering of KCL BrainBank data for initial discovery of the clusters, followed by linear discriminant analysis for validation in independent brain and blood datasets. I show that these molecular phenotypes are robust, with their expression signatures being able to distinguish people with ALS from controls, and the motor cortex from occipital cortex and cerebellum regions in the TargetALS cohort. Cell type analysis also reinforces the biological interpretation of the clusters. This demonstrates that these motor cortex-derived molecular phenotypes could be used to successfully stratify people with ALS into biologically relevant subgroups, although further work needs to be carried out to determine whether these subgroups are truly ALS-specific. Distinct cluster-related onset and progression measures are also identified in both motor cortex case datasets, which demonstrates the potential for future identification of subgroup-specific prognostic biomarkers.
Date of Award | 1 Aug 2024 |
---|---|
Original language | English |
Awarding Institution | |
Supervisor | Ammar Al-Chalabi (Supervisor), Alfredo Iacoangeli (Supervisor) & Ahmad Al Khleifat (Supervisor) |