Characterisation and differentiation of five UK populations using massively parallel sequencing of forensic STRs

Student thesis: Doctoral Thesis, Doctor of Philosophy

Abstract

The transition from capillary electrophoresis (CE) to massively parallel sequencing (MPS) in forensics presents an opportunity to review the choice of genetic markers used for identification and assess the ways in which we utilise them. In relation to short tandem repeat (STR) analysis, the move to assign alleles using sequence-based rather than length-based methodologies has highlighted the extent to which allelic variation was previously masked. In this work, 1000 samples from five UK-representative populations (White British, West African, North East African, East Asian and South Asian) were typed using the ForenSeq™ DNA Signature Prep kit and MiSeq FGx™ Forensic Genomics System. This thesis addresses some of the key questions associated with the characterisation of novel sequence variants, such as back-compatibility with CE results, power of discrimination and nomenclature. A concordance rate of over 99% was obtained when comparing results of the ForenSeq DNA Signature Prep kit with CE, making it highly compatible with current DNA databases. The increased power of discrimination when taking sequence-level variation into account was substantial, with an overall random match probability for the loci studied that was over 750 times lower than with length-based data alone. The added value of analysing flanking regions of STRs was found to be limited, although their inclusion in analysis is vital for accurate allele calling.

The data from this PhD contributed 214 novel sequences to a larger project cataloguing autosomal STR variation. The large number of variants characterised at select markers brings into question the strategies for producing representative population data, yet also provides an opportunity to use this diversity in unique ways. The presence of population-specific sequence variation in particular raises the prospect of using STR profiles for population identification, both on their own and in combination with ancestry-informative single nucleotide polymorphisms (SNPs). STRs have largely been discounted for geographic ancestry determination due to their high mutation rate, which in turn makes them well suited for individual identification. Being able to obtain a DNA profile that can simultaneously be used for geographical ancestry estimation and searching against offender databases would be a huge benefit to the field of forensic identification in terms of time, cost, and sample availability. Across the five populations studied, good differentiation was achieved using sequenced STR profiles – results which also showed a clear improvement over length-based data.
Date of Award: 1 Jun 2022
Original language: English
Awarding Institution
  • King's College London
Supervisors: David Ballard & Denise Syndercombe-Court