Methods and Applications for Summarising Free-Text Narratives in Electronic Health Records

Student thesis: Doctoral ThesisDoctor of Philosophy

Abstract

As medical services move towards electronic health record (EHR) systems the breadth and depth of data stored at each patient encounter has increased. This growing wealth of data and investment in care systems has arguably put greater strain on services, as those at the forefront are pushed towards greater time spent in front of computers over their patients. To minimise the use of EHR systems clinicians often revert to using free-text data entry to circumvent the structured input fields. It has been estimated that approximately 80% of EHR data is within the free-text portion. Outside of their primary use, that is facilitating the direct care of the patient, secondary use of EHR data includes clinical research, clinical audits, service improvement research, population health analysis, disease and patient phenotyping, clinical trial recruitment to name but a few.

This thesis presents a number of projects, previously published and original work in the development, assessment and application of summarisation methods for EHR free-text. Firstly, I introduce, define and motivate EHR free-text analysis and summarisation methods of open-domain text and how this compares to EHR free-text. I then introduce a subproblem in natural language processing (NLP) that is the recognition of named entities and linking of the entities to pre-existing clinical knowledge bases (NER+L). This leads to the first novel contribution the Medical Concept Annotation Toolkit (MedCAT) that provides a software library workflow for clinical NER+L problems. I frame the outputs of MedCAT as a form of summarisation by showing the tools contributing to published clinical research and the application of this to another clinical summarisation use-case ‘clinical coding’. I then consider methods for the textual summarisation of portions of clinical free-text. I show how redundancy in clinical text is empirically different to open-domain text discussing how this impacts text-to-text summarisation. I then compare methods to generate discharge summary sections from previous clinical notes using methods presented in prior chapters via a novel ‘guidance’ approach.

I close the thesis by discussing my contributions in the context of state-of-the-art and how my work fits into the wider body of clinical NLP research. I briefly describe the challenges encountered throughout, offer my perspectives on the key enablers of clinical informatics research, and finally the potential future work that will go towards translating research impact to real-world benefits to healthcare systems, workers and patients alike.
Date of Award1 Dec 2023
Original languageEnglish
Awarding Institution
  • King's College London
SupervisorZina Ibrahim (Supervisor) & Richard Dobson (Supervisor)

Cite this

'