Weakly supervised segmentation models as explainable radiological classifiers for lung tumour detection on CT images

Robert O'Shea; Thubeena Manickavasagar; Carolyn Horst; Daniel Hughes; James Cusack; Sophia Tsoka; Gary Cook; Vicky Goh

doi:10.1186/s13244-023-01542-2

GSTT Guy's and St Thomas' NHS Foundation Trust

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Purpose: Interpretability is essential for reliable convolutional neural network (CNN) image classifiers in radiological applications. We describe a weakly supervised segmentation model that learns to delineate the target object, trained with only image-level labels (“image contains object” or “image does not contain object”), presenting a different approach towards explainable object detectors for radiological imaging tasks.

Methods: A weakly supervised Unet architecture (WSUnet) was trained to learn lung tumour segmentation from image-level labelled data. WSUnet generates voxel probability maps with a Unet and then constructs an image-level prediction by global max-pooling, thereby facilitating image-level training. WSUnet’s voxel-level predictions were compared to traditional model interpretation techniques (class activation mapping, integrated gradients and occlusion sensitivity) in CT data from three institutions (training/validation: n = 412; testing: n = 142). Methods were compared using voxel-level discrimination metrics, and clinical value was assessed with a clinician preference survey on data from external institutions.

Results: Despite the absence of voxel-level labels in training, WSUnet’s voxel-level predictions localised tumours precisely in both validation (precision: 0.77, 95% CI: [0.76–0.80]; Dice: 0.43, 95% CI: [0.39–0.46]) and external testing (precision: 0.78, 95% CI: [0.76–0.81]; Dice: 0.33, 95% CI: [0.32–0.35]). WSUnet’s voxel-level discrimination outperformed the best comparator in validation (area under the precision-recall curve (AUPR): 0.55, 95% CI: [0.49–0.56] vs. 0.23, 95% CI: [0.21–0.25]) and testing (AUPR: 0.40, 95% CI: [0.38–0.41] vs. 0.36, 95% CI: [0.34–0.37]). Clinicians preferred WSUnet predictions in most instances (clinician preference rate: 0.72, 95% CI: [0.68–0.77]).

Conclusion: Weakly supervised segmentation is a viable approach by which explainable object detection models may be developed for medical imaging.

Critical relevance statement: WSUnet learns to segment images at voxel level, training only with image-level labels. A Unet backbone first generates a voxel-level probability map and then extracts the maximum voxel prediction as the image-level prediction. Thus, training uses only image-level annotations, reducing human workload. WSUnet’s voxel-level predictions provide a causally verifiable explanation for its image-level prediction, improving interpretability.

Key points:
• Explainability and interpretability are essential for reliable medical image classifiers.
• This study applies weakly supervised segmentation to generate explainable image classifiers.
• The weakly supervised Unet inherently explains its image-level predictions at voxel level.

Graphical Abstract: [Figure not available: see fulltext.]
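The pooling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the voxel logits stand in for the output of a hypothetical Unet backbone, the binary cross-entropy loss is an assumed choice (the abstract does not name the training loss), and all function names are illustrative. The sketch shows how a voxel probability map is reduced to one image-level probability by global max-pooling, so that only image-level labels enter the loss, and how voxel-level precision and Dice can be computed from binary masks.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function mapping logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-x))


def image_level_probability(voxel_logits):
    """Global max-pool voxel probabilities into one probability per image.

    voxel_logits: array of shape (batch, depth, height, width), standing in
    for the output of a segmentation backbone (hypothetical here).
    """
    voxel_probs = sigmoid(voxel_logits)
    # Flatten the spatial axes and keep the largest voxel probability per image.
    image_probs = voxel_probs.reshape(voxel_probs.shape[0], -1).max(axis=1)
    return voxel_probs, image_probs


def binary_cross_entropy(image_probs, image_labels, eps=1e-7):
    """Mean binary cross-entropy using image-level labels only (assumed loss)."""
    p = np.clip(image_probs, eps, 1.0 - eps)
    y = np.asarray(image_labels, dtype=float)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))


def precision_and_dice(pred_mask, true_mask):
    """Voxel-level precision and Dice coefficient for two boolean masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.logical_and(pred, true).sum()
    precision = tp / pred.sum() if pred.sum() > 0 else 0.0
    denom = pred.sum() + true.sum()
    dice = 2.0 * tp / denom if denom > 0 else 0.0
    return float(precision), float(dice)


# Toy data: two 4x4x4 "images"; only the first contains a high-scoring voxel.
logits = np.full((2, 4, 4, 4), -4.0)
logits[0, 1, 2, 3] = 4.0
voxel_probs, image_probs = image_level_probability(logits)
loss = binary_cross_entropy(image_probs, [1, 0])

# The voxel that produced the image-level prediction can be located directly.
top_voxel = np.unravel_index(np.argmax(voxel_probs[0]), voxel_probs[0].shape)
```

Because the image-level probability is exactly the maximum voxel probability, the voxel responsible for a positive prediction can be read off with an argmax, which is one way to interpret the abstract's claim that the voxel-level map explains the image-level output.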

Original language	English
Article number	195
Journal	Insights into Imaging
Volume	14
Issue number	1
DOIs	https://doi.org/10.1186/s13244-023-01542-2
Publication status	Published - 19 Nov 2023

Access to Document

10.1186/s13244-023-01542-2 (Licence: CC BY)

Cite this

@article{a78148603b394899843c62218e1365cc,

title = "Weakly supervised segmentation models as explainable radiological classifiers for lung tumour detection on CT images",

author = "Robert O'Shea and Thubeena Manickavasagar and Carolyn Horst and Daniel Hughes and James Cusack and Sophia Tsoka and Gary Cook and Vicky Goh",

note = "Funding Information: Authors acknowledge funding support from the UK Research & Innovation London Medical Imaging and Artificial Intelligence Centre; Wellcome/Engineering and Physical Sciences Research Council Centre for Medical Engineering at King{\textquoteright}s College London [WT 203148/Z/16/Z]; National Institute for Health Research Biomedical Research Centre at Guy{\textquoteright}s & St Thomas{\textquoteright} Hospitals and King{\textquoteright}s College London; Cancer Research UK National Cancer Imaging Translational Accelerator [C1519/A28682]. For the purpose of open access, authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. Publisher Copyright: {\textcopyright} 2023, The Author(s).",

year = "2023",

month = nov,

day = "19",

doi = "10.1186/s13244-023-01542-2",

language = "English",

volume = "14",

journal = "Insights into Imaging",

issn = "1869-4101",

publisher = "Springer Science and Business Media Deutschland GmbH",

number = "1",

}

TY - JOUR

T1 - Weakly supervised segmentation models as explainable radiological classifiers for lung tumour detection on CT images

AU - O'Shea, Robert

AU - Manickavasagar, Thubeena

AU - Horst, Carolyn

AU - Hughes, Daniel

AU - Cusack, James

AU - Tsoka, Sophia

AU - Cook, Gary

AU - Goh, Vicky

N1 - Funding Information: Authors acknowledge funding support from the UK Research & Innovation London Medical Imaging and Artificial Intelligence Centre; Wellcome/Engineering and Physical Sciences Research Council Centre for Medical Engineering at King’s College London [WT 203148/Z/16/Z]; National Institute for Health Research Biomedical Research Centre at Guy’s & St Thomas’ Hospitals and King’s College London; Cancer Research UK National Cancer Imaging Translational Accelerator [C1519/A28682]. For the purpose of open access, authors have applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission. Publisher Copyright: © 2023, The Author(s).

PY - 2023/11/19

Y1 - 2023/11/19

UR - http://www.scopus.com/inward/record.url?scp=85177082243&partnerID=8YFLogxK

U2 - 10.1186/s13244-023-01542-2

DO - 10.1186/s13244-023-01542-2

M3 - Article

SN - 1869-4101

VL - 14

JO - Insights into Imaging

JF - Insights into Imaging

IS - 1

M1 - 195

ER -
