Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers

Harry Coppock; George Nicholson; Ivan Kiskin; Vasiliki Koutra; Kieran Baker; Jobie Budd; Richard Payne; Emma Karoune; David Hurley; Alexander Titcomb; Sabrina Egglestone; Ana Tendero Cañadas; Lorraine Butler; Radka Jersakova; Jonathon Mellor; Selina Patel; Tracey Thornley; Peter Diggle; Sylvia Richardson; Josef Packham; Björn W. Schuller; Davide Pigoli; Steven Gilmour; Stephen Roberts; Chris Holmes

doi:10.1038/s42256-023-00773-8

Harry Coppock, George Nicholson, Ivan Kiskin, Vasiliki Koutra, Kieran Baker, Jobie Budd, Richard Payne, Emma Karoune, David Hurley, Alexander Titcomb, Sabrina Egglestone, Ana Tendero Cañadas, Lorraine Butler, Radka Jersakova, Jonathon Mellor, Selina Patel, Tracey Thornley, Peter Diggle, Sylvia Richardson, Josef Packham, Björn W. Schuller, Davide Pigoli, Steven Gilmour, Stephen Roberts, Chris Holmes*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

3 Citations (Scopus)

Abstract

Recent work has reported that respiratory audio-trained AI classifiers can accurately predict SARS-CoV-2 infection status. However, it has not yet been determined whether such model performance is driven by latent audio biomarkers with true causal links to SARS-CoV-2 infection or by confounding effects, such as recruitment bias, present in observational studies. Here we undertake a large-scale study of audio-based AI classifiers as part of the UK government’s pandemic response. We collect a dataset of audio recordings from 67,842 individuals, with linked metadata, of whom 23,514 had positive polymerase chain reaction tests for SARS-CoV-2. In an unadjusted analysis, similar to that in previous works, AI classifiers predict SARS-CoV-2 infection status with high accuracy (ROC–AUC = 0.846 [0.838–0.854]). However, after matching on measured confounders, such as self-reported symptoms, performance is much weaker (ROC–AUC = 0.619 [0.594–0.644]). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by predictions on the basis of user-reported symptoms. We make best-practice recommendations for handling recruitment bias, and for assessing audio-based classifiers by their utility in relevant practical settings. Our work provides insights into the value of AI audio analysis and the importance of study design and treatment of confounders in AI-enabled diagnostics.

Original language	English
Pages (from-to)	229-242
Number of pages	14
Journal	Nature Machine Intelligence
Volume	6
Issue number	2
DOIs	https://doi.org/10.1038/s42256-023-00773-8
Publication status	Published - Feb 2024

Cite this

Coppock, H., Nicholson, G., Kiskin, I., Koutra, V., Baker, K., Budd, J., Payne, R., Karoune, E., Hurley, D., Titcomb, A., Egglestone, S., Tendero Cañadas, A., Butler, L., Jersakova, R., Mellor, J., Patel, S., Thornley, T., Diggle, P., Richardson, S., ... Holmes, C. (2024). Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers. Nature Machine Intelligence, 6(2), 229-242. https://doi.org/10.1038/s42256-023-00773-8

@article{99d937ba3f70428abd4bc15097800d9b,

title = "Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers",

abstract = "Recent work has reported that respiratory audio-trained AI classifiers can accurately predict SARS-CoV-2 infection status. However, it has not yet been determined whether such model performance is driven by latent audio biomarkers with true causal links to SARS-CoV-2 infection or by confounding effects, such as recruitment bias, present in observational studies. Here we undertake a large-scale study of audio-based AI classifiers as part of the UK government{\textquoteright}s pandemic response. We collect a dataset of audio recordings from 67,842 individuals, with linked metadata, of whom 23,514 had positive polymerase chain reaction tests for SARS-CoV-2. In an unadjusted analysis, similar to that in previous works, AI classifiers predict SARS-CoV-2 infection status with high accuracy (ROC–AUC = 0.846 [0.838–0.854]). However, after matching on measured confounders, such as self-reported symptoms, performance is much weaker (ROC–AUC = 0.619 [0.594–0.644]). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by predictions on the basis of user-reported symptoms. We make best-practice recommendations for handling recruitment bias, and for assessing audio-based classifiers by their utility in relevant practical settings. Our work provides insights into the value of AI audio analysis and the importance of study design and treatment of confounders in AI-enabled diagnostics.",

author = "Harry Coppock and George Nicholson and Ivan Kiskin and Vasiliki Koutra and Kieran Baker and Jobie Budd and Richard Payne and Emma Karoune and David Hurley and Alexander Titcomb and Sabrina Egglestone and {Tendero Ca{\~n}adas}, Ana and Lorraine Butler and Radka Jersakova and Jonathon Mellor and Selina Patel and Tracey Thornley and Peter Diggle and Sylvia Richardson and Josef Packham and Schuller, {Bj{\"o}rn W.} and Davide Pigoli and Steven Gilmour and Stephen Roberts and Chris Holmes",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",

year = "2024",

month = feb,

doi = "10.1038/s42256-023-00773-8",

language = "English",

volume = "6",

pages = "229--242",

journal = "Nature Machine Intelligence",

issn = "2522-5839",

publisher = "Springer Nature Switzerland AG",

number = "2",

}

Coppock, H, Nicholson, G, Kiskin, I, Koutra, V, Baker, K, Budd, J, Payne, R, Karoune, E, Hurley, D, Titcomb, A, Egglestone, S, Tendero Cañadas, A, Butler, L, Jersakova, R, Mellor, J, Patel, S, Thornley, T, Diggle, P, Richardson, S, Packham, J, Schuller, BW, Pigoli, D, Gilmour, S, Roberts, S & Holmes, C 2024, 'Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers', Nature Machine Intelligence, vol. 6, no. 2, pp. 229-242. https://doi.org/10.1038/s42256-023-00773-8

TY - JOUR

T1 - Audio-based AI classifiers show no evidence of improved COVID-19 screening over simple symptoms checkers

AU - Coppock, Harry

AU - Nicholson, George

AU - Kiskin, Ivan

AU - Koutra, Vasiliki

AU - Baker, Kieran

AU - Budd, Jobie

AU - Payne, Richard

AU - Karoune, Emma

AU - Hurley, David

AU - Titcomb, Alexander

AU - Egglestone, Sabrina

AU - Tendero Cañadas, Ana

AU - Butler, Lorraine

AU - Jersakova, Radka

AU - Mellor, Jonathon

AU - Patel, Selina

AU - Thornley, Tracey

AU - Diggle, Peter

AU - Richardson, Sylvia

AU - Packham, Josef

AU - Schuller, Björn W.

AU - Pigoli, Davide

AU - Gilmour, Steven

AU - Roberts, Stephen

AU - Holmes, Chris

PY - 2024/2

Y1 - 2024/2

N2 - Recent work has reported that respiratory audio-trained AI classifiers can accurately predict SARS-CoV-2 infection status. However, it has not yet been determined whether such model performance is driven by latent audio biomarkers with true causal links to SARS-CoV-2 infection or by confounding effects, such as recruitment bias, present in observational studies. Here we undertake a large-scale study of audio-based AI classifiers as part of the UK government’s pandemic response. We collect a dataset of audio recordings from 67,842 individuals, with linked metadata, of whom 23,514 had positive polymerase chain reaction tests for SARS-CoV-2. In an unadjusted analysis, similar to that in previous works, AI classifiers predict SARS-CoV-2 infection status with high accuracy (ROC–AUC = 0.846 [0.838–0.854]). However, after matching on measured confounders, such as self-reported symptoms, performance is much weaker (ROC–AUC = 0.619 [0.594–0.644]). Upon quantifying the utility of audio-based classifiers in practical settings, we find them to be outperformed by predictions on the basis of user-reported symptoms. We make best-practice recommendations for handling recruitment bias, and for assessing audio-based classifiers by their utility in relevant practical settings. Our work provides insights into the value of AI audio analysis and the importance of study design and treatment of confounders in AI-enabled diagnostics.

UR - http://www.scopus.com/inward/record.url?scp=85184421962&partnerID=8YFLogxK

U2 - 10.1038/s42256-023-00773-8

DO - 10.1038/s42256-023-00773-8

M3 - Article

AN - SCOPUS:85184421962

SN - 2522-5839

VL - 6

SP - 229

EP - 242

JO - Nature Machine Intelligence

JF - Nature Machine Intelligence

IS - 2

ER -
