Non-invasive classification of non-small cell lung cancer: a comparison between random forest models utilising radiomic and semantic features

Usman Bashir; Bhavin Kawa; Muhammad Siddique; Sze Mun Mak; Arjun Nair; Emma Mclean; Andrea Bille; Vicky Goh; Gary Cook

doi:10.1259/bjr.20190159


Research output: Contribution to journal › Article › peer-review

36 Citations (Scopus)

187 Downloads (Pure)

Abstract

Objective: Non-invasive distinction between the squamous cell carcinoma and adenocarcinoma subtypes of non-small cell lung cancer (NSCLC) may benefit patients who are unfit for invasive diagnostic procedures or whose tissue samples are insufficient for diagnosis. The purpose of our study was to compare the performance of random forest algorithms using CT radiomic and/or semantic features in classifying NSCLC.

Methods: Two thoracic radiologists scored 11 semantic features on the CT scans of 106 patients with NSCLC. A set of 115 radiomic features was extracted from the CT scans. Random forest models were developed from semantic features (RM-sem), radiomic features (RM-rad), and all features combined (RM-all). External validation of the models was performed using an independent test data set of 100 CT scans. Model performance was measured with out-of-bag error and area under the curve (AUC), and compared using receiver operating characteristic (ROC) curve analysis on the test data set.

Results: The median (interquartile range) error rates of the models were: RM-sem 24.5 % (22.6–37.5 %), RM-rad 35.8 % (34.9–38.7 %), and RM-all 37.7 % (37.7–37.7 %). On training data, both RM-rad and RM-all gave perfect discrimination (AUC = 1), significantly higher than that achieved by RM-sem (AUC = 0.78; p < 0.0001). On test data, however, the RM-sem model (AUC = 0.82) outperformed both RM-rad and RM-all (AUC = 0.5 and 0.56, respectively; p < 0.0001), neither of which was significantly different from random guessing (p = 0.9 and 0.6, respectively).

Conclusion: Non-invasive classification of NSCLC can be done accurately using random forest classification models based on well-known CT-derived descriptive features. However, radiomics-based classification models performed poorly in this scenario when tested on independent data and should be used with caution, because they may not generalize to new data.
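The two summary statistics reported in the Results can be made concrete with a small standard-library sketch. `auc_mann_whitney` computes the AUC as the probability that a randomly chosen positive case (one subtype, coded as positive) receives a higher model score than a randomly chosen negative case, with ties counted as one half; an AUC of 0.5 therefore corresponds to random guessing and 1 to perfect discrimination. `median_iqr` summarizes a list of error rates as a median with its interquartile range. Both function names are hypothetical, and the abstract does not state which software or quartile convention the authors used; since quartile definitions differ between packages, the `"inclusive"` method of Python's `statistics.quantiles` is an assumption here.

```python
from statistics import median, quantiles


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as one half, which makes this equal to the area under the
    empirical ROC curve (the normalized Mann-Whitney U statistic).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def median_iqr(values):
    """Return (median, (Q1, Q3)) for a list of error rates.

    Uses the 'inclusive' quartile method (an assumption); other packages
    may use a different convention and give slightly different quartiles.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# Hypothetical scores: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc_mann_whitney([0.9, 0.4], [0.6, 0.1]))  # 0.75
```

A pairwise count like this is quadratic in the number of cases, which is harmless at the scale of a 100-scan test set; a rank-based formula gives the same value faster for large samples.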
Advances in knowledge: Our study describes novel CT-derived random forest models based on radiologists' interpretation of CT scans (semantic features) that can assist NSCLC classification when histopathology is equivocal or when histopathological sampling is not possible. It also shows that random forest models based on semantic features may be more useful than those built from computational radiomic features.
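The abstract's central caution, that a random forest can fit its training data perfectly yet fail on independent data, can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestClassifier` and `roc_auc_score`, and replaces the CT features with purely synthetic noise: 106 training and 100 test cases with 115 random "features" and random labels, sized loosely after the abstract. This is not the study's pipeline or data; the number of trees and the random seed are arbitrary choices. Because the labels carry no signal, any apparent discrimination on the training set is overfitting, while the out-of-bag error (computed here as `1 - oob_score_`) and the test AUC should both sit near chance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical, purely synthetic data: 115 noise "features" with random
# binary labels (0/1), so no model can genuinely discriminate the classes.
X_train = rng.normal(size=(106, 115))
y_train = rng.integers(0, 2, size=106)
X_test = rng.normal(size=(100, 115))
y_test = rng.integers(0, 2, size=100)

# oob_score=True scores each training case using only the trees whose
# bootstrap sample did not contain it; oob_score_ is that accuracy.
model = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
model.fit(X_train, y_train)
oob_error = 1.0 - model.oob_score_

# AUC on the training set is optimistic: most trees saw each training case.
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
# AUC on an independent test set reflects actual generalization.
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"Out-of-bag error: {oob_error:.2f}")
print(f"Training AUC:     {train_auc:.2f}")
print(f"Test AUC:         {test_auc:.2f}")
```

On noise like this, the training AUC typically comes out close to 1 while the test AUC stays near 0.5, the same training-versus-test pattern the abstract reports for RM-rad and RM-all; the out-of-bag estimate, not the training AUC, is the in-sample figure that anticipates external performance.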

Original language	English
Article number	20190159
Pages (from-to)	20190159
Journal	British Journal of Radiology
Volume	92
Issue number	1099
Early online date	5 Jun 2019
DOIs	https://doi.org/10.1259/bjr.20190159
Publication status	E-pub ahead of print - 5 Jun 2019


Access to Document

10.1259/bjr.20190159 (Licence: CC BY)

Non-invasive classification_BASHIR_Submitted23March0219_SM: Submitted manuscript, 321 KB
Non-invasive classification of_BASHIR_Accepted4April2019Publishedonline5June2019_GOLD VoR (CC BY): Final published version, 710 KB. Licence: CC BY

Cite this

Bashir, U., Kawa, B., Siddique, M., Mak, S. M., Nair, A., Mclean, E., Bille, A., Goh, V., & Cook, G. (2019). Non-invasive classification of non-small cell lung cancer: a comparison between random forest models utilising radiomic and semantic features. British Journal of Radiology, 92(1099), Article 20190159. Advance online publication. https://doi.org/10.1259/bjr.20190159

@article{e6815a3fda6b4a53b6003b40f311dddb,

title = "Non-invasive classification of non-small cell lung cancer: a comparison between random forest models utilising radiomic and semantic features",

author = "Usman Bashir and Bhavin Kawa and Muhammad Siddique and Mak, {Sze Mun} and Arjun Nair and Emma Mclean and Andrea Bille and Vicky Goh and Gary Cook",

year = "2019",

month = jun,

day = "5",

doi = "10.1259/bjr.20190159",

language = "English",

volume = "92",

pages = "20190159",

journal = "British Journal of Radiology",

issn = "0007-1285",

publisher = "British Institute of Radiology",

number = "1099",

}

Bashir, U, Kawa, B, Siddique, M, Mak, SM, Nair, A, Mclean, E, Bille, A, Goh, V & Cook, G 2019, 'Non-invasive classification of non-small cell lung cancer: a comparison between random forest models utilising radiomic and semantic features', British Journal of Radiology, vol. 92, no. 1099, 20190159. https://doi.org/10.1259/bjr.20190159

TY - JOUR

T1 - Non-invasive classification of non-small cell lung cancer

T2 - a comparison between random forest models utilising radiomic and semantic features

AU - Bashir, Usman

AU - Kawa, Bhavin

AU - Siddique, Muhammad

AU - Mak, Sze Mun

AU - Nair, Arjun

AU - Mclean, Emma

AU - Bille, Andrea

AU - Goh, Vicky

AU - Cook, Gary

PY - 2019/6/5

Y1 - 2019/6/5

UR - http://www.scopus.com/inward/record.url?scp=85068489039&partnerID=8YFLogxK

U2 - 10.1259/bjr.20190159

DO - 10.1259/bjr.20190159

M3 - Article

C2 - 31166787

SN - 0007-1285

VL - 92

SP - 20190159

JO - British Journal of Radiology

JF - British Journal of Radiology

IS - 1099

M1 - 20190159

ER -
