TY - CHAP
T1 - Transfer Learning of Deep Spatiotemporal Networks to Model Arbitrarily Long Videos of Seizures
AU - Pérez-García, Fernando
AU - Scott, Catherine
AU - Sparks, Rachel
AU - Diehl, Beate
AU - Ourselin, Sébastien
N1 - Funding Information:
This work is supported by the Engineering and Physical Sciences Research Council (EPSRC) [EP/R512400/1]. This work is additionally supported by the EPSRC-funded UCL Centre for Doctoral Training in Intelligent, Integrated Imaging in Healthcare (i4health) [EP/S021930/1] and the Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS, UCL) [203145Z/16/Z]. The data acquisition was supported by the National Institute of Neurological Disorders and Stroke [U01-NS090407]. This publication represents, in part, independent research commissioned by the Wellcome Innovator Award [218380/Z/19/Z]. The views expressed in this publication are those of the authors and not necessarily those of the Wellcome Trust. The weights for the 2D and 3D models were downloaded from TorchVision and https://github.com/moabitcoin/ig65m-pytorch, respectively.
Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Detailed analysis of seizure semiology, the symptoms and signs which occur during a seizure, is critical for management of epilepsy patients. Inter-rater reliability using qualitative visual analysis is often poor for semiological features. Therefore, automatic and quantitative analysis of video-recorded seizures is needed for objective assessment. We present GESTURES, a novel architecture combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to learn deep representations of arbitrarily long videos of epileptic seizures. We use a spatiotemporal CNN (STCNN) pre-trained on large human action recognition (HAR) datasets to extract features from short snippets (≈ 0.5 s) sampled from seizure videos. We then train an RNN to learn seizure-level representations from the sequence of features. We curated a dataset of seizure videos from 68 patients and evaluated GESTURES on its ability to classify seizures into focal onset seizures (FOSs) (N = 106) vs. focal to bilateral tonic-clonic seizures (TCSs) (N = 77), obtaining an accuracy of 98.9% using bidirectional long short-term memory (BLSTM) units. We demonstrate that an STCNN trained on a HAR dataset can be used in combination with an RNN to accurately represent arbitrarily long videos of seizures. GESTURES can provide accurate seizure classification by modeling sequences of semiologies. The code, models and features dataset are available at https://github.com/fepegar/gestures-miccai-2021.
KW - Epilepsy video-telemetry
KW - Temporal segment networks
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=85116450010&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-87240-3_32
DO - 10.1007/978-3-030-87240-3_32
M3 - Conference paper
AN - SCOPUS:85116450010
SN - 9783030872397
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 334
EP - 344
BT - Medical Image Computing and Computer Assisted Intervention – MICCAI 2021 - 24th International Conference, Proceedings
A2 - de Bruijne, Marleen
A2 - Cattin, Philippe C.
A2 - Cotin, Stéphane
A2 - Padoy, Nicolas
A2 - Speidel, Stefanie
A2 - Zheng, Yefeng
A2 - Essert, Caroline
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2021
Y2 - 27 September 2021 through 1 October 2021
ER -