Abstract
Electronic Health Record (EHR) narratives are a rich source of high-resolution information of value to secondary research use. However, because EHR narratives are mostly natural language free text and highly ambiguous, many natural language processing algorithms have been devised to extract meaningful structured information about clinical entities from them. The performance of these algorithms, however, varies widely depending on the training dataset and on how effectively background knowledge is used to steer the learning process.
In this paper we study the impact of initializing the training of a neural network natural language processing algorithm with pre-defined clinical word embeddings to improve feature extraction and relationship classification between entities. We add our embedding framework to a bi-directional long short-term memory (Bi-LSTM) neural network, and further study the effect of using attention weights in neural networks for sequence labelling tasks to extract knowledge of Adverse Drug Reactions (ADRs). We incorporate unsupervised word embeddings trained with Word2Vec and GloVe on widely available medical resources such as the Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) II corpora and the Unified Medical Language System (UMLS), and we also embed a pharmaco lexicon from available EHRs. Experiments on two datasets show that our architecture outperforms a baseline Bi-LSTM as well as Bi-LSTM networks using linear-chain and skip-chain conditional random fields (CRFs).
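The idea of initializing a network with pre-trained word embeddings can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_embedding_matrix`, the toy vectors, the lowercasing step, and the small uniform range used for out-of-vocabulary words are all assumptions made for illustration. In practice the `pretrained` lookup would hold Word2Vec or GloVe vectors trained on a clinical corpus, and the resulting matrix would seed the embedding layer of the Bi-LSTM.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, seed=0):
    """Return (matrix, hits): one vector per vocabulary word, in order.

    Words found in `pretrained` (hypothetical dict: word -> list of floats)
    reuse their pre-trained vector; all other words get a small random
    vector, an assumed fallback for out-of-vocabulary tokens.
    """
    rng = random.Random(seed)
    matrix = []
    hits = 0
    for word in vocab:
        vector = pretrained.get(word.lower())
        if vector is not None and len(vector) == dim:
            matrix.append(list(vector))  # copy so later training cannot alter the source
            hits += 1
        else:
            matrix.append([rng.uniform(-0.05, 0.05) for _ in range(dim)])
    return matrix, hits


# Toy stand-in for vectors learned from a clinical corpus (values are made up).
pretrained = {"rash": [0.1, 0.2, 0.3], "warfarin": [0.4, 0.5, 0.6]}
vocab = ["Warfarin", "caused", "rash"]
matrix, hits = build_embedding_matrix(vocab, pretrained, dim=3)
```

Here `hits` counts how many vocabulary words were covered by the pre-trained vectors, which is a quick way to check how much background knowledge actually reaches the network before training starts.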
Original language | English |
---|---|
Title of host publication | Proceedings of the ACM International Conference on Digital Health |
Publisher | Association for Computing Machinery |
Pages | 67-71 |
Number of pages | 5 |
Volume | Part F128634 |
ISBN (Electronic) | 9781450352499 |
DOIs | |
Publication status | Published - 2 Jul 2017 |
Event | 7th International Conference on Digital Health, DH 2017 - London, United Kingdom; Duration: 2 Jul 2017 → 5 Jul 2017 |
Conference
Conference | 7th International Conference on Digital Health, DH 2017 |
---|---|
Country/Territory | United Kingdom |
City | London |
Period | 2/07/2017 → 5/07/2017 |
Keywords
- Adverse drug reactions
- Named entity recognition
- Recurrent neural networks