Prediction of Stroke Outcome Using Natural Language Processing-Based Machine Learning of Radiology Report of Brain MRI

2020 ◽  
Vol 10 (4) ◽  
pp. 286
Author(s):  
Tak Sung Heo ◽  
Yu Seop Kim ◽  
Jeong Myeong Choi ◽  
Yeong Seok Jeong ◽  
Soo Young Seo ◽  
...  

Brain magnetic resonance imaging (MRI) is useful for predicting the outcome of patients with acute ischemic stroke (AIS). Although deep learning (DL) using brain MRI with certain image biomarkers has shown satisfactory results in predicting poor outcomes, no study has assessed the usefulness of natural language processing (NLP)-based machine learning (ML) algorithms using brain MRI free-text reports of AIS patients. Therefore, we aimed to assess whether NLP-based ML algorithms using brain MRI text reports could predict poor outcomes in AIS patients. This study included only English text reports of brain MRIs examined during admission of AIS patients. Poor outcome was defined as a modified Rankin Scale score of 3–6, and the data were captured by trained nurses and physicians. Only the text report of the first MRI scan during admission was included. The text dataset was randomly divided into training and test datasets at a 7:3 ratio. Text was vectorized at the word, sentence, and document levels. The word-level approach did not consider the sequence of words; the “bag-of-words” model was used to reflect the number of repetitions of each text token. The “sent2vec” method was used in the sentence-level approach, which considered the sequence of words, and word embedding was used in the document-level approach. In addition to conventional ML algorithms, DL algorithms such as the convolutional neural network (CNN), long short-term memory, and multilayer perceptron were used to predict poor outcomes, with 5-fold cross-validation and grid search techniques. The performance of each ML classifier was compared using the area under the receiver operating characteristic curve (AUROC). Among 1840 subjects with AIS, 645 patients (35.1%) had a poor outcome 3 months after stroke onset. Random forest was the best classifier (AUROC of 0.782) using the word-level approach.
Overall, the document-level approach exhibited better performance than the word- or sentence-level approaches. Among all the ML classifiers, the multi-CNN algorithm demonstrated the best classification performance (0.805), followed by the CNN algorithm (0.799). When predicting future clinical outcomes using NLP-based ML of radiology free-text reports of brain MRI, DL algorithms showed superior performance over the other ML algorithms. In particular, the prediction of poor outcomes with document-level NLP was improved more by the multi-CNN and CNN than by recurrent neural network-based algorithms. NLP-based DL algorithms can serve as an important digital marker for DL prediction from unstructured electronic health record data.
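The word-level bag-of-words vectorization described above can be sketched in a few lines of plain Python; the report phrases and vocabulary here are invented for illustration, since the study's actual tokenization and vocabulary are not given in the abstract:

```python
from collections import Counter

def bag_of_words(report, vocabulary):
    """Count occurrences of each vocabulary token in a report (order ignored)."""
    counts = Counter(report.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical free-text report fragments, not from the study's dataset.
reports = [
    "acute infarction in the left mca territory",
    "no acute infarction chronic small vessel disease",
]
vocab = sorted({t for r in reports for t in r.lower().split()})
vectors = [bag_of_words(r, vocab) for r in reports]
```

Each report becomes a fixed-length count vector, which can then be fed to a classifier such as the random forest that performed best at the word level.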

2021 ◽  
Vol 28 (1) ◽  
pp. e100262
Author(s):  
Mustafa Khanbhai ◽  
Patrick Anyadi ◽  
Joshua Symons ◽  
Kelsey Flott ◽  
Ara Darzi ◽  
...  

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require personnel resources that are not available in most healthcare organisations. We aimed to undertake a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data. Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded. Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques to patient feedback from social media sites (unsolicited), followed by structured surveys (solicited). Supervised learning was frequently used (n=9), followed by unsupervised (n=6) and semisupervised (n=3). Comments extracted from social media were analysed using an unsupervised approach, and free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included precision, recall and F-measure, with support vector machine and Naïve Bayes being the best-performing ML classifiers. Conclusion: NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may help healthcare organisations generate insight from the volumes of unstructured free-text data.
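The performance metrics the reviewed studies report (precision, recall, F-measure) all follow from confusion-matrix counts; a minimal sketch, with counts that are purely illustrative and not drawn from any reviewed study:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard text-classification metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp)          # of flagged comments, fraction correct
    recall = tp / (tp + fn)             # of relevant comments, fraction found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Illustrative counts for a hypothetical feedback classifier.
p, r, f = precision_recall_f1(tp=80, fp=20, fn=20)
```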


2017 ◽  
Vol 56 (05) ◽  
pp. 377-389 ◽  
Author(s):  
Xingyu Zhang ◽  
Joyce Kim ◽  
Rachel E. Patzer ◽  
Stephen R. Pitts ◽  
Aaron Patzer ◽  
...  

Summary Objective: To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage, with and without the addition of natural language processing elements. Methods: Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from the 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network (MLNN) models which included natural language processing (NLP) and principal component analysis from the patient’s reason for visit. Ten-fold cross-validation was used to test the predictive capacity of each model, and areas under the receiver operating characteristic curves (AUC) were then calculated for each model. Results: Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason-for-visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated an AUC of 0.742 (95% CI 0.731-0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN.
When both structured variables and free-text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN. Conclusions: The predictive accuracy of hospital admission or transfer for patients who presented to ED triage was good overall, and was improved by the inclusion of free-text data from a patient’s reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported free text may increase predictive accuracy for hospital admission.
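The AUC values reported above can be computed for any scored model via the Mann-Whitney formulation: the probability that a randomly chosen admitted visit receives a higher score than a randomly chosen discharged one. A minimal sketch, with made-up scores and labels:

```python
def auroc(scores, labels):
    """AUROC as the probability a random positive outranks a random negative; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # admitted/transferred
    neg = [s for s, y in zip(scores, labels) if y == 0]  # discharged home
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative predicted probabilities and outcomes (not NHAMCS data).
auc = auroc([0.9, 0.6, 0.4, 0.7, 0.2], [1, 1, 0, 0, 0])
```

For large samples a sorting-based implementation is preferable, but the pairwise definition above makes the statistic explicit.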


JAMIA Open ◽  
2019 ◽  
Vol 2 (1) ◽  
pp. 139-149 ◽  
Author(s):  
Meijian Guan ◽  
Samuel Cho ◽  
Robin Petro ◽  
Wei Zhang ◽  
Boris Pasche ◽  
...  

Abstract Objectives: Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. Methods: We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who had undergone clinical next-generation sequencing (NGS) testing at Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural networks (RNNs), namely gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi), were applied to classify documents into the treatment-change and no-treatment-change groups. Further, we compared the performance of the RNNs to that of 5 machine learning algorithms: Naive Bayes, k-nearest neighbor, support vector machine, random forest, and logistic regression. Results: Our results suggested that, overall, RNNs outperformed the traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embeddings improved the accuracy of LSTM by 3.4% and reduced the training time by more than 60%. Discussion and Conclusion: NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.
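Before progress notes can be fed to RNNs such as LSTM or LSTM_Bi, free text is typically mapped to fixed-length integer sequences that index into an embedding layer. A minimal preprocessing sketch; the note snippets, reserved ids, and padding scheme are illustrative assumptions, not the paper's actual pipeline:

```python
def build_vocab(docs, min_index=2):
    """Map tokens to integer ids; 0 is reserved for padding, 1 for unknown tokens."""
    vocab = {}
    for doc in docs:
        for tok in doc.lower().split():
            if tok not in vocab:
                vocab[tok] = len(vocab) + min_index
    return vocab

def encode(doc, vocab, max_len):
    """Convert a note to a fixed-length id sequence, truncating or padding with 0."""
    ids = [vocab.get(t, 1) for t in doc.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))

# Hypothetical note fragments, not actual patient data.
notes = ["ngs testing suggests treatment change", "no change in treatment plan"]
vocab = build_vocab(notes)
seq = encode(notes[0], vocab, max_len=8)
```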


Author(s):  
Tamanna Sharma ◽  
Anu Bajaj ◽  
Om Prakash Sangwan

Sentiment analysis is the computational measurement of attitudes, opinions, and emotions (such as positive/negative) with the help of text mining and natural language processing of words and phrases. Incorporating machine learning techniques with natural language processing helps in analysing and predicting sentiments in a more precise manner. Sometimes, however, machine learning techniques are incapable of predicting sentiments due to the unavailability of labelled data. To overcome this problem, an advanced computational technique called deep learning comes into play. This chapter highlights the latest studies regarding the use of deep learning techniques, such as convolutional neural networks and recurrent neural networks, in sentiment analysis.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hao Yang ◽  
Qin He ◽  
Zhenyan Liu ◽  
Qian Zhang

The development of the Internet and network applications has brought with it the development of encrypted communication technology. On this basis, however, malicious traffic also uses encryption to evade traditional security protection and detection, which cannot accurately detect encrypted malicious traffic. In recent years, the rise of artificial intelligence has made it possible to use machine learning and deep learning methods to detect encrypted malicious traffic without decryption, with very accurate detection results. At present, research on malicious encrypted traffic detection mainly focuses on the analysis of the characteristics of encrypted traffic and the selection of machine learning algorithms. In this paper, a method combining natural language processing and machine learning is proposed: a detection model is built on TF-IDF features. In the data preprocessing stage, this method introduces a natural language processing technique, the TF-IDF model, to extract data information, obtain the importance of keywords, and then reconstruct the features of the data. The detection method based on the TF-IDF model does not need to analyze each field of the dataset. Compared with the usual machine learning preprocessing method, namely data encoding, the experimental results show that using natural language processing to preprocess the data effectively improves detection accuracy. A gradient boosting classifier, a random forest classifier, an AdaBoost classifier, and an ensemble model based on these three classifiers are each used to construct the later models. A convolutional neural network (CNN) is also trained, as CNNs can effectively extract data information.
With consistent input data across the classifiers and the neural network, comparison and analysis of the various methods show that the accuracy of the one-dimensional convolutional network is slightly higher than that of the machine-learning-based classifiers.
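The TF-IDF weighting at the core of the proposed preprocessing can be sketched in plain Python: term frequency within a document scaled by the log inverse document frequency across the corpus. The tiny traffic-like "documents" below are invented for illustration, not taken from the paper's dataset:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Per-document TF-IDF weights: tf(t, d) * log(N / df(t))."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each token.
    df = Counter(t for toks in tokenized for t in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({t: (tf[t] / len(toks)) * math.log(n / df[t]) for t in tf})
    return weights

# Hypothetical tokenized traffic records.
docs = [
    "tls handshake client hello",
    "tls handshake server hello",
    "plain http get request",
]
weights = tf_idf(docs)
```

Tokens that appear in every document get weight zero, while rarer, more discriminative tokens are weighted up, which is exactly why this reweighting can help the downstream classifiers.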


Author(s):  
Filipe R Lucini ◽  
Karla D Krewulak ◽  
Kirsten M Fiest ◽  
Sean M Bagshaw ◽  
Danny J Zuege ◽  
...  

Abstract Objective: To apply natural language processing (NLP) techniques to identify individual events and modes of communication between healthcare professionals and families of critically ill patients from electronic medical records (EMR). Materials and Methods: Retrospective cohort study of 280 randomly selected adult patients admitted to 1 of 15 intensive care units (ICUs) in Alberta, Canada from June 19, 2012 to June 11, 2018. Individual events and modes of communication were independently abstracted using NLP and manual chart review (reference standard). Preprocessing techniques and 2 NLP approaches (rule-based and machine learning) were evaluated using sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). Results: Over 2700 combinations of NLP methods and hyperparameters were evaluated for each mode of communication using a holdout subset. The rule-based approach had the highest AUROC in 65 datasets, compared to the machine learning approach in 21 datasets. Both approaches had similar performance in 17 datasets. The rule-based AUROCs for the grouped categories of patient documented to have family or friends (0.972, 95% CI 0.934–1.000), visit by family/friend (0.882, 95% CI 0.820–0.943) and phone call with family/friend (0.975, 95% CI 0.952–0.998) were high. Discussion: We report an automated method to quantify communication between healthcare professionals and family members of adult patients from free-text EMRs. A rule-based NLP approach had better overall operating characteristics than a machine learning approach. Conclusion: NLP can automatically and accurately measure the frequency and mode of documented family visitation and communication from unstructured free-text EMRs, to support patient- and family-centered care initiatives.
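A rule-based approach of the kind evaluated here can be sketched with regular expressions over note text; the patterns and note sentences below are hypothetical stand-ins, since the study's actual rules are not given in the abstract:

```python
import re

# Hypothetical rules: a family-related word co-occurring with a visit or phone cue.
PATTERNS = {
    "visit": re.compile(
        r"\b(family|wife|husband|son|daughter)\b.*\b(visit|visited|at bedside)\b", re.I),
    "phone": re.compile(
        r"\b(phone|telephone|called)\b.*\b(family|wife|husband|son|daughter)\b", re.I),
}

def detect_communication(note):
    """Return the set of communication modes whose rule fires on a note."""
    return {mode for mode, pat in PATTERNS.items() if pat.search(note)}

# Invented note sentences, not actual EMR text.
visit_modes = detect_communication("Wife visited this afternoon and was updated at bedside.")
phone_modes = detect_communication("Called the patient's daughter with an update.")
```

In practice each rule would be tuned against the manually abstracted reference standard, which is what the 2700+ method/hyperparameter combinations in the study systematize.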


2021 ◽  
Author(s):  
Saniya Karnik ◽  
Supriya Gupta ◽  
Jason Baihly ◽  
David Saier

Abstract Recent advancements in the fields of natural language processing (NLP) and machine learning have made it possible to ingest decades of field history and heterogeneous production records. This paper proposes an analytics workflow that leverages artificial intelligence to process thousands of historical workover reports (handwritten and electronic), extract important information, learn patterns in production activity, and train machines to quantify workover impact and derive best practices for field operations. Natural language processing libraries were developed to ingest and catalog gigabytes of field data, identify rich sources of workover information, and extract workover and cost information from unstructured reports. A clustering-based architecture was developed and trained to categorize documents based on the free text describing the activities found in the reports. This machine learning model learnt the pattern and context of repeating words and was able to cluster documents with similar content together, enabling the user to find a category of documents, e.g. workover intervention reports, instantaneously. Statistical models were built to determine return on investment from workovers and rank them based on production improvement and payout time. Today, 80% of an oilfield expert's time can be spent manually organizing data. When processing decades of historical oilfield production data spread across both structured (production time series) and unstructured records (e.g., workover reports), experts often face two major challenges: 1) how to rapidly analyze field data comprising thousands of historical records, and 2) how to use the rich historical information to generate effective insights and take the proper actions to optimize production.
In this paper, we analyzed multiple field datasets in a heterogeneous file environment with 20 different file formats (PDF, Excel, and others), 2,000+ files, and production history spanning 50+ years across 2,000+ producing wells. Libraries were developed to extract files from complex folder hierarchies, and machine learning architectures assisted in finding the workover reports among the myriad documents. Information from the reports was extracted through Python libraries and optical character recognition technology to build a master data source with production history, workover, and cost information. This rich dataset was then used to analyze episodic workover activity by well and compute key performance indicators (KPIs) to identify well candidates for production enhancement. The building blocks included quantifying production upside and calculating return on investment for various workover classes. O&G companies have vast volumes of unstructured data and use less than 1% of it to uncover meaningful insights about field operations. Our workflow describes a methodology to ingest both structured and unstructured documents, capture knowledge, quantify production upside, understand capital spending, and learn best practices in workover operations through an automated process. This helps optimize forward operating expense (OPEX) plans with a focus on cost reduction and a shortened turnaround time for decision making.
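The payout-time and return-on-investment building blocks can be sketched as a simple calculation; the uplift, price, and cost figures below are invented, and a real KPI would also account for production decline and operating costs:

```python
def workover_kpis(uplift_bbl_per_day, oil_price, workover_cost):
    """Payout time (days) and simple one-year ROI for a workover, ignoring decline."""
    daily_revenue = uplift_bbl_per_day * oil_price
    payout_days = workover_cost / daily_revenue            # days to recover the spend
    roi_1yr = (daily_revenue * 365 - workover_cost) / workover_cost
    return payout_days, roi_1yr

# Illustrative figures: 50 bbl/d uplift at $70/bbl for a $250k workover.
payout_days, roi = workover_kpis(50, 70.0, 250_000)
```

Ranking wells by such KPIs across workover classes is how the workflow surfaces candidates for production enhancement.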


2019 ◽  
Vol 5 (suppl) ◽  
pp. 49-49
Author(s):  
Christi French ◽  
Dax Kurbegov ◽  
David R. Spigel ◽  
Maciek Makowski ◽  
Samantha Terker ◽  
...  

49 Background: Pulmonary nodule incidental findings challenge providers to balance resource efficiency and high clinical quality. Incidental findings tend to be under-evaluated, with studies reporting appropriate follow-up rates as low as 29%. The efficient identification of patients with high-risk nodules is foundational to ensuring appropriate follow-up and requires the clinical reading and classification of radiology reports. We tested the feasibility of automating this process with natural language processing (NLP) and machine learning (ML). Methods: In cooperation with Sarah Cannon, the Cancer Institute of HCA Healthcare, we conducted a series of experiments on 8,879 free-text, narrative CT radiology reports. A representative sample of health system emergency department, inpatient, and outpatient reports dated from Dec 2015 to April 2017 was divided into a development set for model training and validation, and a test set to evaluate model performance. A “Nodule Model” was trained to detect the reported presence of a pulmonary nodule, and a rules-based “Size Model” was developed to extract the size of the nodule in mm. Reports were bucketed into three prediction groups: ≥ 6 mm, < 6 mm, and no size indicated. Nodules were placed in a queue for follow-up if the nodule was predicted ≥ 6 mm, or if the nodule had no size indicated and the report contained the word “mass.” The Fleischner Society Guidelines and clinical review informed these definitions. Results: Precision and recall metrics were calculated for multiple model thresholds. A threshold was selected based on the validation set calculations, and a success criterion of 90% queue precision was chosen to minimize false positives. On the test dataset, the F1 measure of the entire pipeline was 72.9%, recall was 60.3%, and queue precision was 90.2%, exceeding the success criterion.
Conclusions: The experiments demonstrate the feasibility of technology to automate the detection and classification of pulmonary nodule incidental findings in radiology reports. This approach promises to improve healthcare quality by increasing the rate of appropriate lung nodule incidental finding follow-up and treatment without excessive labor or risking overutilization.
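The bucketing and queueing rule stated in the Methods can be sketched as follows; the size-extraction regex is an illustrative stand-in for the rules-based "Size Model," whose actual rules are not published in the abstract:

```python
import re

# Hypothetical pattern: a number (possibly decimal) followed by "mm".
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*mm", re.I)

def triage_report(text):
    """Bucket a report into '>=6mm', '<6mm', or 'no size', and apply the stated queue rule."""
    sizes = [float(m) for m in SIZE_RE.findall(text)]
    if sizes:
        bucket = ">=6mm" if max(sizes) >= 6 else "<6mm"
    else:
        bucket = "no size"
    # Queue for follow-up if predicted >= 6 mm, or no size but the report mentions "mass".
    queue = bucket == ">=6mm" or (bucket == "no size" and "mass" in text.lower())
    return bucket, queue

# Invented report snippets, not from the study's dataset.
b1, q1 = triage_report("8 mm nodule in the right upper lobe")
b2, q2 = triage_report("3mm nodule, likely benign")
b3, q3 = triage_report("spiculated mass in the left lung")
```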

