Predicting the interaction biomolecule types for lncRNA: an ensemble deep learning approach

Author(s):  
Yu Zhang ◽  
Cangzhi Jia ◽  
Chee Keong Kwoh

Abstract Long noncoding RNAs (lncRNAs) play significant roles in various physiological and pathological processes via their interactions with biomolecules like DNA, RNA and protein. The existing in silico methods used for predicting the functions of lncRNA mainly rely on calculating the similarity of lncRNA or investigating whether an lncRNA can interact with a specific biomolecule or disease. In this work, we explored the functions of lncRNA from a different perspective: we presented a tool for predicting the interaction biomolecule type for a given lncRNA. For this purpose, we first investigated the main molecular mechanisms of the interactions of lncRNA–RNA, lncRNA–protein and lncRNA–DNA. Then, we developed an ensemble deep learning model: lncIBTP (lncRNA Interaction Biomolecule Type Prediction). This model predicted the interactions between lncRNA and different types of biomolecules. On the 5-fold cross-validation, the lncIBTP achieves average values of 0.7042 in accuracy, 0.7903 and 0.6421 in macro-average area under receiver operating characteristic curve and precision–recall curve, respectively, which illustrates the model effectiveness. Besides, based on the analysis of the collected published data and prediction results, we hypothesized that the characteristics of lncRNAs that interacted with DNA may be different from those that interacted with only RNA.
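
As a rough illustration of the macro-average AUROC reported above, the sketch below computes a one-vs-rest AUROC for each biomolecule type and averages the values with equal weight. It is not the lncIBTP evaluation code: the function names, the label layout (one 0/1 indicator per type, so an lncRNA may carry several labels) and the toy scores are all assumptions made for the example.

```python
def binary_auc(scores, labels):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(score_rows, label_rows):
    """Unweighted mean of the per-class one-vs-rest AUROC values."""
    n_classes = len(score_rows[0])
    per_class = []
    for c in range(n_classes):
        class_scores = [row[c] for row in score_rows]
        class_labels = [row[c] for row in label_rows]
        per_class.append(binary_auc(class_scores, class_labels))
    return sum(per_class) / n_classes


# Hypothetical scores for four lncRNAs; columns stand for DNA, RNA, protein.
toy_scores = [[0.9, 0.2, 0.1], [0.8, 0.7, 0.3], [0.3, 0.6, 0.2], [0.1, 0.4, 0.9]]
toy_labels = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
toy_macro = macro_auc(toy_scores, toy_labels)
```

Macro averaging gives every class the same weight regardless of how many samples it has, which is why it is often preferred when class sizes are unbalanced.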

2020 ◽  
Vol 10 (4) ◽  
pp. 211 ◽  
Author(s):  
Yong Joon Suh ◽  
Jaewon Jung ◽  
Bum-Joo Cho

Mammography plays an important role in screening breast cancer among females, and artificial intelligence has enabled the automated detection of diseases on medical images. This study aimed to develop a deep learning model detecting breast cancer in digital mammograms of various densities and to evaluate the model performance compared to previous studies. From 1501 subjects who underwent digital mammography between February 2007 and May 2015, craniocaudal and mediolateral view mammograms were included and concatenated for each breast, ultimately producing 3002 merged images. Two convolutional neural networks were trained to detect any malignant lesion on the merged images. The performances were tested using 301 merged images from 284 subjects and compared to a meta-analysis including 12 previous deep learning studies. The mean area under the receiver-operating characteristic curve (AUC) for detecting breast cancer in each merged mammogram was 0.952 ± 0.005 by DenseNet-169 and 0.954 ± 0.020 by EfficientNet-B5, respectively. The performance for malignancy detection decreased as breast density increased (density A, mean AUC = 0.984 vs. density D, mean AUC = 0.902 by DenseNet-169). When patients’ age was used as a covariate for malignancy detection, the performance showed little change (mean AUC, 0.953 ± 0.005). The mean sensitivity and specificity of the DenseNet-169 (87 and 88%, respectively) surpassed the mean values (81 and 82%, respectively) obtained in a meta-analysis. Deep learning would work efficiently in screening breast cancer in digital mammograms of various densities, which could be maximized in breasts with lower parenchyma density.


2021 ◽  
Vol 11 ◽  
Author(s):  
Tianle Shen ◽  
Runping Hou ◽  
Xiaodan Ye ◽  
Xiaoyang Li ◽  
Junfeng Xiong ◽  
...  

Background: To develop and validate a deep learning–based model on CT images for the malignancy and invasiveness prediction of pulmonary subsolid nodules (SSNs). Materials and Methods: This study retrospectively collected patients with pulmonary SSNs treated by surgery in our hospital from 2012 to 2018. Postoperative pathology was used as the diagnostic reference standard. Three-dimensional convolutional neural network (3D CNN) models were constructed using preoperative CT images to predict the malignancy and invasiveness of SSNs. Then, an observer reader study conducted by two thoracic radiologists was used to compare with the CNN model. The diagnostic power of the models was evaluated with receiver operating characteristic curve (ROC) analysis. Results: A total of 2,614 patients were finally included and randomly divided for training (60.9%), validation (19.1%), and testing (20%). For the benign and malignant classification, the best 3D CNN model achieved a satisfactory AUC of 0.913 (95% CI: 0.885–0.940), sensitivity of 86.1%, and specificity of 83.8% at the optimal decision point, which outperformed all observer readers’ performance (AUC: 0.846±0.031). For pre-invasive and invasive classification of malignant SSNs, the 3D CNN also achieved satisfactory AUC of 0.908 (95% CI: 0.877–0.939), sensitivity of 87.4%, and specificity of 80.8%. Conclusion: The deep-learning model showed its potential to accurately identify the malignancy and invasiveness of SSNs and thus can help surgeons make treatment decisions.


2021 ◽  
Vol 11 ◽  
Author(s):  
Chenzhao Feng ◽  
Tianyu Xiang ◽  
Zixuan Yi ◽  
Xinyao Meng ◽  
Xufeng Chu ◽  
...  

Background: Neuroblastoma is one of the most devastating forms of childhood cancer. Despite numerous attempts at precise survival prediction in neuroblastoma, prediction efficacy remains to be improved. Methods: Here, we applied a deep-learning (DL) model with the attention mechanism to predict survival in neuroblastoma. We utilized 2 groups of features separated from 172 genes to train 2 deep neural networks and combined them by the attention mechanism. Results: This classifier could accurately predict survival, with areas under the receiver operating characteristic (ROC) curve and the time-dependent ROC curve reaching 0.968 and 0.974 in the training set, respectively. The accuracy of the model was further confirmed in a validation cohort. Importantly, the two feature groups were mapped to two groups of patients, which were prognostic in Kaplan-Meier curves. Biological analyses showed that they exhibited diverse molecular backgrounds which could be linked to the prognosis of the patients. Conclusions: In this study, we applied artificial intelligence methods to improve the accuracy of neuroblastoma survival prediction based on gene expression and provide explanations for better understanding of the molecular mechanisms underlying neuroblastoma.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Wenying Zhou ◽  
Yang Yang ◽  
Cheng Yu ◽  
Juxian Liu ◽  
Xingxing Duan ◽  
...  

Abstract It is still challenging to make an accurate diagnosis of biliary atresia (BA) with sonographic gallbladder images, particularly in rural areas without relevant expertise. To help diagnose BA based on sonographic gallbladder images, an ensembled deep learning model is developed. The model yields a patient-level sensitivity of 93.1% and specificity of 93.9% [with an area under the receiver operating characteristic curve of 0.956 (95% confidence interval: 0.928-0.977)] on the multi-center external validation dataset, superior to that of human experts. With the help of the model, the performance of human experts at various levels is improved. Moreover, the diagnosis based on smartphone photos of sonographic gallbladder images through a smartphone app and based on video sequences by the model still yields expert-level performance. The ensembled deep learning model in this study provides a solution to help radiologists improve the diagnosis of BA in various clinical application scenarios, particularly in rural and undeveloped regions with limited expertise.


2017 ◽  
Author(s):  
Ashley I. Naimi ◽  
Laura B. Balzer

Abstract Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has evolved several times into what is now known as “Super Learner”. Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve. Although relatively simple in nature, use of the Super Learner by epidemiologists has been hampered by limitations in understanding conceptual and technical details. We work step-by-step through two examples to illustrate concepts and address common concerns.
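
The idea of weighting candidate algorithms by their cross-validated predictions can be sketched in a few lines. This is a minimal toy version, not the Super Learner software and not either of the two examples the authors work through: the two candidate learners (a mean predictor and a one-variable least-squares line), the grid search over a single convex weight and all function names are assumptions chosen to keep the example dependency-free.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate 2: simple least-squares line in one variable."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def out_of_fold(fit, xs, ys, v):
    """V-fold cross-validation: each point is predicted by a model that never saw it."""
    preds = [0.0] * len(xs)
    for k in range(v):
        train = [i for i in range(len(xs)) if i % v != k]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(k, len(xs), v):
            preds[i] = model(xs[i])
    return preds


def mse(preds, ys):
    return mean((p - y) ** 2 for p, y in zip(preds, ys))


def super_learner_weight(xs, ys, v=5, steps=100):
    """Weight on the line model (the rest goes to the mean model) minimizing CV MSE."""
    p_mean = out_of_fold(fit_mean, xs, ys, v)
    p_line = out_of_fold(fit_line, xs, ys, v)

    def cv_loss(w):
        blend = [w * pl + (1.0 - w) * pm for pl, pm in zip(p_line, p_mean)]
        return mse(blend, ys)

    return min((g / steps for g in range(steps + 1)), key=cv_loss)


# Toy data that is exactly linear, so the line model should receive all the weight.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
weight = super_learner_weight(xs, ys)
```

Swapping `mse` for another loss changes what "optimal" means, which mirrors the user-specified objective function described in the abstract.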


2021 ◽  
Vol 22 (24) ◽  
pp. 13607
Author(s):  
Zhou Huang ◽  
Yu Han ◽  
Leibo Liu ◽  
Qinghua Cui ◽  
Yuan Zhou

MicroRNAs (miRNAs) are associated with various complex human diseases and some miRNAs can be directly involved in the mechanisms of disease. Identifying disease-causative miRNAs can provide novel insight into disease pathogenesis from a miRNA perspective and facilitate disease treatment. To date, various computational models have been developed to predict general miRNA–disease associations, but few models are available to further prioritize causal miRNA–disease associations from non-causal associations. Therefore, in this study, we constructed a Levenshtein-Distance-Enhanced miRNA–Disease Causal Association Predictor (LE-MDCAP), to predict potential causal miRNA–disease associations. Specifically, Levenshtein distance matrices covering the sequence, expression and functional miRNA similarities were introduced to enhance the previous Gaussian interaction profile kernel-based similarity matrix. LE-MDCAP integrated miRNA similarity matrices, disease semantic similarity matrix and known causal miRNA–disease associations to make predictions. For the regular causal vs. non-disease association discrimination task, LE-MDCAP achieved an area under the receiver operating characteristic curve (AUROC) of 0.911 and 0.906 in 10-fold cross-validation and independent test, respectively. More importantly, LE-MDCAP prominently outperformed the previous MDCAP model in distinguishing causal versus non-causal miRNA–disease associations (AUROC 0.820 vs. 0.695). Case studies performed on diabetic retinopathy and hsa-mir-361 also validated the accuracy of our model. In summary, LE-MDCAP could be useful for screening causal miRNA–disease associations from general miRNA–disease associations.
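
For readers unfamiliar with the Levenshtein distance named above, the following sketch shows the standard dynamic-programming computation and a pairwise distance matrix over a few sequences. It only illustrates the distance itself; how LE-MDCAP encodes expression or functional profiles and converts distances into similarities is not described in the abstract, and the toy sequences and function names here are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_matrix(sequences):
    """Symmetric matrix of pairwise Levenshtein distances."""
    return [[levenshtein(s, t) for t in sequences] for s in sequences]


# Hypothetical short RNA-like strings, used only to exercise the functions.
toy_sequences = ["UGAGGUAG", "UGAGGUAGU", "AGAGGUAG"]
toy_matrix = distance_matrix(toy_sequences)
```

The row-by-row formulation keeps memory proportional to the shorter dimension rather than storing the full table.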


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1127
Author(s):  
Ji Hyung Nam ◽  
Dong Jun Oh ◽  
Sumin Lee ◽  
Hyun Joo Song ◽  
Yun Jeong Lim

Capsule endoscopy (CE) quality control requires an objective scoring system to evaluate the preparation of the small bowel (SB). We propose a deep learning algorithm to calculate SB cleansing scores and verify the algorithm’s performance. A 5-point scoring system based on clarity of mucosal visualization was used to develop the deep learning algorithm (400,000 frames; 280,000 for training and 120,000 for testing). External validation was performed using additional CE cases (n = 50), and average cleansing scores (1.0 to 5.0) calculated using the algorithm were compared to clinical grades (A to C) assigned by clinicians. Test results obtained using 120,000 frames exhibited 93% accuracy. The separate CE case exhibited substantial agreement between the deep learning algorithm scores and clinicians’ assessments (Cohen’s kappa: 0.672). In the external validation, the cleansing score decreased with worsening clinical grade (scores of 3.9, 3.2, and 2.5 for grades A, B, and C, respectively, p < 0.001). Receiver operating characteristic curve analysis revealed that a cleansing score cut-off of 2.95 indicated clinically adequate preparation. This algorithm provides an objective and automated cleansing score for evaluating SB preparation for CE. The results of this study will serve as clinical evidence supporting the practical use of deep learning algorithms for evaluating SB preparation quality.
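
The agreement statistic quoted above, Cohen's kappa, corrects the raw agreement rate for the agreement expected by chance. The sketch below computes the unweighted form for two lists of categorical ratings. It is an illustration under stated assumptions: the abstract does not say how the 1.0 to 5.0 scores were mapped to categories for the comparison or whether a weighted kappa was used, and the ratings in the example are invented.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: (observed - expected) / (1 - expected)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: both raters pick category c independently.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Invented grades from an algorithm and a clinician for four cases.
algorithm_grades = ["A", "A", "B", "B"]
clinician_grades = ["A", "B", "B", "B"]
toy_kappa = cohen_kappa(algorithm_grades, clinician_grades)
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance.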


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii148-ii148
Author(s):  
Yoshihiro Muragaki ◽  
Yutaka Matsui ◽  
Takashi Maruyama ◽  
Masayuki Nitta ◽  
Taiichi Saito ◽  
...  

Abstract INTRODUCTION It is useful to know the molecular subtype of lower-grade gliomas (LGG) when deciding on a treatment strategy. This study aims to diagnose this preoperatively. METHODS A deep learning model was developed to predict the 3-group molecular subtype using multimodal data including magnetic resonance imaging (MRI), positron emission tomography (PET), and computed tomography (CT). The performance was evaluated using leave-one-out cross validation with a dataset containing information from 217 LGG patients. RESULTS The model performed best when the dataset contained MRI, PET, and CT data. The model could predict the molecular subtype with an accuracy of 96.6% for the training dataset and 68.7% for the test dataset. The model achieved test accuracies of 58.5%, 60.4%, and 59.4% when the dataset contained only MRI, MRI and PET, and MRI and CT data, respectively. The conventional method used to predict mutations in the isocitrate dehydrogenase (IDH) gene and the codeletion of chromosome arms 1p and 19q (1p/19q) sequentially had an overall accuracy of 65.9%. This is 2.8 percentage points lower than the proposed method, which predicts the 3-group molecular subtype directly. CONCLUSIONS AND FUTURE PERSPECTIVE A deep learning model was developed to diagnose the molecular subtype preoperatively based on multi-modality data in order to predict the 3-group classification directly. Cross-validation showed that the proposed model had an overall accuracy of 68.7% for the test dataset. This is the first model to double the expected value for a 3-group classification problem, when predicting the LGG molecular subtype. We plan to apply the techniques of heat map and/or segmentation for an increase in prediction accuracy.


Author(s):  
Jin-Fan Li ◽  
Xiao-Jing Ma ◽  
Lin-Lin Ying ◽  
Ying-hui Tong ◽  
Xue-ping Xiang

Acute lymphoblastic leukemia (ALL), a common cancer, is a heterogeneous disease which is mainly divided into BCP-ALL and T-ALL, accounting for 80–85% and 15–20%, respectively. There are many differences between BCP-ALL and T-ALL, including prognosis, treatment, drug screening, gene research and so on. In this study, starting with methylation and gene expression data, we analyzed the molecular differences between BCP-ALL and T-ALL and identified the multi-omics signatures using Boruta and Monte Carlo feature selection methods. There were 7 expression signature genes (CD3D, VPREB3, HLA-DRA, PAX5, BLNK, GALNT6, SLC4A8) and 168 methylation sites corresponding to 175 methylation signature genes. The overall accuracy, accuracy of BCP-ALL, and accuracy of T-ALL of the RIPPER (Repeated Incremental Pruning to Produce Error Reduction) classifier using these signatures, evaluated with 10-fold cross validation repeated 3 times, were 0.973, 0.990, and 0.933, respectively. The two genes shared between the 175 methylation signature genes and the 7 expression signature genes were CD3D and VPREB3. The network analysis of the methylation and expression signature genes suggested that their common gene, CD3D, not only differed at both methylation and expression levels, but also played a key regulatory role as a hub in the network. Our results provided insights into the underlying molecular mechanisms of ALL and facilitated more precise diagnosis and treatment of ALL.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Xiaoting Yin ◽  
Xiaosha Tao

Online business has grown exponentially during the last decade, and industries are focusing on online business more than before. However, just setting up an online store and starting to sell might not work. Different machine learning and data mining techniques are needed to learn users’ preferences and determine what would be best for business. According to the decision-making needs of online product sales, combined with the influencing factors of online product sales in various industries and the advantages of deep learning algorithms, this paper constructs a sales prediction model suitable for online products and focuses on evaluating the adaptability of the model to different types of online products. In the research process, the training results of the fully connected model are compared with those of the CNN, which demonstrates the accuracy and generalization ability of the CNN model. By selecting a non-deep learning model as the comparison baseline, the performance advantages of the CNN model across different categories of products are demonstrated. In addition, the experiment concludes that the unsupervised pretrained CNN model is more effective and adaptable in sales forecasting.

