scholarly journals Inferring disease-associated microRNAs using semi-supervised multi-label graph convolutional networks

2019 ◽  
Author(s):  
Xiaoyong Pan ◽  
Hong-Bin Shen

AbstractMicroRNAs (miRNAs) play crucial roles in many biological processes involved in diseases. The associations between diseases and protein coding genes (PCGs) have been well investigated, and further the miRNAs interact with PCGs to trigger them to be functional. Thus, it is imperative to computationally infer disease-miRNA associations under the context of interaction networks.In this study, we present a computational method, DimiG, to infer miRNA-associated diseases using semi-supervised Graph Convolutional Network model (GCN). DimiG is a multi-label framework to integrate PCG-PCG interactions, PCG-miRNA interactions, PCG-disease associations and tissue expression profiles. DimiG is trained on disease-PCG associations and a graph constructed from interaction networks of PCG-PCG and miRNA-PCG using semi-supervised GCN, which is further used to score associations between diseases and miRNAs. We evaluate DimiG on a benchmark set collected from verified disease-miRNA associations. Our results demonstrate that the new DimiG yields promising performance and outperforms the best published baseline method not trained on disease-miRNA associations by 11% and is also superior to two state-of-the-art supervised methods trained on disease-miRNA associations. Three case studies of prostate cancer, lung cancer and Inflammatory bowel disease further demonstrate the efficacy of DimiG, where the top miRNAs predicted by DimiG for them are supported by literature or databases.

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Chao-Hsin Chen ◽  
Chao-Yu Pan ◽  
Wen-chang Lin

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in human genome sequences. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5ʹ-tandem overlapping and 3ʹ-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions about human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Da Xu ◽  
Hanxiao Xu ◽  
Yusen Zhang ◽  
Mingyi Wang ◽  
Wei Chen ◽  
...  

Abstract Background Microbes are closely related to human health and diseases. Identification of disease-related microbes is of great significance for revealing the pathological mechanism of human diseases and understanding the interaction mechanisms between microbes and humans, which is also useful for the prevention, diagnosis and treatment of human diseases. Considering the known disease-related microbes are still insufficient, it is necessary to develop effective computational methods and reduce the time and cost of biological experiments. Methods In this work, we developed a novel computational method called MDAKRLS to discover potential microbe-disease associations (MDAs) based on the Kronecker regularized least squares. Specifically, we introduced the Hamming interaction profile similarity to measure the similarities of microbes and diseases besides Gaussian interaction profile kernel similarity. In addition, we introduced the Kronecker product to construct two kinds of Kronecker similarities between microbe-disease pairs. Then, we designed the Kronecker regularized least squares with different Kronecker similarities to obtain prediction scores, respectively, and calculated the final prediction scores by integrating the contributions of different similarities. Results The AUCs value of global leave-one-out cross-validation and 5-fold cross-validation achieved by MDAKRLS were 0.9327 and 0.9023 ± 0.0015, which were significantly higher than five state-of-the-art methods used for comparison. Comparison results demonstrate that MDAKRLS has faster computing speed under two kinds of frameworks. In addition, case studies of inflammatory bowel disease (IBD) and asthma further showed 19 (IBD), 19 (asthma) of the top 20 prediction disease-related microbes could be verified by previously published biological or medical literature. Conclusions All the evaluation results adequately demonstrated that MDAKRLS has an effective and reliable prediction performance. It may be a useful tool to seek disease-related new microbes and help biomedical researchers to carry out follow-up studies.


2017 ◽  
Vol 2017 ◽  
pp. 1-15 ◽  
Author(s):  
Fei Yuan ◽  
Yu-Hang Zhang ◽  
Xiang-Yin Kong ◽  
Yu-Dong Cai

Identification of disease genes is a hot topic in biomedicine and genomics. However, it is a challenging problem because of the complexity of diseases. Inflammatory bowel disease (IBD) is an idiopathic disease caused by a dysregulated immune response to host intestinal microflora. It has been proven to be associated with the development of intestinal malignancies. Although the specific pathological characteristics and genetic background of IBD have been partially revealed, it is still an overdetermined disease and the blueprint of all genetic variants still needs to be improved. In this study, a novel computational method was built to identify genes related to IBD. Samples from two subtypes of IBD (ulcerative colitis and Crohn’s disease) and normal samples were employed. By analyzing the gene expression profiles of these samples using minimum redundancy maximum relevance and incremental feature selection, 21 genes were obtained that could effectively distinguish samples from the two subtypes of IBD and the normal samples. Then, the shortest-path approach was used to search for an additional 20 genes in a large network constructed using protein-protein interactions based on the above-mentioned 21 genes. Analyses of the 41 genes obtained indicate that they are closely associated with this disease.


2021 ◽  
Author(s):  
Danh Bui-Thi ◽  
Emmanuel Rivière ◽  
Pieter Meysman ◽  
Kris Laukens

AbstractMotivationConvolutional neural networks have enabled unprecedented breakthroughs in a variety of computer vision tasks. They have also drawn much attention from other domains, including drug discovery and drug development. In this study, we develop a computational method based on convolutional neural networks to tackle a fundamental question in drug discovery and development, i.e. the prediction of compound-protein interactions based on compound structure and protein sequence. We propose a hierarchical graph convolutional network (HGCN) to encode small molecules. The HGCN aggregates a molecule embedding from substructure embeddings, which are synthesized from atom embeddings. As small molecules usually share substructures, computing a molecule embedding from those common substructures allows us to learn better generic models. We then combined the HGCN with a one-dimensional convolutional network to construct a complete model for predicting compound-protein interactions. Furthermore we apply an explanation technique, Grad-CAM, to visualize the contribution of each amino acid into the prediction.ResultsExperiments using different datasets show the improvement of our model compared to other GCN-based methods and a sequence based method, DeepDTA, in predicting compound-protein interactions. Each prediction made by the model is also explainable and can be used to identify critical residues mediating the interaction.Availability and implementationhttps://github.com/banhdzui/cpi_hgcn.git


2020 ◽  
Vol 36 (8) ◽  
pp. 2538-2546 ◽  
Author(s):  
Jin Li ◽  
Sai Zhang ◽  
Tao Liu ◽  
Chenxi Ning ◽  
Zhuoxuan Zhang ◽  
...  

Abstract Motivation Predicting the association between microRNAs (miRNAs) and diseases plays an import role in identifying human disease-related miRNAs. As identification of miRNA-disease associations via biological experiments is time-consuming and expensive, computational methods are currently used as effective complements to determine the potential associations between disease and miRNA. Results We present a novel method of neural inductive matrix completion with graph convolutional network (NIMCGCN) for predicting miRNA-disease association. NIMCGCN first uses graph convolutional networks to learn miRNA and disease latent feature representations from the miRNA and disease similarity networks. Then, learned features were input into a novel neural inductive matrix completion (NIMC) model to generate an association matrix completion. The parameters of NIMCGCN were learned based on the known miRNA-disease association data in a supervised end-to-end way. We compared the proposed method with other state-of-the-art methods. The area under the receiver operating characteristic curve results showed that our method is significantly superior to existing methods. Furthermore, 50, 47 and 48 of the top 50 predicted miRNAs for three high-risk human diseases, namely, colon cancer, lymphoma and kidney cancer, were verified using experimental literature. Finally, 100% prediction accuracy was achieved when breast cancer was used as a case study to evaluate the ability of NIMCGCN for predicting a new disease without any known related miRNAs. Availability and implementation https://github.com/ljatynu/NIMCGCN/ Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Zhouxin Yu ◽  
Feng Huang ◽  
Xiaohan Zhao ◽  
Wenjie Xiao ◽  
Wen Zhang

Abstract Background: Determining drug–disease associations is an integral part in the process of drug development. However, the identification of drug–disease associations through wet experiments is costly and inefficient. Hence, the development of efficient and high-accuracy computational methods for predicting drug–disease associations is of great significance. Results: In this paper, we propose a novel computational method named as layer attention graph convolutional network (LAGCN) for the drug–disease association prediction. Specifically, LAGCN first integrates the known drug–disease associations, drug–drug similarities and disease–disease similarities into a heterogeneous network, and applies the graph convolution operation to the network to learn the embeddings of drugs and diseases. Second, LAGCN combines the embeddings from multiple graph convolution layers using an attention mechanism. Third, the unobserved drug–disease associations are scored based on the integrated embeddings. Evaluated by 5-fold cross-validations, LAGCN achieves an area under the precision–recall curve of 0.3168 and an area under the receiver–operating characteristic curve of 0.8750, which are better than the results of existing state-of-the-art prediction methods and baseline methods. The case study shows that LAGCN can discover novel associations that are not curated in our dataset. Conclusion: LAGCN is a useful tool for predicting drug–disease associations. This study reveals that embeddings from different convolution layers can reflect the proximities of different orders, and combining the embeddings by the attention mechanism can improve the prediction performances.


Cells ◽  
2019 ◽  
Vol 8 (9) ◽  
pp. 977 ◽  
Author(s):  
Li ◽  
Liu ◽  
Hu ◽  
Que ◽  
Yao

Identifying the interactions between disease and microRNA (miRNA) can accelerate drugs development, individualized diagnosis, and treatment for various human diseases. However, experimental methods are time-consuming and costly. So computational approaches to predict latent miRNA–disease interactions are eliciting increased attention. But most previous studies have mainly focused on designing complicated similarity-based methods to predict latent interactions between miRNAs and diseases. In this study, we propose a novel computational model, termed heterogeneous graph convolutional network for miRNA–disease associations (HGCNMDA), which is based on known human protein–protein interaction (PPI) and integrates four biological networks: miRNA–disease, miRNA–gene, disease–gene, and PPI network. HGCNMDA achieved reliable performance using leave-one-out cross-validation (LOOCV). HGCNMDA is then compared to three state-of-the-art algorithms based on five-fold cross-validation. HGCNMDA achieves an AUC of 0.9626 and an average precision of 0.9660, respectively, which is ahead of other competitive algorithms. We further analyze the top-10 unknown interactions between miRNA and disease. In summary, HGCNMDA is a useful computational model for predicting miRNA–disease interactions.


2019 ◽  
Vol 19 (4) ◽  
pp. 232-241 ◽  
Author(s):  
Xuegong Chen ◽  
Wanwan Shi ◽  
Lei Deng

Background: Accumulating experimental studies have indicated that disease comorbidity causes additional pain to patients and leads to the failure of standard treatments compared to patients who have a single disease. Therefore, accurate prediction of potential comorbidity is essential to design more efficient treatment strategies. However, only a few disease comorbidities have been discovered in the clinic. Objective: In this work, we propose PCHS, an effective computational method for predicting disease comorbidity. Materials and Methods: We utilized the HeteSim measure to calculate the relatedness score for different disease pairs in the global heterogeneous network, which integrates six networks based on biological information, including disease-disease associations, drug-drug interactions, protein-protein interactions and associations among them. We built the prediction model using the Support Vector Machine (SVM) based on the HeteSim scores. Results and Conclusion: The results showed that PCHS performed significantly better than previous state-of-the-art approaches and achieved an AUC score of 0.90 in 10-fold cross-validation. Furthermore, some of our predictions have been verified in literatures, indicating the effectiveness of our method.


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize sentences being spoken by a talking face. In recent years, the lipreading method has achieved a high level of accuracy on large datasets and made breakthrough progress. However, lipreading is still far from being solved, and existing methods tend to have high error rates on the wild data and have the defects of disappearing training gradient and slow convergence. To overcome these problems, we proposed an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, Temporal Convolutional Network (TCN), and a CTC objective function as the decoder. More importantly, the proposed architecture incorporates TCN as a feature learner to decode feature. It can partly eliminate the defects of RNN (LSTM, GRU) gradient disappearance and insufficient performance, and this yields notable performance improvement as well as faster convergence. Experiments show that the training and convergence speed are 50% faster than the state-of-the-art method, and improved accuracy by 2.4% on the GRID dataset.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
David S. M. Lee ◽  
Joseph Park ◽  
Andrew Kromer ◽  
Aris Baras ◽  
Daniel J. Rader ◽  
...  

AbstractRibosome-profiling has uncovered pervasive translation in non-canonical open reading frames, however the biological significance of this phenomenon remains unclear. Using genetic variation from 71,702 human genomes, we assess patterns of selection in translated upstream open reading frames (uORFs) in 5’UTRs. We show that uORF variants introducing new stop codons, or strengthening existing stop codons, are under strong negative selection comparable to protein-coding missense variants. Using these variants, we map and validate gene-disease associations in two independent biobanks containing exome sequencing from 10,900 and 32,268 individuals, respectively, and elucidate their impact on protein expression in human cells. Our results suggest translation disrupting mechanisms relating uORF variation to reduced protein expression, and demonstrate that translation at uORFs is genetically constrained in 50% of human genes.


Sign in / Sign up

Export Citation Format

Share Document