Heterogeneous data integration by tree-augmented naïve Bayes for protein-protein interactions prediction

PROTEOMICS ◽  
2012 ◽  
Vol 13 (2) ◽  
pp. 261-268 ◽  
Author(s):  
Xiaotong Lin ◽  
Xue-wen Chen
2018 ◽  
Author(s):  
Somnath Tagore ◽  
Alessandro Gorohovski ◽  
Lars Juhl Jensen ◽  
Milana Frenkel-Morgenstern

Abstract: Today’s growth in scientific literature requires efficient data mining methods to improve the extraction of useful information from texts. In this manuscript, we used a data and text mining method to identify fusions and their protein-protein interactions from published biomedical text. The extracted fusion proteins and their protein-protein interactions are used as a training set for a Naïve Bayes classifier, which is then used for the final identification of a testing dataset consisting of 1817 fusions. Our method comprises a literature corpus; text and annotation mappers; keywords, rule bases, negative tokens, and a pattern extractor; a synonym tagger, normalization, and a regular expression mapper; and a Naïve Bayes classifier. We classified 1817 unique fusion proteins and their corresponding 2908 protein-protein interactions for 18 cancer types. The method can therefore be used to screen the literature for mentions of unique fusion cases that can be used in downstream analysis. It is available at http://protfus.md.biu.ac.il/.
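The classification step described in this abstract can be illustrated with a minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing. This is not the authors' implementation: the tokenizer, the class labels, and the training sentences (built from placeholder names such as GENEA-GENEB and PROTX) are assumptions made up purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lowercase words, keeping hyphenated fusion names whole.
TOKEN = re.compile(r"[a-z0-9\-]+")


def tokenize(text):
    return TOKEN.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy sentences with placeholder gene and protein names.
train_texts = [
    "GENEA-GENEB fusion protein interacts with PROTX",
    "GENEC-GENED fusion binds to PROTY",
    "GENEA-GENEB fusion does not interact with PROTZ",
    "no binding observed between GENEC-GENED and PROTW",
]
train_labels = ["interaction", "interaction", "no_interaction", "no_interaction"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("GENEE-GENEF fusion binds to PROTX"))
print(model.predict("GENEE-GENEF does not interact with PROTQ"))
```

In a fuller pipeline, the keyword, negative-token, and regular-expression stages named in the abstract would presumably filter and normalize sentences before they reach a classifier of this kind.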


2010 ◽  
Vol 11 (3) ◽  
pp. 292-298
Author(s):  
Hongjun SU ◽  
Yehua SHENG ◽  
Yongning WEN ◽  
Min CHEN

2014 ◽  
Vol 912-914 ◽  
pp. 1201-1204
Author(s):  
Gang Huang ◽  
Xiu Ying Wu ◽  
Man Yuan

This paper presents an ontology-based distributed heterogeneous data integration framework (ODHDIF). The framework resolves the problem of semantic interoperability between heterogeneous data sources at the semantic level. Metadata specifies the distributed, heterogeneous data and describes the semantic information of each data source. With an ontology as the common semantic model, semantic matches are established through ontology mapping between the heterogeneous data sources, and the semantic differences between them are shielded, so that the semantic heterogeneity problem can be effectively solved. The framework provides an effective technical means for sharing an enterprise's internal information in a timely and accurate manner.
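The idea of mapping each source's fields onto a shared ontology can be sketched as follows. This is only a toy illustration, not the ODHDIF framework itself: the source names (`crm`, `erp`), field names, and ontology terms such as `Customer.name` are all hypothetical.

```python
# Hypothetical per-source mappings from local field names to shared ontology terms.
ONTOLOGY_MAPPING = {
    "crm": {"cust_name": "Customer.name", "tel": "Customer.phone"},
    "erp": {"client": "Customer.name", "phone_no": "Customer.phone"},
}


def to_ontology(source, record):
    """Rewrite one source record in terms of the common ontology.

    Fields with no mapping are dropped, which hides source-specific details.
    """
    mapping = ONTOLOGY_MAPPING[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def integrate(records):
    """Convert (source, record) pairs into one list of ontology-level records."""
    return [to_ontology(source, record) for source, record in records]


unified = integrate([
    ("crm", {"cust_name": "Acme", "tel": "555-0100"}),
    ("erp", {"client": "Globex", "phone_no": "555-0199", "internal_id": 7}),
])
print(unified)
```

After integration, both records expose the same keys, so a consumer can query by ontology term without knowing which source a record came from.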

