Characterization of Cancer Types by Applying Machine Learning Methods on Blood RNA-Sequencing Data

Author(s):  
Cem Bugra Alkan ◽  
Zerrin Isik
2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Qidong Cai ◽  
Boxue He ◽  
Pengfei Zhang ◽  
Zhenyu Zhao ◽  
Xiong Peng ◽  
...  

Abstract Background Alternative splicing (AS) plays critical roles in generating protein diversity and complexity. Dysregulation of AS underlies the initiation and progression of tumors. Machine learning approaches have emerged as efficient tools to identify promising biomarkers. It is meaningful to explore pivotal AS events (ASEs) to deepen understanding and improve prognostic assessments of lung adenocarcinoma (LUAD) via machine learning algorithms. Method RNA sequencing data and AS data were extracted from The Cancer Genome Atlas (TCGA) database and TCGA SpliceSeq database. Using several machine learning methods, we identified 24 pairs of LUAD-related ASEs implicated in splicing switches and a random forest-based classifiers for identifying lymph node metastasis (LNM) consisting of 12 ASEs. Furthermore, we identified key prognosis-related ASEs and established a 16-ASE-based prognostic model to predict overall survival for LUAD patients using Cox regression model, random survival forest analysis, and forward selection model. Bioinformatics analyses were also applied to identify underlying mechanisms and associated upstream splicing factors (SFs). Results Each pair of ASEs was spliced from the same parent gene, and exhibited perfect inverse intrapair correlation (correlation coefficient = − 1). The 12-ASE-based classifier showed robust ability to evaluate LNM status of LUAD patients with the area under the receiver operating characteristic (ROC) curve (AUC) more than 0.7 in fivefold cross-validation. The prognostic model performed well at 1, 3, 5, and 10 years in both the training cohort and internal test cohort. Univariate and multivariate Cox regression indicated the prognostic model could be used as an independent prognostic factor for patients with LUAD. Further analysis revealed correlations between the prognostic model and American Joint Committee on Cancer stage, T stage, N stage, and living status. The splicing network constructed of survival-related SFs and ASEs depicts regulatory relationships between them. Conclusion In summary, our study provides insight into LUAD researches and managements based on these AS biomarkers.


2018 ◽  
Author(s):  
Parth Patel ◽  
Sandra Mathioni ◽  
Atul Kakrana ◽  
Hagit Shatkay ◽  
Blake C. Meyers

Summary and keywordsLittle is known about the characteristics and function of reproductive phased, secondary, small interfering RNAs (phasiRNAs) in the Poaceae, despite the availability of significant genomic resources, experimental data, and a growing number of computational tools. We utilized machine-learning methods to identify sequence-based and structural features that distinguish phasiRNAs in rice and maize from other small RNAs (sRNAs).We developed Random Forest classifiers that can distinguish reproductive phasiRNAs from other sRNAs in complex sets of sequencing data, utilizing sequence-based (k-mers) and features describing position-specific sequence biases.The classification performance attained is >80% in accuracy, sensitivity, specificity, and positive predicted value. Feature selection identified important features in both ends of phasiRNAs. We demonstrated that phasiRNAs have strand specificity and position-specific nucleotide biases potentially influencing AGO sorting; we also predicted targets to infer functions of phasiRNAs, and computationally-assessed their sequence characteristics relative to other sRNAs.Our results demonstrate that machine-learning methods effectively identify phasiRNAs despite the lack of characteristic features typically present in precursor loci of other small RNAs, such as sequence conservation or structural motifs. The 5’-end features we identified provide insights into AGO-phasiRNA interactions; we describe a hypothetical model of competition for AGO loading between phasiRNAs of different nucleotide compositions.


2019 ◽  
Vol XXII (1) ◽  
pp. 196-199
Author(s):  
Rogobete M.

For the most machine learning methods, for cyclo-stationary or even stochastic signals, the performance depends critically on hyperparameters. Moreover, the tuning of more hyperparameters based on the feedback of the performance model will leak an increasingly significant amount of information about the validation set into the model. Therefore, we propose in this research two classes of hyperparameters, a general class that makes the characterization of general signal curve and the second, a specific class that define special parameters connected to the phenomena type (e.g. sensor type).


10.29007/lmmg ◽  
2018 ◽  
Author(s):  
Cezary Kaliszyk ◽  
Lionel Mamane ◽  
Josef Urban

We report the results of the first experiments with learning proof dependencies from the formalizations done with the Coq system. We explain the process of obtaining the dependencies from the Coq proofs, the characterization of formulas that is used for the learning, and the evaluation method. Various machine learning methods are compared on a dataset of 5021 toplevel Coq proofs coming from the CoRN repository. The best resulting method covers on average 75% of the needed proof dependencies among the first 100 predictions, which is a comparable performance of such initial experiments on other large-theory corpora.


2020 ◽  
Vol 25 (40) ◽  
pp. 4264-4273 ◽  
Author(s):  
Dan Zhang ◽  
Zheng-Xing Guan ◽  
Zi-Mei Zhang ◽  
Shi-Hao Li ◽  
Fu-Ying Dao ◽  
...  

Bioluminescent Proteins (BLPs) are widely distributed in many living organisms that act as a key role of light emission in bioluminescence. Bioluminescence serves various functions in finding food and protecting the organisms from predators. With the routine biotechnological application of bioluminescence, it is recognized to be essential for many medical, commercial and other general technological advances. Therefore, the prediction and characterization of BLPs are significant and can help to explore more secrets about bioluminescence and promote the development of application of bioluminescence. Since the experimental methods are money and time-consuming for BLPs identification, bioinformatics tools have played important role in fast and accurate prediction of BLPs by combining their sequences information with machine learning methods. In this review, we summarized and compared the application of machine learning methods in the prediction of BLPs from different aspects. We wish that this review will provide insights and inspirations for researches on BLPs.


Risk Analysis ◽  
2018 ◽  
Author(s):  
Patrick Murigu Kamau Njage ◽  
Clementine Henri ◽  
Pimlapas Leekitcharoenphon ◽  
Michel‐Yves Mistou ◽  
Rene S. Hendriksen ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document