scholarly journals Machine Learning-based state-of-the-art methods for the classification of RNA-Seq data

2017 ◽  
Author(s):  
Almas Jabeen ◽  
Nadeem Ahmad ◽  
Khalid Raza

AbstractRNA-Seq measures expression levels of several transcripts simultaneously. The identified reads can be gene, exon, or other region of interest. Various computational tools have been developed for studying pathogen or virus from RNA-Seq data by classifying them according to the attributes in several predefined classes, but still computational tools and approaches to analyze complex datasets are still lacking. The development of classification models is highly recommended for disease diagnosis and classification, disease monitoring at molecular level as well as researching for potential disease biomarkers. In this chapter, we are going to discuss various machine learning approaches for RNA-Seq data classification and their implementation. Advancements in bioinformatics, along with developments in machine learning based classification, would provide powerful toolboxes for classifying transcriptome information available through RNA-Seq data.

2017 ◽  
Author(s):  
Boris Guennewig ◽  
Zachary Davies ◽  
Mark Pinese ◽  
Antony A Cooper

AbstractMotivationMachine learning (ML) is a powerful tool to create supervised models that can distinguish between classes and facilitate biomarker selection in high-dimensional datasets, including RNA Sequencing (RNA-Seq). However, it is variable as to which is the best performing ML algorithm(s) for a specific dataset, and identifying the optimal match is time consuming. blkbox is a software package including a shiny frontend, that integrates nine ML algorithms to select the best performing classifier for a specific dataset. blkbox accepts a simple abundance matrix as input, includes extensive visualization, and also provides an easy to use feature selection step to enable convenient and rapid potential biomarker selection, all without requiring parameter optimization.ResultsFeature selection makes blkbox computationally inexpensive while multi-functionality, including nested cross-fold validation (NCV), ensures robust results. blkbox identified algorithms that outperformed prior published ML results. Applying NCV identifies features, which are utilized to gain high accuracy.AvailabilityThe software is available as a CRAN R package and as a developer version with extended functionality on github (https://github.com/gboris/blkbox)[email protected]


Author(s):  
Mamehgol Yousefi ◽  
Azmin Shakrine ◽  
Samsuzana bt. Abd Aziz ◽  
Syaril Azrad ◽  
Mohamed Mazmira ◽  
...  

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dennie te Molder ◽  
Wasin Poncheewin ◽  
Peter J. Schaap ◽  
Jasper J. Koehorst

Abstract Background The genus Xanthomonas has long been considered to consist predominantly of plant pathogens, but over the last decade there has been an increasing number of reports on non-pathogenic and endophytic members. As Xanthomonas species are prevalent pathogens on a wide variety of important crops around the world, there is a need to distinguish between these plant-associated phenotypes. To date a large number of Xanthomonas genomes have been sequenced, which enables the application of machine learning (ML) approaches on the genome content to predict this phenotype. Until now such approaches to the pathogenomics of Xanthomonas strains have been hampered by the fragmentation of information regarding pathogenicity of individual strains over many studies. Unification of this information into a single resource was therefore considered to be an essential step. Results Mining of 39 papers considering both plant-associated phenotypes, allowed for a phenotypic classification of 578 Xanthomonas strains. For 65 plant-pathogenic and 53 non-pathogenic strains the corresponding genomes were available and de novo annotated for the presence of Pfam protein domains used as features to train and compare three ML classification algorithms; CART, Lasso and Random Forest. Conclusion The literature resource in combination with recursive feature extraction used in the ML classification algorithms provided further insights into the virulence enabling factors, but also highlighted domains linked to traits not present in pathogenic strains.


Sign in / Sign up

Export Citation Format

Share Document