Recombination Hotspot/Coldspot Identification Combining Three Different Pseudocomponents via an Ensemble Learning Approach

2016 ◽  
Vol 2016 ◽  
pp. 1-7 ◽  
Author(s):  
Bingquan Liu ◽  
Yumeng Liu ◽  
Dong Huang

Recombination presents a nonuniform distribution across the genome. Genomic regions that present relatively higher frequencies of recombination are called hotspots while those with relatively lower frequencies of recombination are recombination coldspots. Therefore, the identification of hotspots/coldspots could provide useful information for the study of the mechanism of recombination. In this study, a new computational predictor called SVM-EL was proposed to identify hotspots/coldspots across the yeast genome. It combined Support Vector Machines (SVMs) and Ensemble Learning (EL) based on three features including basic kmer (Kmer), dinucleotide-based auto-cross covariance (DACC), and pseudo dinucleotide composition (PseDNC). These features are able to incorporate the nucleic acid composition and their order information into the predictor. The proposed SVM-EL achieves an accuracy of 82.89% on a widely used benchmark dataset, which outperforms some related methods.
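As a rough illustration of the Kmer feature and of combining several classifiers, the sketch below computes normalized k-mer frequencies for a DNA sequence and merges binary predictions by majority vote. The function names, the handling of ambiguous bases, and the majority-vote rule are assumptions made for illustration: the abstract does not state how SVM-EL combines its three SVMs, and DACC and PseDNC are not shown.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Assumption: windows containing characters outside `alphabet`
    (for example N) are skipped rather than counted.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def majority_vote(predictions):
    """Combine binary labels (1 = hotspot, 0 = coldspot) from several classifiers.

    Assumption: a strict majority of 1s is required to output 1.
    """
    return 1 if sum(predictions) * 2 > len(predictions) else 0
```

With k = 2 the feature vector has 4^2 = 16 entries, one per dinucleotide. In a full pipeline each feature type would feed its own classifier, and the per-classifier labels would then be passed to a combiner such as `majority_vote`.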

2019 ◽  
Vol 16 (4) ◽  
pp. 347-355
Author(s):  
Zhao-Chun Xu ◽  
Xuan Xiao ◽  
Wang-Ren Qiu ◽  
Peng Wang ◽  
Xin-Zhu Fang

As an important post-transcriptional modification, adenosine-to-inosine RNA editing occurs in both coding and noncoding RNA transcripts, converting adenosines to inosines. This modification can therefore diversify the transcriptome. Accurately identifying adenosine-to-inosine editing sites is important for further understanding their biological functions. Currently, these editing sites are determined by experimental methods, which can be costly and time consuming, and only a few computational prediction models exist in this field. This study therefore sets out to develop a further computational method to address these problems. Given an uncharacterized RNA sequence that contains many adenosine residues, can we identify which of them can be converted to inosine and which cannot? To deal with this problem, a novel predictor called iAI-DSAE is proposed in the current study. There are two key issues to address: one is ‘what feature extraction methods should be adopted to formulate the given sample sequence?’ The other is ‘what classification algorithm should be used to construct the classification model?’ For the former, a 540-dimensional feature vector is extracted to formulate the sample sequence using dinucleotide-based auto-cross covariance, pseudo dinucleotide composition, and nucleotide density methods. For the latter, we use a currently popular method, the deep sparse autoencoder, to construct the classification model. ACC and MCC are generally considered two of the most important performance indicators of a predictor. In this study, compared with those of the predictor PAI, they are higher by 2.46% and 4.14%, respectively. The two other indicators, Sn and Sp, also rise to a certain degree. This indicates that our predictor can serve as an important complementary tool for identifying adenosine-to-inosine RNA editing sites.
For the convenience of most experimental scientists, an easy-to-use web server for identifying adenosine-to-inosine editing sites has been established at http://www.jci-bioinfo.cn/iAI-DSAE, where users can easily obtain their desired results without working through the complicated mathematical equations involved. Identifying adenosine-to-inosine editing sites in RNA sequences is important for the intensive study of RNA function and the development of new medicines. In the current study, a novel predictor called iAI-DSAE was proposed using three feature extraction methods: dinucleotide-based auto-cross covariance, pseudo dinucleotide composition, and nucleotide density. The jackknife test results of the iAI-DSAE predictor, which is based on a deep sparse autoencoder model, show that our predictor is more stable and reliable. It has not escaped our notice that the methods proposed in the current paper can be used to solve many other problems in genome analysis.
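The four performance indicators named in the abstract above (ACC, MCC, Sn, Sp) have standard definitions in terms of a binary confusion matrix. The sketch below computes them from counts of true/false positives and negatives; the function name and the convention of returning 0.0 when a denominator is zero are assumptions, and the counts in the usage note are made up rather than taken from the paper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Compute ACC, Sn, Sp and MCC from confusion-matrix counts.

    Assumption: any ratio with a zero denominator is reported as 0.0.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall on positives)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity (recall on negatives)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}
```

For example, `binary_metrics(40, 40, 10, 10)` describes a balanced test set of 100 samples with 10 errors in each class. Unlike ACC, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy on imbalanced data.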


2021 ◽  
Vol 37 (3) ◽  
pp. 505-511
Author(s):  
Xueyuan Bai ◽  
Yingqiang Song ◽  
Ruiyang Yu ◽  
Jingling Xiong ◽  
Yufeng Peng ◽  
...  

Highlights
Monitored the canopy chlorophyll content of apple trees using hyperspectral reflectance information.
Constructed a support vector machine combination regression model (C-SVR) based on five-fold cross validation and the support vector machine regression approach.
Compared the estimation accuracy of ensemble learning models (C-SVR, RF), machine learning models (SVR, ANN), and PLSR models for apple canopy chlorophyll content.

Abstract. Rapid and effective monitoring of the canopy chlorophyll content (CCC) of apple trees is of great significance for crop stress monitoring in precision agriculture. This study used hyperspectral vegetation indices (VIs) to estimate the CCC of apple trees with an ensemble learning approach. Vegetation indices were constructed from every pair of wavelengths between 400 and 1100 nm, and their correlation coefficients with apple CCC were calculated. We constructed a partial least squares regression (PLSR) model, an artificial neural network (ANN) regression model, a support vector machine regression (SVR) model, a random forest (RF) regression model, and a support vector machine combination regression (C-SVR) model based on combinations of VIs to improve the estimation accuracy of apple CCC. The results showed that the correlation coefficients between NDVI (949,695), OSAVI (828,705), RDVI (741,725), RVI (716,707), DVI (572,532), and apple CCC were all above 0.76. The CCC estimation models built with the RF and C-SVR approaches from NDVI (949,695), OSAVI (828,705), RDVI (741,725), RVI (716,707), and DVI (572,532) achieved the best estimation results: the R²_V, RMSE_V, and RPD_V values were 0.76, 0.131 mg·g⁻¹, and 2.04 for RF and 0.78, 0.127 mg·g⁻¹, and 2.12 for C-SVR, respectively. Compared with the PLSR, ANN, and SVR models, the C-SVR model increased R²_V by 4%, 1.2%, and 3.8% and RPD_V by 5.0%, 28.4%, and 7.1%, respectively.
The results show that the C-SVR approach can estimate apple CCC quantitatively with high accuracy. The ensemble learning approach is an effective method for monitoring the nutrient status of fruit trees based on hyperspectral techniques. Keywords: Apple tree canopy, Chlorophyll content, Crop stress monitoring, Ensemble learning, Hyperspectral, Vegetation index.
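To make the two-band vegetation indices in the abstract above more concrete, the sketch below implements commonly used forms of NDVI, RVI, DVI, and RDVI for a pair of band reflectances, together with a Pearson correlation helper for screening an index against measured CCC. The function names are hypothetical, and the assumption that the first listed wavelength is the first argument (for example 949 nm in NDVI (949,695)) is not confirmed by the abstract. OSAVI is omitted because its soil-adjustment constant is written differently across sources.

```python
import math


def ndvi(r1, r2):
    """Normalized difference vegetation index of two band reflectances."""
    return (r1 - r2) / (r1 + r2)


def rvi(r1, r2):
    """Ratio vegetation index."""
    return r1 / r2


def dvi(r1, r2):
    """Difference vegetation index."""
    return r1 - r2


def rdvi(r1, r2):
    """Renormalized difference vegetation index."""
    return (r1 - r2) / math.sqrt(r1 + r2)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A band-pair search in the spirit of the abstract would evaluate one of these index functions for every wavelength pair across all samples and keep the pairs whose `pearson_r` against CCC is highest.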


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Gang Xie ◽  
Yingxue Zhao ◽  
Mao Jiang ◽  
Ning Zhang

This paper proposes a novel ensemble learning approach based on logistic regression (LR) and artificial intelligence tools, namely support vector machines (SVM) and back-propagation neural networks (BPNN), for corporate financial distress forecasting in fashion and textiles supply chains. First, the related concepts of LR, SVM, and BPNN are introduced. Then, the forecasting results from LR are fed into the SVM and BPNN techniques, which can recognize the forecasting errors in the LR fit. Moreover, an empirical analysis of Chinese listed companies in the fashion and textile sector is carried out to compare the methods, and some related issues are discussed. The results suggest that the proposed ensemble learning approach achieves higher forecasting performance than the individual models.
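The two-stage idea described above, in which LR forecasts become an input to a second learner, can be sketched in a few lines. The code below is only a minimal illustration under stated assumptions: the function names are hypothetical, the training data are toy values, and the second stage is again a small logistic model used purely as a stand-in for the SVM and BPNN of the paper, whose exact coupling to LR the abstract does not specify.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Probability of the positive class; weights[0] is the bias term."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(z)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by plain stochastic gradient ascent."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = target - predict_proba(weights, x)
            weights[0] += lr * err
            for j, xj in enumerate(x):
                weights[j + 1] += lr * err * xj
    return weights


def augment_with_stage_one(X, stage_one_weights):
    """Append the stage-one LR forecast to each feature row."""
    return [list(x) + [predict_proba(stage_one_weights, x)] for x in X]


# Toy data (assumed): one financial ratio, 1 = distressed, 0 = healthy.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [0, 0, 1, 1]

stage_one = train_logistic(X_toy, y_toy)
X_stacked = augment_with_stage_one(X_toy, stage_one)
stage_two = train_logistic(X_stacked, y_toy)  # stand-in for SVM or BPNN
```

In practice the stage-one forecasts passed to the second learner would be produced on held-out folds, so that the second stage does not simply learn from overfitted first-stage outputs.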

