DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

Oncotarget ◽  
2017 ◽  
Vol 9 (2) ◽  
pp. 1944-1956 ◽  
Author(s):  
Balachandran Manavalan ◽  
Tae Hwan Shin ◽  
Gwang Lee

Abstract: DNase I hypersensitive sites (DHSs) are genomic regions that provide important information regarding the presence of transcriptional regulatory elements and the state of chromatin. Therefore, identifying DHSs in uncharacterized DNA sequences is crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven to be expensive for genome-wide application. Therefore, it is necessary to develop computational methods for DHS prediction. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. The optimal combination of features was identified from a large set that included nucleotide composition and di- and trinucleotide physicochemical properties, using a random forest algorithm. DHSpred achieved a Matthews correlation coefficient and accuracy of 0.660 and 0.871, respectively, which were 3% higher than those of control SVM predictors trained with non-optimized features, indicating the efficiency of the feature selection method. Furthermore, the performance of DHSpred was superior to that of state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at: http://www.thegleelab.org/DHSpred.html.
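
The two metrics reported in the abstract above, the Matthews correlation coefficient (MCC) and accuracy, can both be computed from a binary confusion matrix. The following is a minimal sketch using the standard definitions, not code from the DHSpred authors. The function names and the toy labels (1 = DHS, 0 = non-DHS) are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = DHS, 0 = non-DHS)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels, invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
```

MCC is often reported alongside accuracy because it stays informative when the positive and negative classes are imbalanced, which is common in DHS datasets.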


2014 ◽  
Vol 2014 ◽  
pp. 1-4 ◽  
Author(s):  
Pengmian Feng ◽  
Ning Jiang ◽  
Nan Liu

DNase I hypersensitive sites (DHS) are associated with a wide variety of regulatory DNA elements. Knowledge about the locations of DHS is helpful for deciphering the function of noncoding genomic regions. With the rapid accumulation of genome sequences in the postgenomic age, it is highly desirable to develop cost-effective computational methods to identify DHS. In the present work, a support vector machine based model was proposed to identify DHS by using the pseudo dinucleotide composition. In the jackknife test, the proposed model obtained an accuracy of 83%, which is competitive with that of the existing method. This result suggests that the proposed model may become a useful tool for DHS identification.
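
The jackknife (leave-one-out) test mentioned above can be sketched as follows. This is a simplified illustration, not the authors' implementation. It uses plain dinucleotide composition (16 frequencies), whereas the pseudo dinucleotide composition in the abstract also appends correlation terms derived from physicochemical properties, which are not reproduced here. A nearest-centroid classifier stands in for the SVM so that the sketch needs only the standard library. All function names are hypothetical.

```python
from itertools import product

# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_composition(seq):
    """Return 16 normalized frequencies of overlapping dinucleotides."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other symbols
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


def fit_centroids(xs, ys):
    """Stand-in classifier (assumption): per-class mean feature vector."""
    centroids = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Predict the label whose centroid is nearest in squared distance."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def jackknife_accuracy(samples, labels, fit, predict):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)
```

To use it, convert each sequence with `dinucleotide_composition` and pass the resulting vectors, their labels, and the `fit`/`predict` pair to `jackknife_accuracy`. Swapping in a real SVM only requires a different `fit`/`predict` pair.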


2021 ◽  
Vol 209 ◽  
pp. 104223
Author(s):  
Wei Su ◽  
Fang Wang ◽  
Jiu-Xin Tan ◽  
Fu-Ying Dao ◽  
Hui Yang ◽  
...  
