enhancer identification

2021 ◽  
Author(s):  
Li Ye ◽  
Chunquan Li ◽  
Jiquan Ma

The identification of enhancers has always been an important task in bioinformatics owing to their major role in regulating gene expression. For this reason, many computational algorithms devoted to enhancer identification have been put forward over the years, ranging from statistics and machine learning to the increasingly popular deep learning. To boost performance, more features tend to be extracted from single DNA sequences and integrated into an ensemble classifier. Nevertheless, the sequence-derived features used in previous studies can hardly capture the 3D structure of DNA sequences, which is regarded as an important factor affecting the binding preferences of transcription factors to regulatory elements such as enhancers. Given that, we here propose DENIES, a deep learning-based two-layer predictor for improving the identification of enhancers and their strength. Besides two common sequence-derived features (i.e. one-hot and k-mer), it introduces DNA shape to describe the 3D structures of DNA sequences. Performance comparisons with a series of state-of-the-art methods on the same datasets demonstrate the effectiveness and robustness of our method. The code implementation of our predictor is freely available at https://github.com/hlju-liye/DENIES.
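The two sequence-derived features named in this abstract, one-hot encoding and k-mer composition, can be illustrated with a minimal sketch. This is not the DENIES implementation: the function names `one_hot` and `kmer_frequencies`, the treatment of unknown bases as all-zero rows, and the normalization by window count are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def one_hot(seq):
    """Return one 4-element row per nucleotide; bases outside ACGT become all zeros (assumption)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_frequencies(seq, k):
    """Return the relative frequency of every overlapping k-mer, in lexicographic order."""
    seq = seq.upper()
    windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero for sequences shorter than k
    return [counts["".join(p)] / total for p in product(ALPHABET, repeat=k)]
```

For a sequence of length n, `one_hot` yields an n x 4 matrix suited to convolutional input, while `kmer_frequencies` yields a fixed-length vector of 4^k values regardless of n.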


2021 ◽  
Author(s):  
QH Nguyen ◽  
TH Nguyen-Vo ◽  
NQK Le ◽  
TTT Do ◽  
S Rahardja ◽  
...  

© 2019 The Author(s). Background: Enhancers are non-coding DNA fragments which are crucial in gene regulation (e.g. transcription and translation). Because enhancers show high locational variation and scatter freely in 98% of non-encoding genomes, identifying them is more complicated than identifying other genetic factors. To address this biological issue, several in silico studies have sought to identify and classify enhancer sequences among a myriad of DNA sequences using computational advances. Although recent studies have improved performance, shortfalls in these learning models remain. To overcome the limitations of existing learning models, we introduce iEnhancer-ECNN, an efficient prediction framework using one-hot encoding and k-mers for data transformation and ensembles of convolutional neural networks for model construction, to identify enhancers and classify their strength. The benchmark dataset from Liu et al.'s study was used to develop and evaluate the ensemble models. A comparative analysis between iEnhancer-ECNN and existing state-of-the-art methods was done to fairly assess model performance. Results: Our experimental results demonstrate that iEnhancer-ECNN performs better than other state-of-the-art methods using the same dataset. The accuracies of the ensemble model for enhancer identification (layer 1) and enhancer classification (layer 2) are 0.769 and 0.678, respectively. Compared to other related studies, improvements in the Area Under the Receiver Operating Characteristic Curve (AUC), sensitivity, and Matthews's correlation coefficient (MCC) of our models are remarkable, especially for the model of layer 2 with about 11.0%, 46.5%, and 65.0%, respectively. Conclusions: iEnhancer-ECNN outperforms previously proposed methods with significant improvement in most of the evaluation metrics. The strong gains in MCC for both layers are highly meaningful in assuring the stability of our models.
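The two-layer scheme described here (layer 1 separates enhancers from non-enhancers, layer 2 classifies enhancer strength) can be sketched as a cascade over an ensemble. The abstract does not say how the ensemble members are combined, so averaging predicted probabilities (soft voting), the 0.5 threshold, and the names `ensemble_probability` and `two_layer_predict` are all assumptions for illustration. Each model is represented as a plain callable returning a probability.

```python
from statistics import mean


def ensemble_probability(models, sequence):
    """Average the probabilities returned by each model (soft voting, an assumption)."""
    return mean(model(sequence) for model in models)


def two_layer_predict(layer1_models, layer2_models, sequence, threshold=0.5):
    """Layer 1: enhancer vs. non-enhancer. Layer 2: strong vs. weak, only for predicted enhancers."""
    if ensemble_probability(layer1_models, sequence) < threshold:
        return "non-enhancer"
    if ensemble_probability(layer2_models, sequence) >= threshold:
        return "strong enhancer"
    return "weak enhancer"
```

In practice each callable would wrap a trained network; the cascade means layer 2 is never consulted for sequences that layer 1 rejects.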


2021 ◽  
Vol 18 (6) ◽  
pp. 8797-8814
Author(s):  
Yunyun Liang ◽  
Shengli Zhang ◽  
Huijuan Qiao ◽  
Yinan Cheng ◽  
...  

An enhancer is a non-coding DNA fragment that can be bound by proteins to activate transcription of a gene, and hence plays an important role in regulating gene expression. Enhancer identification is very challenging and more complicated than that of other genetic factors because enhancers vary in position and are freely scattered. In addition, genetic variation in enhancers has been shown to be related to human diseases. Therefore, identifying enhancers and their strength has important biological meaning. In this paper, a novel model named iEnhancer-MFGBDT is developed to identify enhancers and their strength by fusing multiple features with a gradient boosting decision tree (GBDT). The features include k-mer and reverse complement k-mer nucleotide composition based on the DNA sequence, and second-order moving average, normalized Moreau-Broto auto-cross correlation and Moran auto-cross correlation based on a dinucleotide physical structural property matrix. We then use GBDT to select features and perform classification successively. The accuracies reach 78.67% and 66.04% for identifying enhancers and their strength on the benchmark dataset, respectively. Compared with other models, the results show that our model is a useful and effective intelligent tool for identifying enhancers and their strength; the datasets and source code are available at https://github.com/shengli0201/iEnhancer-MFGBDT1.
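Reverse complement k-mer composition, one of the features listed above, treats a k-mer and its reverse complement as the same feature, since both describe the same double-stranded site. The sketch below shows one common way to compute it; it is not taken from the iEnhancer-MFGBDT source, and the choice of the lexicographically smaller string as the canonical form, as well as the function names, are assumptions.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Complement each base, then reverse the string."""
    return kmer.translate(COMPLEMENT)[::-1]


def rc_kmer_composition(seq, k):
    """Relative frequency of each k-mer after merging it with its reverse complement."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Canonical form: the lexicographically smaller of the pair (an assumption).
        counts[min(kmer, reverse_complement(kmer))] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in sorted(counts.items())} if total else {}
```

Merging the pairs shrinks the feature vector; for k = 2, the 16 dinucleotides collapse to 10 canonical features.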


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Qingwen Li ◽  
Lei Xu ◽  
Qingyuan Li ◽  
Lichao Zhang

Enhancers are noncoding fragments in DNA sequences, which play an important role in gene transcription and translation. However, due to their high free scattering and positional variability, the identification and classification of enhancers are more complex than those of coding genes. To solve this problem, many computational studies have been carried out in this field, but these prediction models still have some deficiencies. In this paper, we combine various feature extraction strategies, dimension reduction technology, and a machine learning model with a recurrent neural network model to predict enhancer identification and classification with accuracies of 76.7% and 84.9%, respectively. The model proposed in this paper is superior to previous methods in performance index or feature dimension, which provides inspiration for the future prediction of enhancers by computational methods.


BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Quang H. Nguyen ◽  
Thanh-Hoang Nguyen-Vo ◽  
Nguyen Quoc Khanh Le ◽  
Trang T.T. Do ◽  
Susanto Rahardja ◽  
...  

Background: Enhancers are non-coding DNA fragments which are crucial in gene regulation (e.g. transcription and translation). Because enhancers show high locational variation and scatter freely in 98% of non-encoding genomes, identifying them is more complicated than identifying other genetic factors. To address this biological issue, several in silico studies have sought to identify and classify enhancer sequences among a myriad of DNA sequences using computational advances. Although recent studies have improved performance, shortfalls in these learning models remain. To overcome the limitations of existing learning models, we introduce iEnhancer-ECNN, an efficient prediction framework using one-hot encoding and k-mers for data transformation and ensembles of convolutional neural networks for model construction, to identify enhancers and classify their strength. The benchmark dataset from Liu et al.’s study was used to develop and evaluate the ensemble models. A comparative analysis between iEnhancer-ECNN and existing state-of-the-art methods was done to fairly assess model performance. Results: Our experimental results demonstrate that iEnhancer-ECNN performs better than other state-of-the-art methods using the same dataset. The accuracies of the ensemble model for enhancer identification (layer 1) and enhancer classification (layer 2) are 0.769 and 0.678, respectively. Compared to other related studies, improvements in the Area Under the Receiver Operating Characteristic Curve (AUC), sensitivity, and Matthews’s correlation coefficient (MCC) of our models are remarkable, especially for the model of layer 2 with about 11.0%, 46.5%, and 65.0%, respectively. Conclusions: iEnhancer-ECNN outperforms previously proposed methods with significant improvement in most of the evaluation metrics. The strong gains in MCC for both layers are highly meaningful in assuring the stability of our models.
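For readers less familiar with the metrics reported in this abstract, accuracy, sensitivity and MCC can all be computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the function names and the convention of returning 0.0 when a denominator is zero are assumptions, and the counts in the usage note are invented for illustration rather than taken from the paper.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def sensitivity(tp, fn):
    """Fraction of true positives that are recovered (recall)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, ranging from -1 to 1."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

For example, `mcc(25, 25, 25, 25)` is 0, the value expected from random guessing on a balanced set, even though `accuracy(25, 25, 25, 25)` is 0.5. Unlike accuracy, MCC uses all four counts and penalizes a classifier that does well on only one class.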


2018 ◽  
Author(s):  
Shalu Jhanwar ◽  
Stephan Ossowski ◽  
Jose Davila-Velderrain

Recently, enhancers have emerged as key players regulating crucial mechanisms such as cell fate determination and the establishment of spatiotemporal patterns of gene expression during development. Due to their functional and structural complexity, accurate in silico identification of active enhancers under specific conditions remains challenging. We present a novel machine learning-based method that derives epigenomic patterns exclusively from experimentally characterized active enhancers contrasted with a weighted set of non-enhancer genomic regions. We demonstrate better predictive performance than previous methods, as well as wide generalizability, by identifying and annotating active enhancers genome-wide across different tissues and cell types in human and mouse.


2018 ◽  
Author(s):  
Anne Sonnenschein ◽  
Ian Dworkin ◽  
David N. Arnosti

Predicting regulatory function of non-coding DNA using genomic information remains a major goal in genomics, and an important step in interpreting the cis-regulatory code. Regulatory capacity can be partially inferred from transcription factor occupancy, histone modifications, motif enrichment, and evolutionary conservation. However, combinations of these features in well-studied systems such as Drosophila have limited predictive accuracy. Here we examine the current limits of computational enhancer prediction by applying machine-learning methods to an extensive set of genomic features, validating predictions with the Fly Enhancer Resource, which characterized the transcriptional activity of approximately fifteen percent of the genome. Supervised machine-learning models trained on a range of genomic features identify active elements with a high degree of accuracy, but are less successful at distinguishing tissue-specific expression patterns. Consistent with previous observations of their widespread genomic interactions, many transcription factors were associated with enhancers not known to be direct functional targets. Interestingly, no single factor was necessary for enhancer identification, although binding by the 'pioneer' transcription factor Zelda was the most predictive feature for enhancer activity. Using an increasing number of predictive features improved classification with diminishing returns. Thus, additional single-timepoint ChIP data may have only marginal utility for discerning true regulatory regions. On the other hand, spatially and temporally differentiated genomic features may provide more power for this type of computational enhancer identification. Inclusion of new types of information distinct from current chromatin-immunoprecipitation data may enable more precise identification of enhancers, and further insight into the features that distinguish their biological functions.


Development ◽  
2018 ◽  
Vol 145 (7) ◽  
pp. dev160663 ◽  
Author(s):  
Yi-Ting Lai ◽  
Kevin D. Deem ◽  
Ferran Borràs-Castells ◽  
Nagraj Sambrani ◽  
Heike Rudolf ◽  
...  

2017 ◽  
Author(s):  
Yi-Ting Lai ◽  
Kevin D. Deem ◽  
Ferran Borràs-Castells ◽  
Nagraj Sambrani ◽  
Heike Rudolf ◽  
...  

Evolution of cis-properties (such as enhancers) often plays an important role in the production of diverse morphology. However, a mechanistic understanding is often limited by the absence of methods to study enhancers in species outside of established model systems. Here, we sought to establish methods to identify and test enhancer activity in the red flour beetle, Tribolium castaneum. To identify possible enhancer regions, we first obtained genome-wide chromatin profiles from various tissues and stages of Tribolium via FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements)-sequencing. Comparison of these profiles revealed a distinct set of open chromatin regions in each tissue and stage. Second, we established the first reporter assay system that works in both Drosophila and Tribolium, using nubbin in the wing and hunchback in the embryo as case studies. Together, these advances will be useful to study the evolution of cis-language and morphological diversity in Tribolium and other insects.

