TransPrise: a novel machine learning approach for eukaryotic promoter prediction

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e7990 ◽  
Author(s):  
Stepan Pachganov ◽  
Khalimat Murtazalieva ◽  
Aleksei Zarubin ◽  
Dmitry Sokolov ◽  
Duane R. Chartier ◽  
...  

As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites (TSSs). In this paper we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. Our pipeline consists of two parts: a binary classifier runs first, and if a sequence is classified as TSS-containing, a regression step follows that identifies the precise location of the TSS. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of the TransPrise classification and regression models with the TSSPlant approach for the well-annotated genome of Oryza sativa. Using a computer equipped with a graphics processing unit, the run time of TransPrise is 250 minutes on a 374 Mb genome. The Matthews correlation coefficient for TransPrise is 0.79, more than twice the 0.31 obtained for the TSSPlant classification models. This represents a high level of prediction accuracy. Additionally, the mean absolute error for the regression model is 29.19 nt, allowing for accurate prediction of TSS location. TransPrise was also tested on Homo sapiens, where the mean absolute error of the regression model was 47.986 nt. We provide the full basis for the comparison and encourage users to freely access a set of our computational tools to facilitate and streamline their own analyses. The ready-to-use Docker image with all necessary packages, models, and code, as well as the source code of the TransPrise algorithm, is available at http://compubioverne.group/. The source code is ready to use and customizable to predict TSSs in any eukaryotic organism.
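The two-stage design and the two metrics quoted in the abstract can be illustrated with a minimal Python sketch. It is not the TransPrise code: `predict_tss`, `toy_classify`, `toy_locate`, and the 0.5 threshold are hypothetical names and values chosen for illustration, and the toy models simply look for a "TATA" substring in place of the trained networks.

```python
import math
from typing import Callable, Optional, Sequence


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when a marginal count is empty (denominator 0).
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_absolute_error(true_pos: Sequence[int], pred_pos: Sequence[int]) -> float:
    """Average absolute distance (in nt) between true and predicted positions."""
    return sum(abs(t - p) for t, p in zip(true_pos, pred_pos)) / len(true_pos)


def predict_tss(
    window: str,
    classify: Callable[[str], float],
    locate: Callable[[str], int],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Optional[int]:
    """Stage 1: classify the window. Stage 2: regress a position only if positive."""
    if classify(window) < threshold:
        return None  # window judged not to contain a TSS; regression is skipped
    return locate(window)


# Hypothetical stand-ins for the trained models (NOT the TransPrise networks).
def toy_classify(window: str) -> float:
    return 1.0 if "TATA" in window else 0.0


def toy_locate(window: str) -> int:
    return window.find("TATA")


if __name__ == "__main__":
    print(predict_tss("GGTATAAA", toy_classify, toy_locate))
    print(predict_tss("GGGGCCCC", toy_classify, toy_locate))
    print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
    print(mean_absolute_error([100, 200], [110, 180]))
```

Running the regression step only on windows that pass the classifier means that windows judged negative never receive a position estimate.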

2019 ◽  
Author(s):  
Stepan Pachganov ◽  
Khalimat Murtazalieva ◽  
Alexei Zarubin ◽  
Dmitry Sokolov ◽  
Duane Chartier ◽  
...  

As interest in genetic resequencing increases, so does the need for effective mathematical, computational, and statistical approaches. One of the difficult problems in genome annotation is determining the precise positions of transcription start sites (TSSs). In this paper we present TransPrise, an efficient deep learning tool for predicting the positions of eukaryotic transcription start sites. TransPrise offers significant improvement over existing promoter-prediction methods. To illustrate this, we compared predictions of TransPrise with the TSSPlant approach for the well-annotated genome of Oryza sativa. Using a computer equipped with a graphics processing unit, the run time of TransPrise is 250 minutes on a 374 Mb genome. We provide the full basis for the comparison and encourage users to freely access a set of our computational tools to facilitate and streamline their own analyses. The ready-to-use Docker image with all necessary packages, models, and code, as well as the source code of the TransPrise algorithm, is available at http://compubioverne.group/. The source code is ready to use and customizable to predict TSSs in any eukaryotic organism.


2019 ◽  
Author(s):  
Bo Yan ◽  
George Tzertzinis ◽  
Ira Schildkraut ◽  
Laurence Ettwiller

Abstract Methodologies for determining eukaryotic Transcription Start Sites (TSS) rely on the selection of the 5’ canonical cap structure of Pol-II transcripts and consequently ignore entire classes of TSS derived from other RNA polymerases, which play critical roles in various cell functions. To overcome this limitation, we developed ReCappable-seq and identified TSS from Pol-II and non-Pol-II transcripts at nucleotide resolution. Applied to the human transcriptome, ReCappable-seq identifies Pol-II TSS with higher specificity than CAGE and reveals a rich landscape of TSS associated notably with Pol-III transcripts, which previously could not be studied on a genome-wide scale. Novel TSS consistent with non-Pol-II transcripts can be found in the nuclear and mitochondrial genomes. By identifying TSS derived from all RNA polymerases, ReCappable-seq reveals distinct epigenetic marks among Pol-II and non-Pol-II TSS and provides a unique opportunity to concurrently interrogate the regulatory landscape of coding and non-coding RNA.


2021 ◽  
Author(s):  
Juexiao Zhou ◽  
Bin Zhang ◽  
Haoyang Li ◽  
Longxi Zhou ◽  
Zhongxiao Li ◽  
...  

Abstract The accurate annotation of transcription start sites (TSSs) and their usage is critical for the mechanistic understanding of gene regulation under different biological contexts. To this end, on the one hand, specific high-throughput experimental technologies have been developed to capture TSSs in a genome-wide manner. On the other hand, various computational tools have been developed for in silico prediction of TSSs based solely on genomic sequences. Most of these computational tools cast the problem as a binary classification task on a balanced dataset and thus produce large numbers of false positive predictions when applied at genome scale. To address these issues, we present DeeReCT-TSS, a deep-learning-based method that is capable of TSS identification across the whole genome based on both DNA sequences and conventional RNA-seq data. We show that by effectively incorporating these two sources of information, DeeReCT-TSS significantly outperforms other solely sequence-based methods on the precise annotation of TSSs used in different cell types. Furthermore, we develop a meta-learning-based extension for simultaneous TSS annotation on 10 cell types, which enables the identification of cell-type-specific TSSs. Finally, we demonstrate the high precision of DeeReCT-TSS on two independent datasets from the ENCODE project by correlating our predicted TSSs with experimentally defined TSS chromatin states. Our application, pre-trained models and data are available at https://github.com/JoshuaChou2018/DeeReCT-TSS_release.
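The point about balanced training sets and genome-scale false positives follows from simple arithmetic, sketched below in Python. The sensitivity, specificity, and prevalence values are illustrative assumptions, not figures from the DeeReCT-TSS paper, and `precision_at_prevalence` is a hypothetical helper name.

```python
def precision_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Fraction of positive calls that are true positives, for a given class prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Assumed classifier quality: 95% sensitivity and 95% specificity.
    balanced = precision_at_prevalence(0.95, 0.95, prevalence=0.5)
    # Assumed genome-scale setting: 1 in 1000 candidate windows holds a TSS.
    genome_scale = precision_at_prevalence(0.95, 0.95, prevalence=0.001)
    print(f"precision on balanced data: {balanced:.3f}")
    print(f"precision at genome scale:  {genome_scale:.3f}")
```

Under these assumed numbers the same classifier that looks accurate on a balanced benchmark yields mostly false positives when true TSS windows are rare, which is the failure mode the abstract describes.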


Author(s):  
I. A. Shahmuradov

Aim. The computational search for promoters remains an attractive problem in bioinformatics. Despite the attention it has received for many years, the problem has not been addressed satisfactorily. These studies aimed to develop novel computer tools for prediction of promoters (transcription start sites, TSSs) in plants and bacteria. Results. Two novel tools for prediction of RNA polymerase II promoters in plants (TSSPlant) and bacteria (bTSSfinder) have been developed. TSSPlant achieves significantly higher accuracy than the next best promoter prediction program for both TATA and TATA-less promoters; it is available to download as a standalone program at http://www.cbrc.kaust.edu.sa/download/. bTSSfinder predicts promoters for five classes of σ factors in Cyanobacteria (σA, σC, σH, σG and σF) and for five classes of σ factors in E. coli (σ70, σ38, σ32, σ28 and σ24). Compared with currently available tools, bTSSfinder achieves the highest accuracy. bTSSfinder is available standalone and online at http://www.cbrc.kaust.edu.sa/btssfinder. Conclusions. To date, TSSPlant and bTSSfinder are the most accurate promoter predictors in plants and bacteria, respectively. Keywords: transcription, RNA polymerase, promoter, TSS, promoter prediction.


2020 ◽  
Vol 15 ◽  
Author(s):  
Fahad Layth Malallah ◽  
Baraa T. Shareef ◽  
Mustafah Ghanem Saeed ◽  
Khaled N. Yasen

Aims: An elevated body temperature can indicate a disease that may be dangerous to other people, such as coronavirus. Traditional techniques for tracking core temperature require body contact (oral, rectal, axillary, or tympanic), which is intrusive and can itself spread infection. Therefore, sensing human core temperature non-intrusively and remotely is the objective of this research. Background: Improving the medical sector is a necessary target for research, especially given the development of integrated circuits, sensors, and cameras that have made everyday life easier. Methods: The proposed solution is an embedded system consisting of an Arduino microcontroller, which is trained with a model based on Mean Absolute Error (MAE) analysis to predict Contactless Core-Temperature (CCT), which is the real body temperature. Results: The Arduino is connected to an infrared thermal sensor, the MLX90614, as its input signal, and to an LCD that displays the CCT. To evaluate the proposed system, experiments were conducted with 31 subjects, sensing contactless temperature from three face sub-regions: forehead, nose, and cheek. Conclusion: Experimental results confirmed that CCT can be measured remotely from the human face, with the forehead region a better basis for CCT measurement than the nose and cheek regions due to the smallest


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2670
Author(s):  
Thomas Quirin ◽  
Corentin Féry ◽  
Dorian Vogel ◽  
Céline Vergne ◽  
Mathieu Sarracanie ◽  
...  

This paper presents a tracking system using magnetometers that could potentially be integrated into a deep brain stimulation (DBS) electrode. DBS is a treatment for movement disorders in which the position of the implant is of prime importance. Positioning challenges during surgery could be addressed through magnetic tracking. The system proposed in this paper, complementary to existing procedures, has been designed to bridge preoperative clinical imaging with DBS surgery, allowing the surgeon to increase his/her control over the implantation trajectory. Here the magnetic source required for tracking consists of three coils and is experimentally mapped. This mapping has been performed with an in-house three-dimensional magnetic camera. The system demonstrates how magnetometers integrated directly at the tip of a DBS electrode might improve treatment by monitoring the position during and after surgery. Three-dimensional operation without line of sight has been demonstrated using a reference obtained with magnetic resonance imaging (MRI) of a simplified brain model. We observed experimentally a mean absolute error of 1.35 mm and a Euclidean error of 3.07 mm. Several areas of improvement to target errors below 1 mm are also discussed.
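The abstract reports both a mean absolute error and a Euclidean error without defining them. One common reading, assumed here and not confirmed by the source, is that the first averages the per-axis absolute deviations and the second is the straight-line 3D distance between the reference and measured positions. The Python sketch below uses made-up coordinates and hypothetical function names to show why the two figures can differ for the same data.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def per_axis_mae(reference: Sequence[Point], measured: Sequence[Point]) -> float:
    """Mean of |dx|, |dy|, |dz| over all points and axes (assumed definition)."""
    diffs = [abs(r - m) for ref, mea in zip(reference, measured) for r, m in zip(ref, mea)]
    return sum(diffs) / len(diffs)


def mean_euclidean_error(reference: Sequence[Point], measured: Sequence[Point]) -> float:
    """Mean straight-line distance between paired 3D positions (assumed definition)."""
    return sum(math.dist(ref, mea) for ref, mea in zip(reference, measured)) / len(reference)


if __name__ == "__main__":
    # Made-up positions in mm, for illustration only.
    reference = [(0.0, 0.0, 0.0)]
    measured = [(1.0, 2.0, 2.0)]
    print(per_axis_mae(reference, measured))
    print(mean_euclidean_error(reference, measured))
```

Because the Euclidean distance combines all three axes, it is never smaller than the per-axis average for the same pair of points.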


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3719
Author(s):  
Aoxin Ni ◽  
Arian Azarang ◽  
Nasser Kehtarnavaz

The interest in contactless or remote heart rate measurement has been steadily growing in healthcare and sports applications. Contactless methods involve the utilization of a video camera and image processing algorithms. Recently, deep learning methods have been used to improve the performance of conventional contactless methods for heart rate measurement. After providing a review of the related literature, a comparison of the deep learning methods whose codes are publicly available is conducted in this paper. The public domain UBFC dataset is used to compare the performance of these deep learning methods for heart rate measurement. The results obtained show that the deep learning method PhysNet generates the best heart rate measurement outcome among these methods, with a mean absolute error value of 2.57 beats per minute and a mean square error value of 7.56 beats per minute.

