The PARA-suite: PAR-CLIP specific sequence read simulation and processing

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2619 ◽  
Author(s):  
Andreas Kloetgen ◽  
Arndt Borkhardt ◽  
Jessica I. Hoell ◽  
Alice C. McHardy

Background Next-generation sequencing technologies have profoundly impacted biology over recent years. Experimental protocols, such as photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP), which identifies protein–RNA interactions on a genome-wide scale, commonly employ deep sequencing. With PAR-CLIP, the incorporation of photoactivatable nucleosides into nascent transcripts leads to high rates of specific nucleotide conversions during reverse transcription. So far, the specific properties of PAR-CLIP-derived sequencing reads have not been assessed in depth. Methods Here, we compared PAR-CLIP sequencing reads to regular transcriptome sequencing reads (RNA-Seq) to identify distinctive properties that are relevant for reference-based read alignment of PAR-CLIP datasets. We developed a set of freely available tools for PAR-CLIP data analysis, called the PAR-CLIP analyzer suite (PARA-suite). The PARA-suite includes error model inference, PAR-CLIP read simulation based on PAR-CLIP specific properties, a full read alignment pipeline with a modified Burrows–Wheeler Aligner algorithm and CLIP read clustering for binding site detection. Results We show that differences in the error profiles of PAR-CLIP reads relative to regular transcriptome sequencing reads (RNA-Seq) make distinct processing advantageous. We examined the alignment accuracy of commonly applied read aligners on 10 simulated PAR-CLIP datasets using different parameter settings and identified the most accurate setup among those read aligners. We demonstrate the performance of the PARA-suite in conjunction with different binding site detection algorithms on several real PAR-CLIP and HITS-CLIP datasets. Our processing pipeline improved both alignment and binding site detection accuracy. Availability The PARA-suite toolkit and the PARA-suite aligner are available at https://github.com/akloetgen/PARA-suite and https://github.com/akloetgen/PARA-suite_aligner, respectively, under the GNU GPLv3 license.
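
As a rough illustration of what simulating reads with PAR-CLIP specific properties can involve, the following Python sketch applies a conversion step and a generic substitution error to a fragment. It is not the PARA-suite implementation. The abstract does not name the conversion type or any rates, so the T-to-C conversion (as commonly reported for 4-thiouridine labeling), the function name `simulate_parclip_read`, and both default rates are assumptions made for this example.

```python
import random


def simulate_parclip_read(fragment, conversion_rate=0.3, error_rate=0.01, rng=None):
    """Return a simulated read derived from `fragment` (a DNA string).

    Assumptions for illustration only: every T converts to C with probability
    `conversion_rate`; any base that was not converted is replaced by a
    different base with probability `error_rate`.
    """
    rng = rng or random.Random()
    bases = "ACGT"
    read = []
    for base in fragment:
        if base == "T" and rng.random() < conversion_rate:
            # PAR-CLIP-like conversion (assumed T-to-C)
            read.append("C")
        elif rng.random() < error_rate:
            # Generic sequencing substitution error
            read.append(rng.choice([b for b in bases if b != base]))
        else:
            read.append(base)
    return "".join(read)


if __name__ == "__main__":
    print(simulate_parclip_read("ATTGCTTACGT", rng=random.Random(0)))
```

Keeping the conversion rate separate from the general error rate mirrors the idea in the abstract that PAR-CLIP reads have an error profile distinct from regular RNA-Seq reads.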

2017 ◽  
Vol 114 (52) ◽  
pp. 13685-13690 ◽  
Author(s):  
Howook Hwang ◽  
Fabian Dey ◽  
Donald Petrey ◽  
Barry Honig

We report a template-based method, LT-scanner, which scans the human proteome using protein structural alignment to identify proteins that are likely to bind ligands that are present in experimentally determined complexes. A scoring function that rapidly accounts for binding site similarities between the template and the proteins being scanned is a crucial feature of the method. The overall approach is first tested based on its ability to predict the residues on the surface of a protein that are likely to bind small-molecule ligands. The algorithm that we present, LBias, is shown to compare very favorably to existing algorithms for binding site residue prediction. LT-scanner’s performance is evaluated based on its ability to identify known targets of Food and Drug Administration (FDA)-approved drugs and it too proves to be highly effective. The specificity of the scoring function that we use is demonstrated by the ability of LT-scanner to identify the known targets of FDA-approved kinase inhibitors based on templates involving other kinases. Combining sequence with structural information further improves LT-scanner performance. The approach we describe is extendable to the more general problem of identifying binding partners of known ligands even if they do not appear in a structurally determined complex, although this will require the integration of methods that combine protein structure and chemical compound databases.


2021 ◽  
Author(s):  
Nicolas Eugenie ◽  
Yvan Zivanovic ◽  
Gaelle Lelandais ◽  
Genevieve Coste ◽  
Claire Bouthier de la Tour ◽  
...  

Numerous genes are overexpressed in the radioresistant bacterium Deinococcus radiodurans after exposure to radiation or prolonged desiccation. The DdrO and IrrE proteins play a major role in regulating the expression of approximately twenty of these genes. The transcriptional repressor DdrO blocks the expression of these genes under normal growth conditions. After exposure to genotoxic agents, the IrrE metalloprotease cleaves DdrO and relieves gene repression. Bioinformatic analyses showed that this mechanism seems to be conserved in several species of Deinococcus, but many questions remain, such as the number of genes regulated by DdrO. Here, by RNA-seq and ChIP-seq assays performed at a genome-wide scale coupled with bioinformatic analyses, we show that the DdrO regulon in D. radiodurans includes many more genes than those previously described. These results thus pave the way to a better understanding of the radioresistance mechanisms encoded by this bacterium.


Author(s):  
A T Vivek ◽  
Shailesh Kumar

Abstract The plant transcriptome encompasses numerous endogenous, regulatory non-coding RNAs (ncRNAs) that play a major biological role in regulating key physiological mechanisms. While studies have shown that ncRNAs are extremely diverse and ubiquitous, the functions of the vast majority of ncRNAs are still unknown. With an ever-increasing number of ncRNAs under study, it is essential to identify, categorize and annotate these ncRNAs on a genome-wide scale. The use of high-throughput RNA sequencing (RNA-seq) technologies provides a broader picture of the non-coding component of the transcriptome, enabling the comprehensive identification and annotation of all major ncRNAs across samples. However, the detection of known and emerging classes of ncRNAs from RNA-seq data demands complex computational methods owing to their unique as well as similar characteristics. Here, we discuss the major plant endogenous, regulatory ncRNAs in an RNA sample, followed by the computational strategies applied to discover each class of ncRNAs using RNA-seq. We also provide a collection of relevant software packages and databases to present a comprehensive bioinformatics toolbox for plant ncRNA researchers. We assume that the discussions in this review will provide a rationale for the discovery of all major categories of plant ncRNAs.


Author(s):  
Lucile Broseus ◽  
Aubin Thomas ◽  
Andrew J. Oldfield ◽  
Dany Severac ◽  
Emeric Dubois ◽  
...  

ABSTRACT Motivation Long-read sequencing technologies are invaluable for determining complex RNA transcript architectures but are error-prone. Numerous “hybrid correction” algorithms have been developed for genomic data that correct long reads by exploiting the accuracy and depth of short reads sequenced from the same sample. These algorithms are not suited for correcting more complex transcriptome sequencing data. Results We have created a novel reference-free algorithm called TALC (Transcription Aware Long Read Correction) which models changes in RNA expression and isoform representation in a weighted De Bruijn graph to correct long reads from transcriptome studies. We show that transcription aware correction by TALC improves the accuracy of the whole spectrum of downstream RNA-seq applications and is thus necessary for transcriptome analyses that use long read technology. Availability and Implementation TALC is implemented in C++ and available at https://gitlab.igh.cnrs.fr/lbroseus/[email protected]


2019 ◽  
Author(s):  
Hsin-Yen Larry Wu ◽  
Polly Yingshan Hsu

ABSTRACT Background Ribo-seq has revolutionized the study of mRNA translation on a genome-wide scale. High-quality Ribo-seq data display strong 3-nucleotide (nt) periodicity, which corresponds to translating ribosomes deciphering three nucleotides at a time. While the 3-nt periodicity has been widely used to study novel translation events and identify small open reading frames on presumed non-coding RNAs, tools that allow the visualization of those events remain underdeveloped. Findings RiboPlotR is a visualization package written in R that presents both RNA-seq coverage and Ribo-seq reads for all annotated transcript isoforms in the context of a given gene. In particular, RiboPlotR plots Ribo-seq reads mapped in three reading frames using three colors for one isoform model at a time. Moreover, RiboPlotR shows Ribo-seq reads on upstream ORFs, 5’ and 3’ untranslated regions and introns, which is critical for observing new translation events and potential regulatory mechanisms. Conclusions RiboPlotR is freely available (https://github.com/hsinyenwu/RiboPlotR) and allows the visualization of the translating features in Ribo-seq data.
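
The 3-nt periodicity mentioned above can be made concrete with a small Python sketch that assigns read positions to reading frames. This is not RiboPlotR code (RiboPlotR is an R package). The function names, the use of 0-based P-site positions in transcript coordinates, and the "fraction of reads in the dominant frame" score are all assumptions chosen for illustration.

```python
from collections import Counter


def frame_counts(p_site_positions, cds_start):
    """Count reads per reading frame (0, 1, 2) relative to the CDS start.

    `p_site_positions` are assumed to be 0-based P-site coordinates on the
    transcript; positions upstream of `cds_start` are ignored.
    """
    counts = Counter(
        (pos - cds_start) % 3 for pos in p_site_positions if pos >= cds_start
    )
    return [counts.get(frame, 0) for frame in range(3)]


def dominant_frame_fraction(counts):
    """Fraction of reads in the most populated frame (0.0 if there are no reads)."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


if __name__ == "__main__":
    counts = frame_counts([100, 103, 106, 107, 110], cds_start=100)
    print(counts, dominant_frame_fraction(counts))
```

A fraction near 1/3 would suggest no frame preference, whereas values well above that are what strong 3-nt periodicity looks like under this simple score.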


2016 ◽  
Author(s):  
Avantika Lal ◽  
Sandeep Krishna ◽  
Aswin Sai Narain Seshasayee

ABSTRACT In Escherichia coli, the sigma factor σ70 directs RNA polymerase to transcribe growth-related genes, while σ38 directs transcription of stress response genes during stationary phase. Two molecules hypothesized to regulate RNA polymerase are the protein Rsd, which binds to σ70, and the non-coding 6S RNA, which binds to the RNA polymerase-σ70 holoenzyme. Despite multiple studies, the functions of Rsd and 6S RNA remain controversial. Here we use RNA-Seq in five phases of growth to elucidate their function on a genome-wide scale. We show for the first time that Rsd and 6S RNA facilitate σ38 activity throughout bacterial growth, while 6S RNA also regulates widely different genes depending upon growth phase. We discover novel interactions between 6S RNA and Rsd and show widespread expression changes in a strain lacking both regulators. Finally, we present a mathematical model of transcription which highlights the crosstalk between Rsd and 6S RNA as a crucial factor in controlling sigma factor competition and global gene expression.


2020 ◽  
Vol 36 (20) ◽  
pp. 5000-5006 ◽  
Author(s):  
Lucile Broseus ◽  
Aubin Thomas ◽  
Andrew J Oldfield ◽  
Dany Severac ◽  
Emeric Dubois ◽  
...  

Abstract Motivation Long-read sequencing technologies are invaluable for determining complex RNA transcript architectures but are error-prone. Numerous ‘hybrid correction’ algorithms have been developed for genomic data that correct long reads by exploiting the accuracy and depth of short reads sequenced from the same sample. These algorithms are not suited for correcting more complex transcriptome sequencing data. Results We have created a novel reference-free algorithm called Transcript-level Aware Long-Read Correction (TALC) which models changes in RNA expression and isoform representation in a weighted De Bruijn graph to correct long reads from transcriptome studies. We show that transcript-level aware correction by TALC improves the accuracy of the whole spectrum of downstream RNA-seq applications and is thus necessary for transcriptome analyses that use long read technology. Availability and implementation TALC is implemented in C++ and available at https://github.com/lbroseus/TALC. Supplementary information Supplementary data are available at Bioinformatics online.


Genome ◽  
1999 ◽  
Vol 42 (4) ◽  
pp. 706-713 ◽  
Author(s):  
Concha Linares ◽  
Antonio Serna ◽  
Araceli Fominaya

A repetitive sequence, pAs17, was isolated from Avena strigosa (As genome) and characterized. The insert was 646 bp in length and showed 54% AT content. Databank searches revealed its high homology to the long terminal repeat (LTR) sequences of the specific family of Ty1-copia retrotransposons represented by WIS2-1A and Bare. It was also found to be 70% identical to the LTR domain of the WIS2-1A retroelement of wheat and 67% identical to the Bare-1 retroelement of barley. Southern hybridizations of pAs17 to diploid (A or C genomes), tetraploid (AC genomes), and hexaploid (ACD genomes) oat species revealed that it was absent in the C diploid species. Slot-blot analysis suggested that both diploid and tetraploid oat species contained 1.3 × 10^4 copies, indicating that they are a component of the A-genome chromosomes. The hexaploid species contained 2.4 × 10^4 copies, indicating that they are a component of both A- and D-genome chromosomes. This was confirmed by fluorescent in situ hybridization analyses using pAs17, two ribosomal sequences, and a C-genome specific sequence as probes. Further, the chromosomes involved in three C-A and three C-D intergenomic translocations in Avena murphyi (AC genomes) and Avena sativa cv. Extra Klock (ACD genomes), respectively, were identified. Based on its physical distribution and Southern hybridization patterns, a parental retrotransposon represented by pAs17 appears to have been active at least once during the evolution of the A genome in species of the Avena genus. Key words: chromosomal organization, in situ hybridization, intergenomic translocations, LTR sequence, oats.


Marine Drugs ◽  
2021 ◽  
Vol 19 (4) ◽  
pp. 202
Author(s):  
Rajesh Rajaian Pushpabai ◽  
Carlton Ranjith Wilson Alphonse ◽  
Rajasekar Mani ◽  
Deepak Arun Apte ◽  
Jayaseelan Benjamin Franklin

Marine cone snails are predatory gastropods characterized by a well-developed venom apparatus and highly evolved hunting strategies that utilize toxins to paralyze prey and defend against predators. The venom of each species of cone snail has a large number of pharmacologically active peptides known as conopeptides or conotoxins that are usually unique in each species. Nevertheless, the venoms of only very few species have been characterized so far by transcriptomic approaches. In this study, we used transcriptome sequencing technologies and mass spectrometric methods to describe the diversity of venom components expressed by a worm-hunting species, Conus bayani. A total of 82 conotoxin sequences were retrieved from transcriptomic data: 54 validated conotoxin sequences clustered into 21 gene superfamilies (including a divergent gene family), 17 sequences clustered into 6 different conotoxin classes, and 11 conotoxins classified as belonging to an unassigned gene family. Seven new conotoxin sequences showed unusual cysteine patterns. We were also able to identify 19 peptide sequences using mass spectrometry that completely overlapped with the conotoxin sequences obtained from transcriptome analysis. Importantly, herein we document the presence of 16 proteins, including five post-translational modifying enzymes, obtained from transcriptomic data. Our results revealed diverse and novel conopeptides of an unexplored species that could be used extensively in biomedical research due to their therapeutic potential.


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1820
Author(s):  
Xiaotao Shao ◽  
Qing Wang ◽  
Wei Yang ◽  
Yun Chen ◽  
Yi Xie ◽  
...  

Existing pedestrian detection algorithms cannot effectively extract features of heavily occluded targets, which results in lower detection accuracy. To address heavy occlusion in crowds, we propose a multi-scale feature pyramid network based on ResNet (MFPN) to enhance the features of occluded targets and improve the detection accuracy. MFPN includes two modules, namely double feature pyramid network (FPN) integrated with ResNet (DFR) and repulsion loss of minimum (RLM). We propose the double FPN, which improves the architecture to further enhance the semantic information and contours of occluded pedestrians and provides a new way to extract features of occluded targets. The features extracted by our network are more separated and clearer, especially for heavily occluded pedestrians. Repulsion loss is introduced to improve the loss function, keeping predicted boxes away from the ground truths of unrelated targets. In experiments carried out on the public CrowdHuman dataset, we obtain 90.96% AP, the best performance, a gain of 5.16% AP over the FPN-ResNet50 baseline. Compared with state-of-the-art works, our method boosts the performance of the pedestrian detection system.

