Semirna: Searching for Plant miRNAs Using Target Sequences

Abstract. Summary: Searching for amino acid or nucleic acid sequences unique to one organism can be challenging, depending on the size of the available datasets. K-mer elimination by cross-reference (KEC) lets users find unique sequences quickly and easily by providing target and non-target sequences. Because of its speed, it can be used on datasets of genomic size and run on desktop or laptop computers with modest specifications. Availability and implementation: KEC is freely available for non-commercial purposes. Source code and executable binary files compiled for Linux, Mac and Windows can be downloaded from https://github.com/berybox/KEC. Supplementary information: Supplementary data are available at Bioinformatics online.
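
The idea of eliminating k-mers by cross-reference can be illustrated with a minimal Python sketch. It is not KEC's actual implementation: the function names, the exact-match comparison, and the merging of surviving k-mers into longer stretches are all assumptions made for illustration.

```python
def kmer_set(seq, k):
    """Return the set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_regions(target, non_targets, k):
    """Return stretches of `target` whose k-mers occur in no non-target.

    Hypothetical helper: overlapping surviving k-mers are merged into
    maximal contiguous regions.
    """
    # Collect every k-mer seen in any non-target sequence.
    excluded = set()
    for seq in non_targets:
        excluded |= kmer_set(seq.upper(), k)

    target = target.upper()
    regions = []
    start = None  # start of the current run of surviving k-mers
    end = 0       # end (exclusive) of the current run
    for i in range(len(target) - k + 1):
        if target[i:i + k] not in excluded:
            if start is None:
                start = i
            end = i + k
        elif start is not None:
            # An excluded k-mer closes the current run.
            regions.append(target[start:end])
            start = None
    if start is not None:
        regions.append(target[start:end])
    return regions
```

For example, `unique_regions("AAACCCGGGTTT", ["AAACCC", "GGGTTT"], 4)` keeps only the junction that neither non-target contains.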

SAUTE: sequence assembly using target enrichment

BMC Bioinformatics, 2021, Vol 22 (1). DOI: 10.1186/s12859-021-04174-9

Author(s): Alexandre Souvorov, Richa Agarwala

Keyword(s): De Bruijn Graph, Challenging Problem, Target Enrichment, RNA Seq, Sequencing Technology, Insert Size, Systematic Biases, Target Sequences, Genomic Regions, Higher Sensitivity

Abstract. Background: Illumina is the dominant sequencing technology at this time. Short read length, short insert size, some systematic biases, and low-level carryover contamination in Illumina reads continue to make assembly of repeated regions a challenging problem. Some applications also require finding multiple well-supported variants for assembled regions. Results: To facilitate assembly of repeat regions and to report multiple well-supported variants when a user can provide target sequences to assist the assembly, we propose the SAUTE and SAUTE_PROT assemblers. Both assemblers use a de Bruijn graph built on the reads. Targets can be transcripts or proteins for RNA-seq reads, and transcripts, proteins, or genomic regions for genomic reads. Target sequences are nucleotide sequences for SAUTE and protein sequences for SAUTE_PROT. Conclusions: For RNA-seq, comparisons with Trinity, rnaSPAdes, SPAligner, and SPAdes assembly of reads aligned to target proteins by DIAMOND show that SAUTE_PROT finds more coding sequences that translate to benchmark proteins. Using AMRFinderPlus calls, we find that SAUTE has higher sensitivity and precision than SPAdes, plasmidSPAdes, SPAligner, and SPAdes assembly of reads aligned to target regions by HISAT2. It also has better sensitivity than SKESA but worse precision.
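
As a rough illustration of what a de Bruijn graph on reads looks like, the Python sketch below builds one from k-mers and walks an unambiguous path. It does not reproduce SAUTE: it ignores the target sequences, read counts, and variant reporting described above, and the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its (k-1)-prefix to its (k-1)-suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_from(graph, node):
    """Extend a contig from `node` while exactly one successor exists.

    Stops at dead ends, at branches (several successors), and when a
    node would be revisited, so the walk always terminates.
    """
    contig = node
    seen = {node}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]  # each step adds one new base
        seen.add(nxt)
        node = nxt
    return contig
```

With the two overlapping reads `"ACGTG"` and `"GTGCA"` and `k = 3`, `extend_from(de_bruijn(reads, 3), "AC")` joins them into a single contig. Repeats show up in such a graph as nodes with more than one successor, which is where a simple walk like this one stops.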

GPrimer: a fast GPU-based pipeline for primer design for qPCR experiments

BMC Bioinformatics, 2021, Vol 22 (1). DOI: 10.1186/s12859-021-04133-4

Author(s): Jeongmin Bae, Hajin Jeon, Min-Soo Kim

Keyword(s): Data Structures, Primer Design, Main Memory, Design Tools, Workload Balancing, qPCR Analysis, Computational Speed, Entire Sequence, Target Sequences, Coalesced Memory

Abstract. Background: Design of valid, high-quality primers is essential for qPCR experiments. MRPrimer is a powerful pipeline based on MapReduce that combines primer design for target sequences with homology tests on off-target sequences. It takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB. Because primers designed by MRPrimer are effective in qPCR analysis, it has been widely used to develop online design tools and build primer databases. However, MRPrimer is too slow to keep up with the exponentially growing size of sequence DBs and must therefore be improved. Results: We develop a fast GPU-based pipeline for primer design (GPrimer) that takes the same input and returns the same output as MRPrimer. MRPrimer consists of seven MapReduce steps, two of which are very time-consuming. GPrimer significantly speeds up those two steps by exploiting the computational power of GPUs. In particular, it designs data structures for coalesced memory access on the GPU and for workload balancing among GPU threads, and it copies the data structures between main memory and GPU memory in a streaming fashion. For the human RefSeq DB, GPrimer achieves a speedup of 57 times across all steps and a speedup of 557 times for the most time-consuming step using a single machine with 4 GPUs, compared with MRPrimer running on a cluster of six machines. Conclusions: We propose a GPU-based pipeline for primer design that takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB at once, without an additional step using BLAST-like tools. The software is available at https://github.com/qhtjrmin/GPrimer.git.
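
The combination of primer design on a target and a homology test against off-target sequences can be pictured with a much simplified Python sketch. It is an assumption-laden illustration, not the MRPrimer or GPrimer algorithm: the GC-content filter, its thresholds, the fixed primer length, and the exact-substring homology test are all chosen here for brevity, and the function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_primers(target, off_targets, length=20, gc_min=0.4, gc_max=0.6):
    """List (position, primer) candidates from `target`.

    A candidate is kept when its GC fraction lies in [gc_min, gc_max]
    and it does not occur verbatim in any off-target sequence. Real
    pipelines apply many more constraints; this is only a sketch.
    """
    target = target.upper()
    off_targets = [s.upper() for s in off_targets]
    kept = []
    for i in range(len(target) - length + 1):
        primer = target[i:i + length]
        if not gc_min <= gc_fraction(primer) <= gc_max:
            continue  # fails the single-primer filter
        if any(primer in s for s in off_targets):
            continue  # fails the (simplified) homology test
        kept.append((i, primer))
    return kept
```

Each window of the target is checked independently of the others, which hints at why this kind of workload lends itself to the parallel GPU execution the abstract describes.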

Physical linkage of mouse lambda genes by pulsed-field gel electrophoresis suggests that the rearrangement process favors proximate target sequences.

Molecular and Cellular Biology, 1989, Vol 9 (2), pp. 711-718. DOI: 10.1128/mcb.9.2.711. Cited By ~ 44

Author(s): U Storb, D Haasch, B Arp, P Sanchez, P A Cazenave, ...

Keyword(s): Gene Order, Gel Electrophoresis, Gene Locus, Pulsed Field Gel Electrophoresis, Immunoglobulin Gene, Related Gene, Pulsed Field, Physical Linkage, Rearrangement Process, Target Sequences

The first complete map of a mammalian immunoglobulin gene locus is presented. Mouse lambda genes were mapped by pulsed-field gel electrophoresis. The gene order is V2-Vx-C2-C4-V1-C3-C1. The distance between V2 or Vx and the C2-C4 cluster is 74 or 55 kilobases (kb), respectively, whereas that between V1 and C3-C1 is only 19 kb; V2 and C3-C1 are at least 190 kb apart. Thus, the distances between the lambda subloci are inversely proportional to their frequencies of rearrangement. The related gene lambda 5 is not within the 500 kb of the lambda locus mapped here.

MicroRNA annotation in plants: current status and challenges

Briefings in Bioinformatics, 2021. DOI: 10.1093/bib/bbab075

Author(s): Yongxin Zhao, Zheng Kuang, Ying Wang, Lei Li, Xiaozeng Yang

Keyword(s): Small RNA, Signal To Noise Ratio, Current Status, miRNA Biogenesis, Plant miRNAs, Signal To Noise, Advantages And Disadvantages, Plant miRNA, History Of, miRNA Biogenesis Pathway

Abstract. Over the last two decades, studies on microRNAs (miRNAs) and the number of annotated miRNAs in plants and animals have surged. Here, we review the current progress and challenges of miRNA annotation in plants. By comparing plant and animal miRNAs, we pinpoint the difficulties of plant miRNA annotation and propose potential solutions. Recalling the history of methods and criteria in plant miRNA annotation, we detail how the major advances were made and how they evolved. By collecting and categorizing bioinformatics tools for plant miRNA annotation, we survey their advantages and disadvantages, especially for tools that mimic the miRNA biogenesis pathway by parsing deeply sequenced small RNA (sRNA) libraries. In addition, we summarize all available databases hosting plant miRNAs and suggest potential optimizations, such as how to increase the signal-to-noise ratio (SNR) in these databases. Finally, we discuss the challenges and perspectives of plant miRNA annotation and point out the possibilities offered by an all-in-one tool and platform that integrates artificial intelligence.

Target sequences for hunchback in a control region conferring Ultrabithorax expression boundaries

Development, 1991, Vol 113 (4), pp. 1171-1179. DOI: 10.1242/dev.113.4.1171. Cited By ~ 11

Author(s): C.C. Zhang, J. Muller, M. Hoch, H. Jackle, M. Bienz

Keyword(s): Long Range, Control Region, Binding Sites, Regulatory Sequences, Stripe Pattern, Anteroposterior Axis, Target Sequences, Beta Galactosidase, Control Regions

Boundaries of Ultrabithorax expression are mediated by long-range repression acting through the PBX or ABX control region. We show here that either of these control regions confers an early band of beta-galactosidase expression which is restricted along the anteroposterior axis of the blastoderm embryo. This band is succeeded by a stripe pattern with very similar anteroposterior limits. Dissection of the PBX control region demonstrates that the two patterns are conferred by distinct cis-regulatory sequences contained within separate PBX subfragments. We find several binding sites for hunchback protein within both PBX subfragments. Zygotic hunchback function is required to prevent ectopic PBX expression. Moreover, the PBX pattern is completely suppressed in embryos containing uniformly distributed maternal hunchback protein. Our results strongly suggest that hunchback protein directly binds to the PBX control region and acts as a repressor to specify the boundary positions of the PBX pattern.
