Permutation Tableaux and the Dashed Permutation Pattern 32–1

10.37236/598 ◽  
2011 ◽  
Vol 18 (1) ◽  
Author(s):  
William Y.C. Chen ◽  
Lewis H. Liu

We give a solution to a problem posed by Corteel and Nadeau concerning permutation tableaux of length $n$ and the number of occurrences of the dashed pattern 32–1 in permutations on $[n]$. We introduce the inversion number of a permutation tableau. For a permutation tableau $T$ and the permutation $\pi$ obtained from $T$ by the bijection of Corteel and Nadeau, we show that the inversion number of $T$ equals the number of occurrences of the dashed pattern 32–1 in the reverse complement of $\pi$. We also show that permutation tableaux without inversions coincide with L-Bell tableaux introduced by Corteel and Nadeau.
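The two permutation operations named in this abstract can be sketched as follows. This is a minimal illustration, assuming the usual convention for dashed patterns: in 32–1 the letters playing the roles of 3 and 2 must be adjacent, while the letter playing the role of 1 may occur anywhere later. The function names are hypothetical and are not taken from the paper; the tableau bijection itself is not reproduced here.

```python
from typing import List, Sequence


def reverse_complement(perm: Sequence[int]) -> List[int]:
    """Reverse the permutation and replace each value v by n + 1 - v."""
    n = len(perm)
    return [n + 1 - v for v in reversed(perm)]


def count_32_1(perm: Sequence[int]) -> int:
    """Count occurrences of the dashed pattern 32-1.

    An occurrence is a pair (i, j) with j > i + 1 and
    perm[i] > perm[i + 1] > perm[j]; the first two letters are adjacent.
    """
    count = 0
    for i in range(len(perm) - 1):
        a, b = perm[i], perm[i + 1]
        if a > b:
            # Every later entry smaller than b completes one occurrence.
            count += sum(1 for c in perm[i + 2:] if c < b)
    return count
```

For example, `count_32_1([4, 3, 1, 2])` counts the descent 4 3 followed later by 1 and by 2, giving two occurrences.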

2021 ◽  
Author(s):  
Kristoffer Sahlin

Short-read genome alignment is a fundamental computational step used in many bioinformatic analyses. It is therefore desirable to align such data as fast as possible. Most alignment algorithms consider a seed-and-extend approach. Several popular programs perform the seeding step based on the Burrows-Wheeler Transform with a low memory footprint, but they are relatively slow compared to more recent approaches that use a minimizer-based seeding-and-chaining strategy. Recently, syncmers and strobemers were proposed for sequence comparison. Both protocols were designed for improved conservation of matches between sequences under mutations. Syncmers is a thinning protocol proposed as an alternative to minimizers, while strobemers is a linking protocol for gapped sequences and was proposed as an alternative to k-mers. The main contribution in this work is a new seeding approach that combines syncmers and strobemers. We use a strobemer protocol (randstrobes) to link together syncmers (i.e., in syncmer-space) instead of over the original sequence. Our protocol allows us to create longer seeds while preserving mapping accuracy. A longer seed length reduces the number of candidate regions which allows faster mapping and alignment. We also contribute the insight that speed-wise, this protocol is particularly effective when syncmers are canonical. Canonical syncmers can be created for specific parameter combinations and reduce the computational burden of computing the non-canonical randstrobes in reverse complement. We implement our idea in a proof-of-concept short-read aligner strobealign that aligns short reads 3-4x faster than minimap2 and 15-23x faster than BWA and Bowtie2. Many implementation versions of, e.g., BWA, achieve high speed on specific hardware. Our contribution is algorithmic and requires no hardware architecture or system-specific instructions. Strobealign is available at https://github.com/ksahlin/StrobeAlign.
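A rough sketch of the two ingredients combined in this abstract, assuming open syncmers (a k-mer is kept when its smallest s-mer sits at offset t) and a simplified order-2 randstrobe link computed over the list of syncmers rather than over the original sequence. The function names, the CRC32 ordering, the concatenation-based link score, and the window parameters are illustrative assumptions, not strobealign's actual implementation; canonical (strand-independent) syncmers are not handled.

```python
import zlib
from typing import Any, Callable, List, Tuple


def crc_order(s: str) -> int:
    """Deterministic pseudo-random ordering of strings (assumed choice)."""
    return zlib.crc32(s.encode("ascii"))


def open_syncmers(seq: str, k: int, s: int, t: int = 0,
                  order: Callable[[str], Any] = crc_order) -> List[Tuple[int, str]]:
    """Keep each k-mer whose smallest s-mer (under `order`) starts at offset t."""
    picked = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        ranks = [order(kmer[j:j + s]) for j in range(k - s + 1)]
        if ranks.index(min(ranks)) == t:  # leftmost minimum breaks ties
            picked.append((i, kmer))
    return picked


def link_syncmers(syncmers: List[Tuple[int, str]], w_min: int = 1, w_max: int = 3,
                  order: Callable[[str], Any] = crc_order) -> List[Tuple[int, int, str]]:
    """Link each syncmer to one later syncmer chosen within a window.

    The window is measured in syncmer-space (list indices), and the partner
    minimises `order` applied to the concatenated pair. This scoring rule is
    a placeholder for a real randstrobe link function.
    """
    seeds = []
    for idx, (pos1, m1) in enumerate(syncmers):
        window = syncmers[idx + w_min: idx + w_max + 1]
        if not window:
            continue  # no downstream syncmer left to link to
        pos2, m2 = min(window, key=lambda pm: order(m1 + pm[1]))
        seeds.append((pos1, pos2, m1 + m2))
    return seeds
```

Because each seed spans two syncmers, it covers more sequence than a single k-mer, which is the property the abstract credits for reducing the number of candidate regions.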


10.37236/1622 ◽  
2002 ◽  
Vol 9 (1) ◽  
Author(s):  
M. H. Albert ◽  
M. D. Atkinson ◽  
C. C. Handley ◽  
D. A. Holton ◽  
W. Stromquist

The density of a permutation pattern $\pi$ in a permutation $\sigma$ is the proportion of subsequences of $\sigma$ of length $|\pi|$ that are isomorphic to $\pi$. The maximal value of the density is found for several patterns $\pi$, and asymptotic upper and lower bounds for the maximal density are found in several other cases. The results are generalised to sets of patterns and the maximum density is found for all sets of length $3$ patterns.
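The definition of density in this abstract can be checked by brute force on small cases. The sketch below enumerates every length-$|\pi|$ subsequence of $\sigma$ and tests order-isomorphism directly; the function names are hypothetical, and the approach is only practical for short permutations.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence


def is_isomorphic(values: Sequence[int], pattern: Sequence[int]) -> bool:
    """True when `values` has the same relative order as `pattern`."""
    k = len(pattern)
    return all(
        (values[i] < values[j]) == (pattern[i] < pattern[j])
        for i in range(k)
        for j in range(i + 1, k)
    )


def pattern_density(sigma: Sequence[int], pattern: Sequence[int]) -> Fraction:
    """Proportion of length-|pattern| subsequences of sigma isomorphic to pattern."""
    k = len(pattern)
    if k > len(sigma):
        raise ValueError("pattern is longer than the permutation")
    # combinations() keeps entries in their original left-to-right order.
    subsequences = list(combinations(sigma, k))
    hits = sum(1 for sub in subsequences if is_isomorphic(sub, pattern))
    return Fraction(hits, len(subsequences))
```

For instance, `pattern_density([2, 1, 4, 3], [1, 3, 2])` examines four subsequences of length 3, of which two (2 4 3 and 1 4 3) are isomorphic to 132.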


2019 ◽  
Author(s):  
Joseph L. DeRisi ◽  
Greg Huber ◽  
Amy Kistler ◽  
Hanna Retallack ◽  
Michael Wilkinson ◽  
...  

Narnaviruses have been described as positive-sense RNA viruses with a remarkably simple genome of ~3 kb, encoding only a highly conserved RNA-dependent RNA polymerase (RdRp). Many narnaviruses, however, are ‘ambigrammatic’ and harbour an additional uninterrupted open reading frame (ORF) covering almost the entire length of the reverse complement strand. No function has been described for this ORF, yet the absence of stops is conserved across diverse narnaviruses, and in every case the codons in the reverse ORF and the RdRp are aligned. The >3 kb ORF overlap on opposite strands, unprecedented among RNA viruses, motivates an exploration of the constraints imposed or alleviated by the codon alignment. Here, we show that only when the codon frames are aligned can all stop codons be eliminated from the reverse strand by synonymous single-nucleotide substitutions in the RdRp gene, suggesting a mechanism for de novo gene creation within a strongly conserved amino-acid sequence. It will be fascinating to explore what implications this coding strategy has for other aspects of narnavirus biology. Beyond narnaviruses, our rapidly expanding catalogue of viral diversity may yet reveal additional examples of this broadly-extensible principle for ambigrammatic-sequence development.
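The role of codon alignment can be illustrated with a small sketch, assuming the sequence is written in the DNA alphabet and the standard genetic code applies. When the two frames are aligned, a reverse-strand stop (TAA, TAG, TGA) corresponds to exactly one forward codon (TTA, CTA, TCA, encoding Leu, Leu, Ser), so a synonymous change in that codon can remove the stop. The function names below are hypothetical, and this is not the authors' analysis code.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def reverse_strand_stops(seq: str, frame: int = 0) -> List[int]:
    """Start positions (on the reverse complement) of stop codons in `frame`."""
    rc = reverse_complement(seq)
    return [i for i in range(frame, len(rc) - 2, 3) if rc[i:i + 3] in STOP_CODONS]


# Forward codons ATG TTA GGC: the Leu codon TTA reads as the stop TAA on the
# reverse strand when the frames are aligned.
with_stop = "ATGTTAGGC"
# TTA -> TTG is a single-nucleotide, synonymous (Leu -> Leu) substitution.
without_stop = "ATGTTGGGC"
```

Calling `reverse_strand_stops(with_stop)` reports one stop, while `reverse_strand_stops(without_stop)` reports none, even though both sequences encode the same forward peptide.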


Author(s):  
Dong Xu ◽  
Zhuchou Lu ◽  
Kangming Jin ◽  
Wenmin Qiu ◽  
Guirong Qiao ◽  
...  

Efficiently extracting information from biological big data can be a huge challenge, especially for people who lack programming skills. We developed Sequence Processing and Data Extraction (SPDE) as an integrated tool for sequence processing and data extraction for gene family and omics analyses. Currently, SPDE has seven modules comprising 100 basic functions that range from single gene processing (e.g., translation, reverse complement, and primer design) to genome information extraction. All SPDE functions can be used without the need for programming or command lines. The SPDE interface provides enough prompt information to help users run SPDE without barriers. In addition to its own functions, SPDE also incorporates publicly available analysis tools (such as NCBI-blast, HMMER, Primer3 and SAMtools), thereby making SPDE a comprehensive bioinformatics platform for big biological data analysis. Availability: SPDE was built using Python and can be run on 32-bit and 64-bit Windows and macOS systems. It is open-source software that can be downloaded from https://github.com/simon19891216/[email protected]


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Joseph L. DeRisi ◽  
Greg Huber ◽  
Amy Kistler ◽  
Hanna Retallack ◽  
Michael Wilkinson ◽  
...  

Narnaviruses have been described as positive-sense RNA viruses with a remarkably simple genome of ~3 kb, encoding only a highly conserved RNA-dependent RNA polymerase (RdRp). Many narnaviruses, however, are ‘ambigrammatic’ and harbour an additional uninterrupted open reading frame (ORF) covering almost the entire length of the reverse complement strand. No function has been described for this ORF, yet the absence of stops is conserved across diverse narnaviruses, and in every case the codons in the reverse ORF and the RdRp are aligned. The >3 kb ORF overlap on opposite strands, unprecedented among RNA viruses, motivates an exploration of the constraints imposed or alleviated by the codon alignment. Here, we show that only when the codon frames are aligned can all stop codons be eliminated from the reverse strand by synonymous single-nucleotide substitutions in the RdRp gene, suggesting a mechanism for de novo gene creation within a strongly conserved amino-acid sequence. It will be fascinating to explore what implications this coding strategy has for other aspects of narnavirus biology. Beyond narnaviruses, our rapidly expanding catalogue of viral diversity may yet reveal additional examples of this broadly-extensible principle for ambigrammatic-sequence development.


1980 ◽  
Vol 11 (11) ◽  
Author(s):  
J. A. BARLTROP ◽  
J. C. BARRETT ◽  
R. W. CARDER ◽  
A. C. DAY ◽  
J. R. HARDING ◽  
...  

2002 ◽  
Vol 10 (04) ◽  
pp. 319-335
Author(s):  
DAVID DIGBY ◽  
WILLIAM SEFFENS ◽  
FISSEHA ABEBE

An in silico study of mRNA secondary structure has found a bias within the coding sequences of genes that favors "in-frame" pairing of nucleotides. This pairing of codons, each with its reverse-complement, partitions the 20 amino acids into three subsets. The genetic code can therefore be represented by a three-component graph. The composition of proteins in terms of amino acid membership in the three subgroups has been measured, and sequence runs of members within the same subgroup have been analyzed using a runs statistic based on Z-scores. In a GENBANK database of over 416,000 protein sequences, the distribution of this runs-test statistic is negatively skewed. To assess whether this statistical bias was due to a chance grouping of the amino acids in the real genetic code, several alternate partitions of the genetic code were examined by permuting the assignment of amino acids to groups. A metric was constructed to define the difference, or "distance", between any two such partitions, and an exhaustive search was conducted among alternate partitions maximally distant from the natural partition of the genetic code, to select sets of partitions that were also maximally distant from one another. The skewness of the runs statistic distribution for native protein sequences was significantly more negative under the natural partition than under any of the maximally different partitions of codons, although for all partitions, including the natural one, the randomized sequences had quite similar skewness. Hence, under the natural graph-theoretic partition of the genetic code, more protein sequences contain fewer runs of amino acids than they do under the other partitions, meaning that the average run must be longer under the natural partition. This suggests that a corresponding bias may exist in the coding sequences of the actual genes that code for these proteins.
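The graph construction described in this abstract can be sketched as follows, assuming the standard genetic code and assuming that pairings involving stop codons are ignored (that treatment is a guess made for this sketch, not something the abstract states). Two amino acids are joined whenever a codon of one is the reverse complement of a codon of the other, and the connected components of the resulting graph are then collected. All names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon: str) -> str:
    return codon.translate(_COMPLEMENT)[::-1]


def amino_acid_components() -> List[Set[str]]:
    """Connected components of the codon / reverse-complement pairing graph."""
    adjacency: Dict[str, Set[str]] = {
        aa: set() for aa in set(CODON_TABLE.values()) if aa != "*"
    }
    for codon, aa in CODON_TABLE.items():
        partner = CODON_TABLE[reverse_complement(codon)]
        if aa != "*" and partner != "*":  # assumption: skip stop codons
            adjacency[aa].add(partner)
            adjacency[partner].add(aa)

    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components
```

Under these assumptions the sketch yields three components, in line with the three subsets mentioned above; whether they match the paper's groups exactly depends on how the authors handled stop codons.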

