Indexing labeled sequences

2018 ◽  
Vol 4 ◽  
pp. e148
Author(s):  
Tatiana Rocher ◽  
Mathieu Giraud ◽  
Mikaël Salson

Background: Labels add information to a text, such as functional annotations (for example, genes) on a DNA sequence. V(D)J recombinations are DNA recombinations involving two or three short genes in lymphocytes. Sequencing this short region (500 bp or less) produces labeled sequences and brings insight into the lymphocyte repertoire for onco-hematology or immunology studies. Methods: We present two indexes for a text with non-overlapping labels. They store the text in a Burrows–Wheeler transform (BWT) and a compressed label sequence in a Wavelet Tree. The label sequence is taken in the order of the text (TL-index) or in the order of the BWT (TLBW-index). Both indexes need space related to the entropy of the labeled text. Results: These indexes allow efficient text–label queries to count and find labeled patterns. The TLBW-index has an overhead on simple label queries but is very efficient on combined pattern–label queries. We implemented the indexes in C++ and compared them against a baseline solution on pseudo-random as well as on V(D)J labeled texts. Discussion: New indexes such as the ones we propose improve the way we index and query labeled texts such as lymphocyte repertoires for hematological and immunological studies.
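To make the difference between the two label orders concrete, here is a minimal Python sketch. It is not the authors' C++ implementation: the function names, the `$` sentinel, the naive suffix sort, and the choice to attach to each BWT position the label of the letter stored there are all assumptions made for illustration, and no wavelet tree or compression is involved.

```python
def suffix_array(text):
    # Naive construction by sorting suffixes; fine for a short illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_and_labels(text, labels, sentinel="$"):
    """Return (bwt, labels in text order, labels permuted into BWT order).

    Hypothetical helper: labels[i] is the label of text[i]. The sentinel
    carries no label (None).
    """
    assert len(text) == len(labels) and sentinel not in text
    t = text + sentinel
    lab = list(labels) + [None]
    sa = suffix_array(t)
    # BWT[k] is the letter preceding the k-th smallest suffix;
    # index -1 wraps around to the sentinel when the suffix starts at 0.
    bwt = "".join(t[i - 1] for i in sa)
    bwt_labels = [lab[i - 1] for i in sa]
    return bwt, list(labels), bwt_labels


bwt, text_order, bwt_order = bwt_and_labels("ACGA", [1, 1, 2, 2])
print(bwt)         # letters in BWT order
print(text_order)  # label sequence as a text-order index would see it
print(bwt_order)   # the same labels rearranged to follow the BWT
```

In the text-order variant, a pattern match found in the BWT must be mapped back to a text position before its label can be read; in the BWT-order variant the label sits next to the matched BWT position, which is one plausible reading of why combined pattern–label queries are reported as efficient there.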

2014 ◽  
Vol 25 (08) ◽  
pp. 1161-1175 ◽  
Author(s):  
SILVIA BONOMO ◽  
SABRINA MANTACI ◽  
ANTONIO RESTIVO ◽  
GIOVANNA ROSONE ◽  
MARINELLA SCIORTINO

In this paper we study the combinatorial aspects of the extension of the Burrows-Wheeler transform to a multiset of words. Such a study involves the notions of suffixes and conjugates of words and is based on two different order relations, denoted by <lex and ≺ω, that, even if strictly connected, are quite different from the computational point of view. In particular, we introduce a method that uses only the <lex sorting among suffixes of a multiset of words in order to sort their conjugates according to the ≺ω-order. Lyndon words play an important role in this study. This strategy could be used in applications, especially in the field of bioinformatics, where, for instance, the advent of “next-generation” DNA sequencing technologies has meant that huge collections of DNA sequences are now commonplace.
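As a definition-level illustration of sorting conjugates by the ≺ω-order, the sketch below builds the extended transform of a small multiset naively. It is not the method introduced in the paper (which avoids this direct comparison by sorting suffixes with <lex); the function names are hypothetical, and ties between conjugates with equal infinite powers are broken by input order as an assumption.

```python
def conjugates(word):
    # All rotations of a non-empty word.
    return [word[i:] + word[:i] for i in range(len(word))]


def ebwt_naive(words):
    """Concatenate the last letters of all conjugates sorted by omega-order.

    u precedes v in omega-order when the infinite word uuu... is
    lexicographically smaller than vvv... . By the Fine and Wilf theorem,
    comparing prefixes of length |u| + |v| is enough, so a common prefix
    length of twice the longest word is used as the sort key.
    """
    rotations = [c for w in words for c in conjugates(w)]
    span = 2 * max(len(w) for w in words)

    def omega_key(w):
        # Repeat w until it covers `span` letters, then truncate.
        return (w * (span // len(w) + 1))[:span]

    rotations.sort(key=omega_key)  # stable sort: ties keep input order
    return "".join(r[-1] for r in rotations)


print(ebwt_naive(["ab", "abb"]))
print(ebwt_naive(["banana"]))
```

For a single word this reduces to the usual rotation-based transform without a sentinel.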


2019 ◽  
Author(s):  
Shangjie Zou

Background: In organisms’ genomes, promoters are short DNA sequences upstream of structural genes that control the genes’ transcription. Promoters can be roughly divided into two classes: constitutive promoters and inducible promoters. Promoters with clear functional annotations are practical synthetic biology biobricks. Many statistical and machine learning methods have been introduced to predict the functions of candidate promoters. Spectral Eigenmap has been shown to be an effective clustering method to classify biobricks, while support vector machine (SVM) is a powerful machine learning algorithm, especially when the dataset is small. Methods: The two algorithms, spectral embedding and SVM, are applied to the same dataset of 375 prokaryotic promoters. For spectral embedding, a Laplacian matrix is built from edit distances, followed by K-Means clustering. The sequences are represented as numeric vectors to serve as the dataset for SVM training. Results: SVM achieved a high prediction accuracy of 93.07% in 10-fold cross validation for classification of promoters’ transcriptional functions. Laplacian eigenmap (spectral embedding) based on edit distance may not be capable of extracting discriminative features for this task. Availability: Code, datasets and some important matrices are available on GitHub: https://github.com/shangjieZou/Promoter-transcriptional-predictor/tree/source-code
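The step "a Laplacian matrix is built from edit distances" can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how distances are turned into edge weights, so the affinity `exp(-d / sigma)`, the `sigma` parameter, the unnormalized Laplacian L = D - W, and the toy sequences are all choices made here, not taken from the linked repository.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance with the classic two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def graph_laplacian(seqs, sigma=1.0):
    """Unnormalized Laplacian L = D - W with assumed weights exp(-d / sigma)."""
    n = len(seqs)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(seqs[i], seqs[j])
            w[i][j] = w[j][i] = math.exp(-d / sigma)
    # Diagonal holds the weighted degree; off-diagonal holds minus the weight.
    return [[sum(w[i]) if i == j else -w[i][j] for j in range(n)]
            for i in range(n)]


# Toy sequences chosen for illustration only.
L = graph_laplacian(["TTGACA", "TTGACT", "TATAAT"])
for row in L:
    print([round(x, 3) for x in row])
```

The eigenvectors of such a matrix would then feed the embedding and K-Means steps, which are left out here because they need a numerical library.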


2008 ◽  
Vol 06 (02) ◽  
pp. 403-413 ◽  
Author(s):  
RAFAL POKRZYWA

The explosive growth in biological data in recent years has led to the development of new methods to identify DNA sequences. Many algorithms have recently been developed that search DNA sequences looking for unique DNA sequences. This paper considers the application of the Burrows–Wheeler transform (BWT) to the problem of unique DNA sequence identification. The BWT transforms a block of data into a format that is extremely well suited for compression. This paper presents a time-efficient algorithm to search for unique DNA sequences in a set of genes. This algorithm is applicable to the identification of yeast species and other DNA sequence sets.
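To illustrate how a BWT can answer "does this sequence occur exactly once?", here is a minimal backward-search sketch. It is not the algorithm presented in the paper: the function names, the `$` sentinel, the naive suffix sort, and the linear-time rank computation with `str.count` are assumptions chosen for brevity over speed.

```python
def bwt(text, sentinel="$"):
    # Naive transform: sort all suffixes, take the preceding letter of each.
    t = text + sentinel
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    return "".join(t[i - 1] for i in sa)


def count_occurrences(bwt_str, pattern):
    """Count pattern occurrences in the original text by backward search."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    # first[c] = number of letters in the text strictly smaller than c.
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    lo, hi = 0, len(bwt_str)  # half-open range of matching suffixes
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + bwt_str[:lo].count(ch)
        hi = first[ch] + bwt_str[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


def is_unique(bwt_str, pattern):
    # A sequence is unique in the indexed text if it occurs exactly once.
    return count_occurrences(bwt_str, pattern) == 1


index = bwt("ACGA")
print(count_occurrences(index, "A"), is_unique(index, "GA"))
```

A practical index would replace the `str.count` calls with precomputed rank structures so that each pattern letter costs constant or logarithmic time.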


2018 ◽  
Author(s):  
Vikas Yadav ◽  
Fan Yang ◽  
Md. Hashim Reza ◽  
Sanzhen Liu ◽  
Barbara Valent ◽  
...  

A series of well-synchronized events mediated by kinetochore-microtubule interactions ensure faithful chromosome segregation in eukaryotes. Centromeres scaffold kinetochore assembly and are among the fastest evolving chromosomal loci in terms of the DNA sequence, length, and organization of intrinsic elements. Neither the centromere structure nor the kinetochore dynamics is well studied in plant pathogenic fungi. Here, we sought to understand the process of chromosome segregation in the rice blast fungus, Magnaporthe oryzae. High-resolution confocal imaging of GFP-tagged inner kinetochore proteins, CenpA and CenpC, revealed an unusual albeit transient declustering of centromeres just before anaphase separation in M. oryzae. Strikingly, the declustered centromeres positioned randomly at the spindle midzone without an apparent metaphase plate per se. Using chromatin immunoprecipitation followed by deep sequencing, all seven centromeres were identified as CenpA-rich regions in the wild-type Guy11 strain of M. oryzae. The centromeres in M. oryzae are regional and span 57 to 109 kb transcriptionally poor regions. No centromere-specific DNA sequence motif or repetitive elements could be identified in these regions suggesting an epigenetic specification of centromere function in M. oryzae. Highly AT-rich and heavily methylated DNA sequences were the only common defining features of all the centromeres in rice blast fungus. PacBio genome assemblies and synteny analyses facilitated comparison of the centromere regions in distinct isolate(s) of rice blast, wheat blast, and in M. poae. Overall, this study identified unusual centromere dynamics and precisely mapped the centromere DNA sequences in the top model fungal pathogens that belong to the Magnaporthales and cause severe losses to global production of food crops and turf grasses.

Author summary: Magnaporthe oryzae is an important fungal pathogen that causes an annual loss of 10-30% of the rice crop due to the devastating blast disease. In most organisms, kinetochores are arranged either in the metaphase plate or are clustered together to facilitate synchronized anaphase separation of chromosomes. In this study, we show that the initially clustered kinetochores separate and position randomly prior to anaphase in M. oryzae. Centromeres, identified as the site of kinetochore assembly, are regional type without any shared sequence motifs in M. oryzae. Together, this study reveals atypical kinetochore dynamics and identifies functional centromeres in M. oryzae, thus paving the way to define heterochromatin boundaries and understand the process of kinetochore assembly on epigenetically specified centromere loci in the economically important cereal blast and summer patch pathogens. This study paves the way for understanding the contribution of heterochromatin in genome stability and virulence of the blast fungus.


2018 ◽  
Vol 41 ◽  
Author(s):  
Maria Babińska ◽  
Michal Bilewicz

The problem of extended fusion and identification can be approached from a diachronic perspective. Based on our own research, as well as findings from the fields of social, political, and clinical psychology, we argue that the way contemporary emotional events shape local fusion is similar to the way in which historical experiences shape extended fusion. We propose a reciprocal process in which historical events shape contemporary identities, whereas contemporary identities shape interpretations of past traumas.

