Interval Trees for Detection of Overlapping Genetic Entities

Author(s):  
Fahim Mohammad ◽  
Robert M. Flight ◽  
Benjamin J. Harrison ◽  
Jeffrey C. Petruska ◽  
Eric C. Rouchka
2019 ◽  
Vol 35 (23) ◽  
pp. 4907-4911 ◽  
Author(s):  
Jianglin Feng ◽  
Aakrosh Ratan ◽  
Nathan C Sheffield

Abstract

Motivation: Genomic data is frequently stored as segments or intervals. Because this data type is so common, interval-based comparisons are fundamental to genomic analysis. As the volume of available genomic data grows, developing efficient and scalable methods for searching interval data is necessary.

Results: We present a new data structure, the Augmented Interval List (AIList), to enumerate intersections between a query interval q and an interval set R. An AIList is constructed by first sorting R as a list by the interval start coordinate, then decomposing it into a few approximately flattened components (sublists), and then augmenting each sublist with the running maximum interval end. The query time for AIList is O(log₂ N + n + m), where n is the number of overlaps between R and q, N is the number of intervals in the set R and m is the average number of extra comparisons required to find the n overlaps. Tested on real genomic interval datasets, AIList code runs 5–18 times faster than standard high-performance code based on augmented interval-trees, nested containment lists or R-trees (BEDTools). For large datasets, the memory usage for AIList is 4–60% of other methods. The AIList data structure, therefore, provides a significantly improved fundamental operation for highly scalable genomic data analysis.

Availability and implementation: An implementation of the AIList data structure with both construction and search algorithms is available at http://ailist.databio.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
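
The core idea in the abstract above (sort by start, augment with the running maximum end, then binary-search and scan) can be illustrated with a minimal sketch. This is not the authors' implementation: it omits the decomposition into sublists, assumes half-open intervals (start, end), and the names build and query are hypothetical.

```python
from bisect import bisect_left

def build(intervals):
    """Sort half-open (start, end) intervals by start and record the running maximum end.

    Assumption: a single list only; the sublist decomposition is left out.
    """
    ordered = sorted(intervals)
    starts = [s for s, _ in ordered]
    max_end = []
    running = float("-inf")
    for _, e in ordered:
        running = max(running, e)
        max_end.append(running)
    return ordered, starts, max_end

def query(index, q_start, q_end):
    """Return all stored intervals that overlap the half-open query [q_start, q_end)."""
    ordered, starts, max_end = index
    # Last position whose start lies strictly before the query end.
    i = bisect_left(starts, q_end) - 1
    hits = []
    # Scan leftwards; once the running maximum end is <= q_start,
    # no earlier interval can reach the query, so the scan stops.
    while i >= 0 and max_end[i] > q_start:
        s, e = ordered[i]
        if e > q_start:
            hits.append((s, e))
        i -= 1
    return hits[::-1]

index = build([(1, 5), (3, 4), (6, 10), (8, 9), (12, 15)])
print(query(index, 4, 7))    # [(1, 5), (6, 10)]
print(query(index, 10, 12))  # []
```

In the first query the interval (3, 4) is inspected but rejected. Comparisons of this kind are what the term m in the stated query time counts. A single long interval early in the list keeps the running maximum high and forces such comparisons, which is presumably why the full structure first decomposes the list into flatter components.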


2004 ◽  
Vol 287 (1-3) ◽  
pp. 45-53 ◽  
Author(s):  
Mehri Javanian ◽  
Hosam Mahmoud ◽  
Mohammad Vahidi-Asl

1997 ◽  
Vol 3 (2) ◽  
pp. 158-170 ◽  
Author(s):  
P. Cignoni ◽  
P. Marino ◽  
C. Montani ◽  
E. Puppo ◽  
R. Scopigno

2011 ◽  
Vol 03 (03) ◽  
pp. 369-392 ◽  
Author(s):  
MATHILDE BOUVEL ◽  
CEDRIC CHAUVE ◽  
MARNI MISHNA ◽  
DOMINIQUE ROSSIN

Perfect sorting by reversals, a problem originating in computational genomics, is the process of sorting a signed permutation to either the identity or the reversed identity permutation by a sequence of reversals that do not break any common interval. Bérard et al. (2007) make use of strong interval trees to describe an algorithm for sorting signed permutations by reversals. Combinatorial properties of this family of trees are essential to the algorithm analysis. Here, we use the expected value of certain tree parameters to prove that the average run-time of the algorithm is at worst polynomial and that, for sufficiently long permutations, the sorting algorithm runs in polynomial time with probability one. Furthermore, our analysis of the subclass of commuting scenarios yields precise results on the average length of a reversal and the average number of reversals.
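
The two basic notions in this abstract, a reversal of a signed permutation and a common interval, can be sketched as follows. This is only an illustration under the usual definitions (a reversal reverses a segment and flips its signs; a common interval with respect to the identity is a segment whose absolute values are consecutive integers). The function names are hypothetical, and this is not the sorting algorithm of Bérard et al.

```python
def reverse_segment(perm, i, j):
    """Reverse perm[i..j] (inclusive, 0-based) and flip the sign of each element."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]

def is_common_interval(perm, i, j):
    """True if the absolute values in perm[i..j] form a set of consecutive integers.

    Because the entries of a permutation are distinct, this holds exactly
    when max - min equals the segment length minus one.
    """
    values = [abs(x) for x in perm[i:j + 1]]
    return max(values) - min(values) == j - i

perm = [3, -1, 2, 4]
print(is_common_interval(perm, 0, 2))  # True: {3, 1, 2} = {1, 2, 3}
print(is_common_interval(perm, 0, 1))  # False: {3, 1} skips 2
print(reverse_segment(perm, 0, 2))     # [-2, 1, -3, 4]
```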


2003 ◽  
Vol 40 (03) ◽  
pp. 654-670 ◽  
Author(s):  
Yoshiaki Itoh ◽  
Hosam M. Mahmoud

The binary interval tree is a random structure that underlies interval division and parking problems. Five incomplete one-sided variants of binary interval trees are considered, providing additional flavors and variations on the main applications. The size of each variant is studied, and a Gaussian tendency is proved in each case via an analytic approach. Differential equations on half scale and delayed differential equations arise and can be solved asymptotically by local expansions and Tauberian theorems. Unlike the binary case, in an incomplete interval tree the size determines most other parameters of interest, such as the height or the internal path length.


1994 ◽  
Vol 04 (04) ◽  
pp. 475-481 ◽  
Author(s):  
REUVEN BAR-YEHUDA ◽  
BERNARD CHAZELLE

Recent advances on polygon triangulation have yielded efficient algorithms for a large number of problems dealing with a single simple polygon. If the input consists of several disjoint polygons, however, it is often desirable to merge them in preprocessing so as to produce a single polygon that retains the geometric characteristics of its individual components. We give an efficient method for doing so, which combines a generalized form of Jordan sorting with the efficient use of point location and interval trees. As a corollary, we are able to triangulate a collection of p disjoint Jordan polygonal chains in time O(n + p (log p)^(1+ε)), for any fixed ε > 0, where n is the total number of vertices. A variant of the algorithm gives a running time of O((n + p log p) log log p). The performance of these solutions approaches the lower bound of Ω(n + p log p).


2004 ◽  
Vol 53 (12) ◽  
pp. 1615-1628 ◽  
Author(s):  
Haibin Lu ◽  
S. Sahni

2013 ◽  
Vol DMTCS Proceedings vol. AS,... (Proceedings) ◽  
Author(s):  
Mathilde Bouvel ◽  
Marni Mishna ◽  
Cyril Nicaud

After extending classical results on simple varieties of trees to trees counted by their number of leaves, we describe a filtration of the set of permutations based on their strong interval trees. For each subclass we provide asymptotic formulas for the number of trees (by leaves), the average number of nodes of fixed arity, the average subtree size sum, and the average number of internal nodes. The filtration is motivated by genome comparison of related species.

