Sequence Alignment on Directed Graphs

2017 ◽  
Author(s):  
Kavya Vaddadi ◽  
Naveen Sivadasan ◽  
Kshitij Tayal ◽  
Rajgopal Srinivasan

Abstract. Genomic variations in a reference collection are naturally represented as genome variation graphs. Such graphs encode common subsequences as vertices, and the variations are captured using additional vertices and directed edges. The resulting graphs are directed graphs, possibly with cycles. Existing algorithms for aligning sequences on such graphs make use of partial order alignment (POA) techniques that work on directed acyclic graphs (DAGs). For this, acyclic extensions of the input graphs are first constructed through expensive loop unrolling steps (DAGification). Such graph extensions can also blow up considerably in size; in the worst case the blow-up factor is proportional to the input sequence length. We provide a novel alignment algorithm, V-ALIGN, that aligns the input sequence directly on the input graph while avoiding such expensive DAGification steps. V-ALIGN is based on a novel dynamic programming formulation that allows gapped alignment directly on the input graph. It supports affine and linear gaps. We also propose refinements to V-ALIGN for better performance in practice. With these refinements, the time to fill the DP table has linear dependence on the sizes of the sequence, the graph and its feedback vertex set. We perform experiments to compare against the POA-based alignment. For aligning short sequences, standard approaches restrict the expensive gapped alignment to small filtered subgraphs having high ‘similarity’ to the input sequence. In such cases, the performance of V-ALIGN for gapped alignment on the filtered subgraph depends on the subgraph sizes.
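To make the dynamic-programming setting concrete, here is a minimal Python sketch of the simpler DAG-restricted case that the abstract attributes to POA-style methods. It is not V-ALIGN and does not handle cycles or affine gaps. The function name `align_to_dag`, the unit edit costs, the single-character vertex labels, and the requirement that vertices are listed in topological order are all assumptions made for illustration.

```python
def align_to_dag(seq, labels, preds):
    """Minimum unit-cost edit distance between `seq` and any path in a DAG.

    labels[v] is the single character on vertex v; preds[v] lists the
    predecessors of v. Vertices are assumed to be numbered in topological
    order, so every predecessor index is smaller than v (an assumption of
    this sketch). A path may start and end at any vertex.
    """
    m = len(seq)
    # Row of a virtual start vertex: aligning seq[:j] to the empty path
    # costs j insertions.
    start = list(range(m + 1))
    rows = []
    best_total = m  # aligning the whole sequence to the empty path
    for v, label in enumerate(labels):
        # Element-wise minimum over the virtual start and all predecessors.
        best = start[:]
        for p in preds[v]:
            best = [min(a, b) for a, b in zip(best, rows[p])]
        row = [best[0] + 1]  # vertex v is skipped against an empty prefix
        for j in range(1, m + 1):
            match = best[j - 1] + (seq[j - 1] != label)  # (mis)match
            skip_vertex = best[j] + 1                    # gap in the sequence
            skip_char = row[j - 1] + 1                   # gap in the graph
            row.append(min(match, skip_vertex, skip_char))
        rows.append(row)
        best_total = min(best_total, row[m])
    return best_total
```

For a diamond-shaped DAG spelling the paths ACT and AGT, `align_to_dag("AGT", "ACGT", [[], [0], [0], [1, 2]])` aligns exactly along the second path. Handling cyclic graphs directly, as the abstract describes, needs a different formulation than this topological-order sweep.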

Algorithmica ◽  
2021 ◽  
Author(s):  
Robert Ganian ◽  
Sebastian Ordyniak ◽  
M. S. Ramanujan

Abstract. In this paper we revisit the classical edge disjoint paths (EDP) problem, where one is given an undirected graph G and a set of terminal pairs P and asks whether G contains a set of pairwise edge-disjoint paths connecting every terminal pair in P. Our focus lies on structural parameterizations for the problem that allow for efficient (polynomial-time or FPT) algorithms. As our first result, we answer an open question stated in Fleszar et al. (Proceedings of the ESA, 2016), by showing that the problem can be solved in polynomial time if the input graph has a feedback vertex set of size one. We also show that EDP parameterized by the treewidth and the maximum degree of the input graph is fixed-parameter tractable. Having developed two novel algorithms for EDP using structural restrictions on the input graph, we then turn our attention towards the augmented graph, i.e., the graph obtained from the input graph after adding one edge between every terminal pair. In contrast to the input graph, where EDP is known to remain NP-hard even for treewidth two, a result by Zhou et al. (Algorithmica 26(1):3–30, 2000) shows that EDP can be solved in non-uniform polynomial time if the augmented graph has constant treewidth; we note that the possible improvement of this result to an FPT-algorithm has remained open since then. We show that this is highly unlikely by establishing the W[1]-hardness of the problem parameterized by the treewidth (and even feedback vertex set) of the augmented graph. Finally, we develop an FPT-algorithm for EDP by exploiting a novel structural parameter of the augmented graph.


Algorithmica ◽  
2021 ◽  
Author(s):  
Fedor V. Fomin ◽  
Petr A. Golovach ◽  
William Lochet ◽  
Pranabendu Misra ◽  
Saket Saurabh ◽  
...  

Abstract. We initiate the parameterized complexity study of minimum t-spanner problems on directed graphs. For a positive integer t, a multiplicative t-spanner of a (directed) graph G is a spanning subgraph H such that the distance between any two vertices in H is at most t times the distance between these vertices in G, that is, H keeps the distances in G up to the distortion (or stretch) factor t. An additive t-spanner is defined as a spanning subgraph that keeps the distances up to the additive distortion parameter t, that is, the distances in H and G differ by at most t. The task of Directed Multiplicative Spanner is, given a directed graph G with m arcs and positive integers t and k, decide whether G has a multiplicative t-spanner with at most $m-k$ arcs. Similarly, Directed Additive Spanner asks whether G has an additive t-spanner with at most $m-k$ arcs. We show that (i) Directed Multiplicative Spanner admits a polynomial kernel of size $\mathcal{O}(k^4 t^5)$ and can be solved in randomized $(4t)^k \cdot n^{\mathcal{O}(1)}$ time, (ii) the weighted variant of Directed Multiplicative Spanner can be solved in $k^{2k} \cdot n^{\mathcal{O}(1)}$ time on directed acyclic graphs, (iii) Directed Additive Spanner is $\mathsf{W}[1]$-hard when parameterized by k for every fixed $t \ge 1$ even when the input graphs are restricted to be directed acyclic graphs. The latter claim contrasts with the recent result of Kobayashi from STACS 2020 that the problem for undirected graphs is $\mathsf{FPT}$ when parameterized by t and k.
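The multiplicative t-spanner definition can be checked directly on small unweighted digraphs. The following Python sketch is only a brute-force verifier of the definition, not any of the algorithms from the paper; the names `bfs_distances` and `is_multiplicative_spanner` and the arc-list input format are assumptions, and the caller is assumed to pass arcs of H that are a subset of the arcs of G.

```python
from collections import deque


def bfs_distances(n, arcs, source):
    """Unweighted directed distances from `source`; None means unreachable."""
    adj = [[] for _ in range(n)]
    for u, v in arcs:
        adj[u].append(v)
    dist = [None] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_multiplicative_spanner(n, arcs_g, arcs_h, t):
    """Check dist_H(u, v) <= t * dist_G(u, v) for every ordered pair (u, v).

    Vertices are 0..n-1. Pairs unreachable in G impose no constraint.
    """
    for s in range(n):
        dist_g = bfs_distances(n, arcs_g, s)
        dist_h = bfs_distances(n, arcs_h, s)
        for v in range(n):
            if dist_g[v] is None:
                continue
            if dist_h[v] is None or dist_h[v] > t * dist_g[v]:
                return False
    return True
```

For the digraph with arcs 0→1, 1→2 and 0→2, removing the arc 0→2 leaves a multiplicative 2-spanner but not a 1-spanner, since the distance from 0 to 2 grows from 1 to 2.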


2020 ◽  
Vol 29 (4) ◽  
pp. 616-632
Author(s):  
Carlos Hoppen ◽  
Yoshiharu Kohayakawa ◽  
Richard Lang ◽  
Hanno Lefmann ◽  
Henrique Stagni

Abstract. There has been substantial interest in estimating the value of a graph parameter, i.e. of a real-valued function defined on the set of finite graphs, by querying a randomly sampled substructure whose size is independent of the size of the input. Graph parameters that may be successfully estimated in this way are said to be testable or estimable, and the sample complexity $q_z = q_z(\varepsilon)$ of an estimable parameter $z$ is the size of a random sample of a graph $G$ required to ensure that the value of $z(G)$ may be estimated within an error of $\varepsilon$ with probability at least 2/3. In this paper, for any fixed monotone graph property $\mathcal{P} = \text{Forb}(\mathcal{F})$, we study the sample complexity of estimating a bounded graph parameter $z$ that, for an input graph $G$, counts the number of spanning subgraphs of $G$ that satisfy $\mathcal{P}$. To improve upon previous upper bounds on the sample complexity, we show that the vertex set of any graph that satisfies a monotone property $\mathcal{P}$ may be partitioned equitably into a constant number of classes in such a way that the cluster graph induced by the partition is not far from satisfying a natural weighted graph generalization of $\mathcal{P}$. Properties for which this holds are said to be recoverable, and the study of recoverable properties may be of independent interest.


2019 ◽  
Vol 53 (2) ◽  
pp. 559-576 ◽  
Author(s):  
Pascal Schroeder ◽  
Imed Kacem ◽  
Günter Schmidt

In this work we investigate the portfolio selection problem (P1) and bi-directional trading (P2) when prices are interrelated. Zhang et al. (J. Comb. Optim. 23 (2012) 159–166) provided the algorithm UND, which solves one variant of P2. We are interested in solutions which are optimal from a worst-case perspective. For P1, we prove the worst-case input sequence and derive the algorithm optimal portfolio for interrelated prices (OPIP). We then prove its competitive ratio and optimality. We use the idea of OPIP to solve P2 and derive the algorithm called optimal conversion for interrelated prices (OCIP). Using OCIP, we also design optimal online algorithms for bi-directional search (P3) called bi-directional UND (BUND) and optimal online search for unknown relative price bounds (RUN). We run numerical experiments and conclude that OPIP and OCIP perform well compared to other algorithms even if prices do not behave adversely.


2019 ◽  
Vol 35 (19) ◽  
pp. 3599-3607 ◽  
Author(s):  
Mikko Rautiainen ◽  
Veli Mäkinen ◽  
Tobias Marschall

Abstract. Motivation: Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs to represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs to represent a pan-genome and hence the genetic variation present in a population. Being able to align sequencing reads to such graphs is a key step for many analyses and its applications include genome assembly, read error correction and variant calling with respect to a variation graph. Results: We generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers’ bitvector algorithm for semi-global alignment. These linear algorithms are both based on processing $w$ sequence characters with a constant number of operations, where $w$ is the word size of the machine (commonly 64), and achieve a speedup of up to $w$ over naive algorithms. For a graph with $|V|$ nodes and $|E|$ edges and a sequence of length $m$, our bitvector-based graph alignment algorithm reaches a worst-case runtime of $O(|V| + \lceil m/w \rceil |E| \log w)$ for acyclic graphs and $O(|V| + m|E| \log w)$ for arbitrary cyclic graphs. We apply it to five different types of graphs and observe a speedup between 3-fold and 20-fold compared with a previous (asymptotically optimal) alignment algorithm. Availability and implementation: https://github.com/maickrau/GraphAligner. Supplementary information: Supplementary data are available at Bioinformatics online.
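As background for the bit-parallel idea, here is a short Python sketch of the classical Shift-And algorithm on a plain linear text, that is, the sequence-to-sequence starting point the abstract mentions and not the graph generalization or the GraphAligner code. The function name `shift_and` is an assumption. Python integers are unbounded, so the machine-word limit of $w$ bits is not modelled here.

```python
def shift_and(pattern, text):
    """Return the start positions of all exact occurrences of `pattern`.

    Bit i of `state` is set when pattern[:i + 1] matches the text ending at
    the current position, so one shift-and-mask step updates every prefix
    at once.
    """
    m = len(pattern)
    if m == 0:
        return []
    # masks[ch] has bit i set exactly when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit marking a match of the full pattern
    state = 0
    starts = []
    for j, ch in enumerate(text):
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            starts.append(j - m + 1)
    return starts
```

Calling `shift_and("aba", "ababa")` reports the overlapping occurrences at positions 0 and 2. In a word-limited language the same update processes up to $w$ pattern positions per operation, which is the source of the speedup factor described above.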


2016 ◽  
Vol 47 (3) ◽  
pp. 357-371 ◽  
Author(s):  
Seyed Mahmoud Sheikholeslami ◽  
Asghar Bodaghli ◽  
Lutz Volkmann

Let $D$ be a finite simple digraph with vertex set $V(D)$ and arc set $A(D)$. A twin signed Roman dominating function (TSRDF) on the digraph $D$ is a function $f:V(D)\rightarrow\{-1,1,2\}$ satisfying the conditions that (i) $\sum_{x\in N^-[v]}f(x)\ge 1$ and $\sum_{x\in N^+[v]}f(x)\ge 1$ for each $v\in V(D)$, where $N^-[v]$ (resp. $N^+[v]$) consists of $v$ and all in-neighbors (resp. out-neighbors) of $v$, and (ii) every vertex $u$ for which $f(u)=-1$ has an in-neighbor $v$ and an out-neighbor $w$ for which $f(v)=f(w)=2$. The weight of a TSRDF $f$ is $\omega(f)=\sum_{v\in V(D)}f(v)$. The twin signed Roman domination number $\gamma_{sR}^*(D)$ of $D$ is the minimum weight of a TSRDF on $D$. In this paper, we initiate the study of twin signed Roman domination in digraphs and we present some sharp bounds on $\gamma_{sR}^*(D)$. In addition, we determine the twin signed Roman domination number of some classes of digraphs.
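Conditions (i) and (ii) translate directly into a checker. The Python sketch below only verifies whether a given labelling is a TSRDF and computes its weight; it does not compute $\gamma_{sR}^*(D)$. The names `is_tsrdf` and `tsrdf_weight` and the input format (a vertex list, an arc list and a dictionary `f`) are assumptions, and the digraph is assumed to be simple as in the definition.

```python
def is_tsrdf(vertices, arcs, f):
    """Check whether f is a twin signed Roman dominating function.

    `arcs` is a list of (tail, head) pairs of a simple digraph (no loops),
    and `f` maps each vertex to -1, 1 or 2.
    """
    in_nbrs = {v: set() for v in vertices}
    out_nbrs = {v: set() for v in vertices}
    for u, v in arcs:
        out_nbrs[u].add(v)
        in_nbrs[v].add(u)
    for v in vertices:
        if f[v] not in (-1, 1, 2):
            return False
        # Condition (i): closed in- and out-neighbourhood sums are >= 1.
        if f[v] + sum(f[x] for x in in_nbrs[v]) < 1:
            return False
        if f[v] + sum(f[x] for x in out_nbrs[v]) < 1:
            return False
        # Condition (ii): a -1 vertex needs an in- and an out-neighbour
        # with value 2.
        if f[v] == -1:
            if not any(f[x] == 2 for x in in_nbrs[v]):
                return False
            if not any(f[x] == 2 for x in out_nbrs[v]):
                return False
    return True


def tsrdf_weight(f):
    """Weight of f: the sum of its values over all vertices."""
    return sum(f.values())
```

On the directed 3-cycle 0→1→2→0, both the all-ones labelling and the labelling `{0: -1, 1: 2, 2: 2}` pass the check, while `{0: -1, 1: 1, 2: 2}` fails because vertex 0 has no out-neighbour with value 2.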


Author(s):  
Nikola Beneš ◽  
Luboš Brim ◽  
Samuel Pastva ◽  
David Šafránek

Abstract. Problems arising in many scientific disciplines are often modelled using edge-coloured directed graphs. These can be enormous in the number of both vertices and colours. Given such a graph, the original problem frequently translates to the detection of the graph’s strongly connected components, which is challenging at this scale. We propose a new, symbolic algorithm that computes all the monochromatic strongly connected components of an edge-coloured graph. In the worst case, the algorithm performs $O(p\cdot n\cdot \log n)$ symbolic steps, where $p$ is the number of colours and $n$ the number of vertices. We evaluate the algorithm using an experimental implementation based on Binary Decision Diagrams (BDDs) and large (up to $2^{48}$) coloured graphs produced by models appearing in systems biology.
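To illustrate what "monochromatic strongly connected components" means on a small explicit graph, here is a Python sketch that, for each colour, restricts the graph to arcs of that colour and extracts components as the intersection of forward and backward reachability from a pivot. This is an explicit-state illustration only, not the symbolic BDD-based algorithm from the paper, and it makes no attempt at its step bound. The names `reachable` and `monochromatic_sccs` and the `(tail, head, colour)` edge format are assumptions.

```python
def reachable(start, adj):
    """Set of vertices reachable from `start` via the adjacency map `adj`."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def monochromatic_sccs(vertices, coloured_edges):
    """Map each colour to the set of SCCs of the subgraph using only that colour."""
    result = {}
    colours = {colour for _, _, colour in coloured_edges}
    for c in colours:
        fwd, bwd = {}, {}
        for u, v, colour in coloured_edges:
            if colour == c:
                fwd.setdefault(u, []).append(v)
                bwd.setdefault(v, []).append(u)
        remaining = set(vertices)
        components = set()
        while remaining:
            pivot = next(iter(remaining))
            # The SCC of the pivot is exactly forward ∩ backward reachability.
            scc = reachable(pivot, fwd) & reachable(pivot, bwd)
            components.add(frozenset(scc))
            remaining -= scc
        result[c] = components
    return result
```

Each pivot costs two graph traversals here, which is acceptable for toy inputs but is precisely the kind of explicit enumeration that does not scale to the graph sizes mentioned in the abstract.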


Author(s):  
Yves Marcoux ◽  
Michael Sperberg-McQueen ◽  
Claus Huitfeldt

The problem of overlapping structures has long been familiar to the structured document community. In a poem, for example, the verse and line structures overlap, and having them both available simultaneously is convenient, and sometimes necessary (for example for automatic analyses). However, only structures that embed nicely can be represented directly in XML. Proposals to address this problem include XML solutions (based essentially on a layer of semantics) and non-XML ones. Among the latter is TexMecs [HS2003], a markup language that allows overlap (and many other features). XML documents, when viewed as graphs, correspond to trees. Marcoux [M2008] characterized overlap-only TexMecs documents by showing that they correspond exactly to completion-acyclic node-ordered directed acyclic graphs. In this paper, we elaborate on that result in two ways. First, we cast it in the setting of a strictly larger class of graphs, child-arc-ordered directed graphs, that includes multi-graphs and non-acyclic graphs, and show that, somewhat surprisingly, it does not hold in general for graphs with multiple roots. Second, we formulate a stronger condition, full-completion-acyclicity, that guarantees correspondence with an overlap-only document, even for graphs that have multiple roots. The definition of a fully-completion-acyclic graph does not in itself suggest an efficient algorithm for checking the condition, nor for computing a corresponding overlap-only document when the condition is satisfied. We present basic polynomial-time upper bounds on the complexity of accomplishing those tasks.


Author(s):  
Ulf Grenander ◽  
Michael I. Miller

Probabilistic structures on the representations allow for expressing the variation of natural patterns. In this chapter the structure imposed through probabilistic directed graphs is studied. The essential probabilistic structure enforced through the directedness of the graphs is that sites are conditionally independent of their nondescendants given their parents. The entropies and combinatorics of these processes are examined as well. Focus is given to the classical Markov chain and the branching process examples to illustrate the fundamentals of variability descriptions through probability and entropy.
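As one concrete instance of an entropy computation for the Markov chain example, the Python sketch below evaluates the standard entropy rate $H = -\sum_i \pi_i \sum_j P_{ij} \log_2 P_{ij}$ of a finite chain with transition matrix $P$ and stationary distribution $\pi$. The choice of this particular quantity, the function names, and the use of plain power iteration (which assumes the chain converges from a uniform start) are assumptions of this sketch rather than details taken from the chapter.

```python
import math


def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of P by power iteration.

    Assumes the iteration converges from the uniform distribution, which
    holds for irreducible aperiodic chains.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def entropy_rate_bits(P):
    """Entropy rate in bits per step: -sum_i pi_i sum_j P_ij log2 P_ij."""
    pi = stationary_distribution(P)
    h = 0.0
    for i, row in enumerate(P):
        for p in row:
            if p > 0:  # terms with p == 0 contribute nothing
                h -= pi[i] * p * math.log2(p)
    return h
```

A chain whose rows are all `[0.5, 0.5]` behaves like independent fair coin flips and has an entropy rate of one bit per step, while a chain that moves deterministically has rate zero.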


2002 ◽  
Vol 13 (06) ◽  
pp. 889-910 ◽  
Author(s):  
MICHALIS FALOUTSOS ◽  
RAJESH PANKAJ ◽  
KENNETH C. SEVCIK

In this paper, we study the problem of multicast routing on directed graphs. We define the asymmetry of a graph to be the maximum ratio of weights on opposite directed edges between a pair of nodes for all node-pairs. We examine three types of problems according to the membership behavior: (i) the static, (ii) the join-only, (iii) the join-leave problems. We study the effect of the asymmetry on the worst case performance of two algorithms: the Greedy and Shortest Paths algorithms. The worst case performance of Shortest Paths is poor, but it is affected by neither the asymmetry nor the membership behavior. In contrast, the worst case performance of Greedy is proportional to the asymmetry in some cases. We prove an interesting result for the join-only problem: the Greedy algorithm has near-optimal on-line performance.
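The asymmetry defined above is straightforward to compute from an arc-weight table. The Python sketch below is a minimal illustration under stated assumptions: the name `asymmetry`, the dictionary input keyed by `(tail, head)`, strictly positive weights, and the convention of ignoring node pairs that have an arc in only one direction are all choices made here, not details given in the abstract.

```python
def asymmetry(weights):
    """Maximum ratio between the weights of opposite arcs over all node pairs.

    `weights` maps (u, v) to a positive arc weight. Pairs with an arc in
    only one direction are skipped (an assumption of this sketch). A graph
    with symmetric weights has asymmetry 1.0.
    """
    worst = 1.0
    for (u, v), w_uv in weights.items():
        w_vu = weights.get((v, u))
        if w_vu is None:
            continue
        worst = max(worst, w_uv / w_vu, w_vu / w_uv)
    return worst
```

For example, weights of 2 and 1 between one node pair and 3 and 12 between another give an asymmetry of 4.0, set by the second pair.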

