Superbubbles as an empirical characteristic of directed networks

2020 ◽  
pp. 1-10
Author(s):  
Fabian Gärtner ◽  
Felix Kühnl ◽  
Carsten R. Seemann ◽  
Christian Höner Zu Siederdissen ◽  
Peter F. Stadler ◽  
...  

Abstract Superbubbles are acyclic induced subgraphs of a digraph with single entrance and exit that naturally arise in the context of genome assembly and the analysis of genome alignments in computational biology. These structures can be computed in linear time and are confined to non-symmetric digraphs. We demonstrate empirically that graph parameters derived from superbubbles provide a convenient means of distinguishing different classes of real-world graphical models, while being largely unrelated to simple, commonly used parameters.
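For illustration, the defining conditions of a superbubble ⟨s, t⟩ can be checked directly on a small digraph: the vertices reachable from s without passing t must match the vertices that reach t without passing s, and the induced subgraph must be acyclic. The sketch below is a quadratic sanity check of the definition (minimality omitted), not the linear-time algorithm referenced in the abstract:

```python
from collections import defaultdict, deque

def is_superbubble(adj, s, t):
    """Check the superbubble conditions for candidate entrance s and
    exit t: matching reachability sets, plus acyclicity of the
    induced subgraph (tested with Kahn's algorithm)."""
    def reach(g, start, blocked):
        seen, stack = {start}, [start]
        while stack:
            for w in g.get(stack.pop(), []):
                if w != blocked and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    rev = defaultdict(list)                  # reversed digraph
    for v, ws in adj.items():
        for w in ws:
            rev[w].append(v)
    fwd = reach(adj, s, t)                   # reachable from s, avoiding t
    bwd = reach(rev, t, s)                   # reaching t, avoiding s
    if fwd - {s} != bwd - {t}:
        return False                         # matching condition violated
    bubble = fwd | {t}
    indeg = {v: 0 for v in bubble}
    for v in bubble:
        for w in adj.get(v, []):
            if w in bubble:
                indeg[w] += 1
    queue = deque(v for v in bubble if indeg[v] == 0)
    ordered = 0
    while queue:                             # acyclic iff every vertex
        v = queue.popleft()                  # gets topologically ordered
        ordered += 1
        for w in adj.get(v, []):
            if w in bubble:
                indeg[w] -= 1
                if indeg[w] == 0:
                    queue.append(w)
    return ordered == len(bubble)

# A diamond s -> {a, b} -> t is a superbubble; an edge escaping
# from the interior (a -> x) breaks the matching condition.
diamond = {'s': ['a', 'b'], 'a': ['t'], 'b': ['t'], 't': []}
leaky = {'s': ['a', 'b'], 'a': ['t', 'x'], 'b': ['t'], 't': [], 'x': []}
```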

2014 ◽  
Vol 10 (1) ◽  
pp. 1-19 ◽  
Author(s):  
J. Wang ◽  
J. Emile-Geay ◽  
D. Guillot ◽  
J. E. Smerdon ◽  
B. Rajaratnam

Abstract. Pseudoproxy experiments (PPEs) have become an important framework for evaluating paleoclimate reconstruction methods. Most existing PPE studies assume constant proxy availability through time and uniform proxy quality across the pseudoproxy network. Real multiproxy networks are, however, marked by pronounced disparities in proxy quality, and a steep decline in proxy availability back in time, either of which may have large effects on reconstruction skill. A suite of PPEs constructed from a millennium-length general circulation model (GCM) simulation is thus designed to mimic these various real-world characteristics. The new pseudoproxy network is used to evaluate four climate field reconstruction (CFR) techniques: truncated total least squares embedded within the regularized EM (expectation-maximization) algorithm (RegEM-TTLS), the Mann et al. (2009) implementation of RegEM-TTLS (M09), canonical correlation analysis (CCA), and Gaussian graphical models embedded within RegEM (GraphEM). Each method's risk properties are also assessed via a 100-member noise ensemble. Contrary to expectation, it is found that reconstruction skill does not vary monotonically with proxy availability, but is also a function of the type and amplitude of climate variability (forced events vs. internal variability). The use of realistic spatiotemporal pseudoproxy characteristics also exposes large inter-method differences. Despite comparable fidelity in reconstructing the global mean temperature, spatial skill varies considerably between CFR techniques. Both GraphEM and CCA efficiently exploit teleconnections, and produce consistent reconstructions across the ensemble. RegEM-TTLS and M09 appear advantageous for reconstructions on highly noisy data, but are subject to larger stochastic variations across different realizations of pseudoproxy noise.
Results collectively highlight the importance of designing realistic pseudoproxy networks and implementing multiple noise realizations of PPEs. The results also underscore the difficulty in finding the proper bias-variance tradeoff for jointly optimizing the spatial skill of CFRs and the fidelity of the global mean reconstructions.
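For intuition, pseudoproxies in such experiments are commonly generated by degrading model output with noise at a prescribed signal-to-noise ratio; the sketch below is a minimal pure-Python illustration with a hypothetical series, not the paper's network design:

```python
import random
import statistics

def make_pseudoproxy(signal, snr, seed=0):
    """Degrade a 'true' model temperature series into a pseudoproxy by
    adding Gaussian white noise scaled to a target signal-to-noise
    ratio, SNR = std(signal) / std(noise)."""
    rng = random.Random(seed)
    noise_sd = statistics.pstdev(signal) / snr
    return [x + rng.gauss(0.0, noise_sd) for x in signal]

# A hypothetical millennium-length "model truth": a weak trend plus
# short-period variability (a stand-in for a GCM grid-point series).
truth = [0.001 * t + 0.2 * ((t % 7) - 3) for t in range(1000)]
proxy = make_pseudoproxy(truth, snr=0.5)   # heavily degraded proxy
```

Varying the seed produces the multiple noise realizations the abstract argues for, and varying `snr` across sites mimics disparities in proxy quality.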


2020 ◽  
Vol 34 (02) ◽  
pp. 1644-1651
Author(s):  
Yuki Satake ◽  
Hiroshi Unno ◽  
Hinata Yanagi

In this paper, we present a novel constraint solving method for a class of predicate Constraint Satisfaction Problems (pCSP) where each constraint is represented by an arbitrary clause of first-order predicate logic over predicate variables. The class of pCSP properly subsumes the well-studied class of Constrained Horn Clauses (CHCs) where each constraint is restricted to a Horn clause. The class of CHCs has been widely applied to verification of linear-time safety properties of programs in different paradigms. In this paper, we show that pCSP further widens the applicability to verification of branching-time safety properties of programs that exhibit finitely-branching non-determinism. Solving pCSP (and CHCs) however is challenging because the search space of solutions is often very large (or unbounded), high-dimensional, and non-smooth. To address these challenges, our method naturally combines techniques studied separately in different literatures: counterexample guided inductive synthesis (CEGIS) and probabilistic inference in graphical models. We have implemented the presented method and obtained promising results on existing benchmarks as well as new ones that are beyond the scope of existing CHC solvers.
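For intuition, the CEGIS half of this combination alternates a synthesizer that proposes candidates with a verifier that searches for counterexamples; candidates refuted by previously collected counterexamples are discarded without verification. The sketch below is a toy loop over a finite candidate pool and domain (a hypothetical problem, not the paper's solver):

```python
def cegis(candidates, spec, domain):
    """Counterexample-guided inductive synthesis over a finite pool:
    keep only candidates consistent with all counterexamples seen so
    far; exhaustively verify each survivor over the domain."""
    examples = []                                   # counterexamples so far
    for c in candidates:
        if all(spec(c, x) for x in examples):       # consistent with history
            cex = next((x for x in domain if not spec(c, x)), None)
            if cex is None:
                return c                            # verified on the domain
            examples.append(cex)                    # learn from the failure
    return None

# Toy problem: find an integer k with k * x >= 2 * x for all x in 0..9,
# i.e. any k >= 2; the loop returns the first verified candidate.
spec = lambda k, x: k * x >= 2 * x
k = cegis(candidates=range(5), spec=spec, domain=range(10))
```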


2020 ◽  
Vol 21 (1) ◽  
pp. 139-162 ◽  
Author(s):  
Jordan M. Eizenga ◽  
Adam M. Novak ◽  
Jonas A. Sibbesen ◽  
Simon Heumos ◽  
Ali Ghaffaari ◽  
...  

Low-cost whole-genome assembly has enabled the collection of haplotype-resolved pangenomes for numerous organisms. In turn, this technological change is encouraging the development of methods that can precisely address the sequence and variation described in large collections of related genomes. These approaches often use graphical models of the pangenome to support algorithms for sequence alignment, visualization, functional genomics, and association studies. The additional information provided to these methods by the pangenome allows them to achieve superior performance on a variety of bioinformatic tasks, including read alignment, variant calling, and genotyping. Pangenome graphs stand to become a ubiquitous tool in genomics. Although it is unclear whether they will replace linear reference genomes, their ability to harmoniously relate multiple sequence and coordinate systems will make them useful irrespective of which pangenomic models become most common in the future.


2017 ◽  
Vol 2017 ◽  
pp. 1-4 ◽  
Author(s):  
Brahim Chaourar

Given a graph G=(V,E), a connected sides cut (U, V\U), also written δ(U), is the set of edges of E linking all vertices of U to all vertices of V\U such that the induced subgraphs G[U] and G[V\U] are connected. Given a positive weight function w defined on E, the maximum connected sides cut problem (MAX CS CUT) is to find a connected sides cut Ω such that w(Ω) is maximum. MAX CS CUT is NP-hard. In this paper, we give a linear-time algorithm to solve MAX CS CUT for series parallel graphs. We deduce a linear-time algorithm for the minimum cut problem in the same class of graphs without computing the maximum flow.
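For illustration, the problem definition can be checked by brute force on a tiny graph: enumerate every bipartition, keep those whose sides both induce connected subgraphs, and take the heaviest cut. This exponential sketch is only a reference implementation of the definition, not the paper's linear-time algorithm for series parallel graphs:

```python
from itertools import combinations

def connected(vertices, edges):
    """Check that the subgraph induced by `vertices` is connected."""
    vs = set(vertices)
    if not vs:
        return False
    adj = {v: set() for v in vs}
    for u, v in edges:
        if u in vs and v in vs:
            adj[u].add(v)
            adj[v].add(u)
    seen, stack = set(), [next(iter(vs))]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adj[v] - seen)
    return seen == vs

def max_cs_cut(vertices, edges, w):
    """Brute-force MAX CS CUT: try every bipartition (U, V\\U) with
    both sides inducing connected subgraphs; usable on tiny graphs."""
    best, best_cut = None, None
    vs = list(vertices)
    for r in range(1, len(vs)):
        for U in combinations(vs, r):
            rest = [v for v in vs if v not in U]
            if connected(U, edges) and connected(rest, edges):
                cut = [e for e in edges if (e[0] in U) != (e[1] in U)]
                weight = sum(w[e] for e in cut)
                if best is None or weight > best:
                    best, best_cut = weight, cut
    return best, best_cut

# A weighted triangle: the best connected sides cut isolates 'a'
# (cut weight 3 + 2 = 5).
edges = [('a', 'b'), ('b', 'c'), ('a', 'c')]
w = {('a', 'b'): 3, ('b', 'c'): 1, ('a', 'c'): 2}
best, cut = max_cs_cut('abc', edges, w)
```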


2016 ◽  
Vol 609 ◽  
pp. 374-383 ◽  
Author(s):  
Ljiljana Brankovic ◽  
Costas S. Iliopoulos ◽  
Ritu Kundu ◽  
Manal Mohamed ◽  
Solon P. Pissis ◽  
...  

1997 ◽  
Vol 7 ◽  
pp. 67-82 ◽  
Author(s):  
C. G. Nevill-Manning ◽  
I. H. Witten

SEQUITUR is an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing repeated phrases with a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence, which offers insights into its lexical structure. The algorithm is driven by two constraints that reduce the size of the grammar, and produce structure as a by-product. SEQUITUR breaks new ground by operating incrementally. Moreover, the method's simple structure permits a proof that it operates in space and time that is linear in the size of the input. Our implementation can process 50,000 symbols per second and has been applied to an extensive range of real world sequences.
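The digram-replacement idea at the heart of SEQUITUR can be illustrated offline: repeatedly replace the most frequent repeated adjacent pair with a fresh rule. The sketch below is a simplified Re-Pair-style rendering of that idea; real SEQUITUR does this incrementally, in a single left-to-right pass, which this sketch does not attempt:

```python
from collections import Counter

def infer_grammar(text):
    """Repeatedly replace the most frequent repeated digram (adjacent
    symbol pair) with a fresh grammar rule, until no digram occurs
    twice. Returns the start sequence and the rule dictionary."""
    seq, rules, fresh = list(text), {}, 0
    while True:
        digrams = Counter(zip(seq, seq[1:]))
        if not digrams:
            break
        pair, count = max(digrams.items(), key=lambda kv: kv[1])
        if count < 2:
            break                       # digram uniqueness achieved
        name = f"R{fresh}"
        fresh += 1
        rules[name] = list(pair)
        out, i = [], 0
        while i < len(seq):             # greedy left-to-right rewrite
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

# "abcabcabc" compresses to a three-rule hierarchy whose expansion
# reproduces the input exactly.
start, rules = infer_grammar("abcabcabc")
```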


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Bilal Wajid ◽  
Muhammad U. Sohail ◽  
Ali R. Ekti ◽  
Erchin Serpedin

Genome assembly, over its two decades of history, has produced significant research in both biotechnology and computational biology. This contribution delineates sequencing platforms and their characteristics, examines key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for the assessment of the assembled sequence. Furthermore, the paper explores recent Ubuntu-based software environments oriented towards genome assembly as well as some avenues for future research.


Mathematics ◽  
2021 ◽  
Vol 9 (14) ◽  
pp. 1592
Author(s):  
Iztok Peterin ◽  
Gabriel Semanišin

A shortest path P of a graph G is maximal if P is not contained as a subpath in any other shortest path. A set S⊆V(G) is a maximal shortest paths cover if every maximal shortest path of G contains a vertex of S. The minimum cardinality of a maximal shortest paths cover is called the maximal shortest paths cover number and is denoted by ξ(G). We show that it is NP-hard to determine ξ(G). We establish a connection between ξ(G) and several other graph parameters. We present a linear-time algorithm that computes the exact value of ξ(T) for a tree T.
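For illustration, in a tree the shortest path between two vertices is unique, so the maximal shortest paths are exactly the leaf-to-leaf paths; ξ(T) is then a minimum hitting set over those paths. The sketch below computes it by exhaustive search on tiny trees, only as a reference against the definition (the paper's tree algorithm is linear-time):

```python
from itertools import combinations

def tree_path(adj, s, t):
    """Return the unique s-t path in a tree via DFS parent tracking."""
    parent, stack = {s: None}, [s]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                stack.append(w)
    path, v = [], t
    while v is not None:
        path.append(v)
        v = parent[v]
    return path

def xi_tree_bruteforce(adj):
    """Brute-force maximal shortest paths cover number of a tree:
    enumerate all leaf-to-leaf paths, then find the smallest vertex
    set hitting every one of them (exponential in |V|)."""
    leaves = [v for v, nb in adj.items() if len(nb) == 1]
    paths = [set(tree_path(adj, s, t)) for s, t in combinations(leaves, 2)]
    vs = list(adj)
    for k in range(1, len(vs) + 1):
        for S in combinations(vs, k):
            if all(p & set(S) for p in paths):
                return k

# The star K_{1,3}: its center lies on every leaf-to-leaf path, so
# a single vertex covers all maximal shortest paths.
star = {'c': ['a', 'b', 'd'], 'a': ['c'], 'b': ['c'], 'd': ['c']}
xi = xi_tree_bruteforce(star)
```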


2021 ◽  
Vol 26 ◽  
pp. 1-30
Author(s):  
Tomohiro Koana ◽  
Viatcheslav Korenwein ◽  
André Nichterlein ◽  
Rolf Niedermeier ◽  
Philipp Zschoche

Finding a maximum-cardinality or maximum-weight matching in (edge-weighted) undirected graphs is among the most prominent problems of algorithmic graph theory. For n-vertex and m-edge graphs, the best-known algorithms run in Õ(m√n) time. We build on recent theoretical work focusing on linear-time data reduction rules for finding maximum-cardinality matchings and complement the theoretical results by presenting and analyzing new (near-)linear-time data reduction rules for both the unweighted and the positive-integer-weighted case, employing the kernelization methodology of parameterized complexity analysis. Moreover, we experimentally demonstrate that these data reduction rules provide significant speedups of the state-of-the-art implementations for computing matchings in real-world graphs: the average speedup factor is 4.7 in the unweighted case and 12.72 in the weighted case.
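One classic reduction rule of this kind, for the unweighted case, is the degree-1 rule: a vertex with a single neighbor can always be matched to that neighbor in some maximum matching, so the pair can be committed and deleted. The sketch below applies it exhaustively with a worklist; it is a standard rule in this literature, with no claim that it matches the paper's exact rule set:

```python
from collections import deque

def degree_one_reduction(adj):
    """Exhaustively apply the degree-1 rule: match each degree-1
    vertex v to its only neighbor u, then delete both. Returns the
    forced matching edges plus the reduced graph (the kernel) that a
    full matching solver would still have to handle."""
    adj = {v: set(nb) for v, nb in adj.items()}
    matching = []
    queue = deque(v for v in adj if len(adj[v]) == 1)
    while queue:
        v = queue.popleft()
        if v not in adj or len(adj[v]) != 1:
            continue                      # already deleted, or degree changed
        u = next(iter(adj[v]))
        matching.append((v, u))
        for x in (v, u):                  # delete both matched endpoints;
            for y in adj.pop(x):          # neighbors may drop to degree 1
                if y in adj:
                    adj[y].discard(x)
                    if len(adj[y]) == 1:
                        queue.append(y)
    return matching, adj

# On the path a-b-c-d the rule alone finds a maximum matching
# {a,b}, {c,d} and leaves an empty kernel.
path = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['c']}
m, kernel = degree_one_reduction(path)
```

On graphs with many low-degree vertices the rule cascades, which is where the reported speedups of kernelization-style preprocessing come from.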

