Data Reduction for Maximum Matching on Real-World Graphs

2021 ◽  
Vol 26 ◽  
pp. 1-30
Author(s):  
Tomohiro Koana ◽  
Viatcheslav Korenwein ◽  
André Nichterlein ◽  
Rolf Niedermeier ◽  
Philipp Zschoche

Finding a maximum-cardinality or maximum-weight matching in (edge-weighted) undirected graphs is among the most prominent problems of algorithmic graph theory. For n-vertex and m-edge graphs, the best-known algorithms run in Õ(m√n) time. We build on recent theoretical work focusing on linear-time data reduction rules for finding maximum-cardinality matchings and complement the theoretical results by presenting and analyzing (thereby employing the kernelization methodology of parameterized complexity analysis) new (near-)linear-time data reduction rules for both the unweighted and the positive-integer-weighted case. Moreover, we experimentally demonstrate that these data reduction rules provide significant speedups over state-of-the-art implementations for computing matchings in real-world graphs: the average speedup factor is 4.7 in the unweighted case and 12.72 in the weighted case.
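The abstract leaves the reduction rules unspecified. As a flavor of what such preprocessing looks like, below is a minimal Python sketch of one classical ingredient, the Karp-Sipser degree-1 rule: a degree-1 vertex can always be matched to its unique neighbor in some maximum matching, so that edge can be committed and both endpoints deleted. The paper's rules go further (in particular to the weighted case); this sketch is an illustration, not the authors' implementation.

```python
from collections import deque

def degree_one_reduction(adj):
    """Apply the classical degree-1 rule exhaustively.

    adj: dict mapping each vertex to a set of neighbours (undirected).
    Returns (forced, reduced): `forced` lists edges that some maximum
    matching is guaranteed to contain (exchange argument: a degree-1
    vertex can always be matched to its unique neighbour), `reduced`
    is the remaining graph.
    """
    reduced = {v: set(ns) for v, ns in adj.items()}   # don't mutate input
    forced = []
    queue = deque(v for v, ns in reduced.items() if len(ns) == 1)
    while queue:
        v = queue.popleft()
        if v not in reduced or len(reduced[v]) != 1:
            continue                  # vertex already deleted or degree changed
        u = next(iter(reduced[v]))
        forced.append((v, u))
        for x in (v, u):              # delete both matched endpoints
            for w in reduced.pop(x, set()):
                if w in reduced:
                    reduced[w].discard(x)
                    if len(reduced[w]) == 1:
                        queue.append(w)
    return forced, reduced

# Toy usage: on a path a-b-c-d the rule alone finds a maximum matching.
g = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'c'}}
forced, rest = degree_one_reduction(g)
print(forced)   # [('a', 'b'), ('d', 'c')]
```

Each vertex is deleted at most once and each edge inspected a constant number of times, so the loop runs in linear time; any maximum matching of the residual graph, together with the forced edges, is maximum for the original graph.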

Algorithmica ◽  
2020 ◽  
Vol 82 (12) ◽  
pp. 3521-3565
Author(s):  
George B. Mertzios ◽  
André Nichterlein ◽  
Rolf Niedermeier

Abstract: Finding maximum-cardinality matchings in undirected graphs is arguably one of the most central graph primitives. For m-edge and n-vertex graphs, it is well known to be solvable in $$O(m\sqrt{n})$$ time; however, for several applications this running time is still too slow. We investigate how linear-time (and almost linear-time) data reduction (used as preprocessing) can alleviate the situation. More specifically, we focus on linear-time kernelization. We start a deeper and systematic study both for general graphs and for bipartite graphs. Our data reduction algorithms easily comply (in form of preprocessing) with every solution strategy (exact, approximate, heuristic), thus making them attractive in various settings.
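Because such kernelization only shrinks the input, it can sit in front of any matching solver. Below is a hedged sketch of that composition, reusing degree_one_reduction from the sketch above and using networkx's Blossom-based max_weight_matching as a stand-in exact solver (not the solvers studied in the paper):

```python
import networkx as nx

def matching_with_preprocessing(G: nx.Graph):
    """Reduce first, then solve the (hopefully much smaller) residual graph.

    The forced edges are vertex-disjoint from the residual graph, so the
    union of the two partial matchings is a maximum matching of G.
    """
    adj = {v: set(G.neighbors(v)) for v in G}
    forced, reduced = degree_one_reduction(adj)   # from the sketch above
    H = nx.Graph((u, v) for u, ns in reduced.items() for v in ns)
    rest = nx.max_weight_matching(H, maxcardinality=True)
    return forced + [tuple(e) for e in rest]
```

On tree-like inputs the residual graph is often empty and the solver is never invoked, which hints at why preprocessing can pay off on sparse real-world graphs.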


Author(s):  
Atheer Alahmed ◽  
Amal Alrasheedi ◽  
Maha Alharbi ◽  
Norah Alrebdi ◽  
Marwan Aleasa ◽  
...  

2019 ◽  
Author(s):  
Jaclyn Marjorie Smith ◽  
Melvin Lathara ◽  
Hollis Wright ◽  
Brian Hill ◽  
Nalini Ganapati ◽  
...  

Abstract:

Background: The affordability of next-generation genomic sequencing and the improvement of medical data management have contributed largely to the evolution of biological analysis from both a clinical and research perspective. Precision medicine is a response to these advancements that places individuals into better-defined subsets based on shared clinical and genetic features. The identification of personalized diagnosis and treatment options depends on the ability to draw insights from large-scale, multi-modal analysis of biomedical datasets. Driven by a real use case, we premise that platforms supporting precision medicine analysis should maintain data in their optimal data stores, should support distributed storage and query mechanisms, and should scale as more samples are added to the system.

Results: We extended a genomics-based columnar data store, GenomicsDB, for ease of use within a distributed analytics platform for clinical and genomic data integration, known as the ODA framework. The framework supports interaction from an i2b2 plugin as well as a notebook environment. We show that the ODA framework exhibits worst-case linear scaling for array size (storage), import time (data construction), and query time for an increasing number of samples. We go on to show worst-case linear time for both import of clinical data and aggregate query execution time within a distributed environment.

Conclusions: This work highlights the integration of a distributed genomic database with a distributed compute environment to support scalable and efficient precision medicine queries from a HIPAA-compliant cohort system in a real-world setting. The ODA framework is currently deployed in production to support precision medicine exploration and analysis by clinicians and researchers at the UCLA David Geffen School of Medicine.
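As rough intuition for the linear scaling figures, a sample-partitioned columnar layout touches each sample's value exactly once per aggregate query, so both import and aggregation grow linearly with the number of samples. The toy model below is entirely hypothetical (it is not GenomicsDB's actual interface) and only illustrates the shape of the argument:

```python
from collections import defaultdict

class ColumnarStore:
    """Toy, hypothetical sample-partitioned columnar store."""

    def __init__(self, partition_size=1000):
        self.partition_size = partition_size
        self.partitions = []   # each: {'n': row count, 'cols': {field: [values]}}

    def import_samples(self, records):
        """records: iterable of per-sample dicts; O(1) amortised per sample."""
        for rec in records:
            if not self.partitions or self.partitions[-1]['n'] >= self.partition_size:
                self.partitions.append({'n': 0, 'cols': defaultdict(list)})
            part = self.partitions[-1]
            part['n'] += 1
            for field, value in rec.items():
                part['cols'][field].append(value)

    def aggregate(self, field, fn=sum):
        """One sequential scan per partition: O(#samples) overall."""
        return fn(v for p in self.partitions for v in p['cols'].get(field, []))

store = ColumnarStore(partition_size=2)
store.import_samples([{'depth': 30}, {'depth': 42}, {'depth': 35}])
print(store.aggregate('depth'))   # 107
```

Real deployments add per-partition parallelism, which changes the constants but not the linear shape.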


2014 ◽  
Vol 61 (1) ◽  
pp. 1-23 ◽  
Author(s):  
Ran Duan ◽  
Seth Pettie

Author(s):  
Marwa F. Mohamed ◽  
Abd El-Rahman Shabayek ◽  
Mahmoud El-Gayyar ◽  
Hamed Nassar

2020 ◽  
pp. 1-10
Author(s):  
Fabian Gärtner ◽  
Felix Kühnl ◽  
Carsten R. Seemann ◽  
Christian Höner zu Siederdissen ◽  
Peter F. Stadler ◽  
...  

Abstract: Superbubbles are acyclic induced subgraphs of a digraph with single entrance and exit that naturally arise in the context of genome assembly and the analysis of genome alignments in computational biology. These structures can be computed in linear time and are confined to non-symmetric digraphs. We demonstrate empirically that graph parameters derived from superbubbles provide a convenient means of distinguishing different classes of real-world graphical models, while being largely unrelated to simple, commonly used parameters.
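The abstract only names the structure. For concreteness, below is a minimal Python sketch of a checker for the first three conditions of the standard superbubble definition of Onodera, Sadakane, and Shibuya (reachability, matching of the forward and backward reachable sets, and acyclicity); the minimality condition and the linear-time enumeration algorithm are omitted, and the plain adjacency-dict input format is an assumption of the sketch:

```python
from collections import deque

def _reach(adj, start, blocked):
    """Vertices reachable from `start`; `blocked` may be reached but is
    never expanded, i.e. paths must not pass *through* it."""
    seen, queue = {start}, deque([start])
    while queue:
        v = queue.popleft()
        if v == blocked:
            continue
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def is_superbubble(adj, radj, s, t):
    """Check reachability, matching, and acyclicity for the pair (s, t).
    adj / radj: forward and reverse adjacency dicts of the digraph."""
    U = _reach(adj, s, blocked=t)    # forward from s, stopping at t
    W = _reach(radj, t, blocked=s)   # backward from t, stopping at s
    if t not in U or U != W:
        return False
    # Acyclicity of the induced subgraph, via Kahn's algorithm.
    indeg = {v: 0 for v in U}
    for v in U:
        for w in adj.get(v, ()):
            if w in U:
                indeg[w] += 1
    queue = deque(v for v in U if indeg[v] == 0)
    processed = 0
    while queue:
        v = queue.popleft()
        processed += 1
        for w in adj.get(v, ()):
            if w in U:
                indeg[w] -= 1
                if indeg[w] == 0:
                    queue.append(w)
    return processed == len(U)

# Toy bubble: s -> {a, b} -> t
adj  = {'s': ['a', 'b'], 'a': ['t'], 'b': ['t'], 't': []}
radj = {'t': ['a', 'b'], 'a': ['s'], 'b': ['s'], 's': []}
print(is_superbubble(adj, radj, 's', 't'))   # True
```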


1997 ◽  
Vol 7 ◽  
pp. 67-82 ◽  
Author(s):  
C. G. Nevill-Manning ◽  
I. H. Witten

SEQUITUR is an algorithm that infers a hierarchical structure from a sequence of discrete symbols by replacing repeated phrases with a grammatical rule that generates the phrase, and continuing this process recursively. The result is a hierarchical representation of the original sequence, which offers insights into its lexical structure. The algorithm is driven by two constraints (digram uniqueness and rule utility) that reduce the size of the grammar and produce structure as a by-product. SEQUITUR breaks new ground by operating incrementally. Moreover, the method's simple structure permits a proof that it operates in space and time that is linear in the size of the input. Our implementation can process 50,000 symbols per second and has been applied to an extensive range of real-world sequences.
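Genuine SEQUITUR maintains the digram-uniqueness and rule-utility constraints incrementally, in amortized constant time per symbol, using a digram index and linked rule bodies. The offline sketch below illustrates only the digram-replacement idea and is closer to Re-Pair than to SEQUITUR proper: it is not incremental and does not enforce rule utility.

```python
from collections import Counter

def infer_grammar(seq):
    """Repeatedly replace the most frequent repeated adjacent pair with a
    fresh nonterminal until no digram occurs twice (digram uniqueness)."""
    seq = list(seq)
    rules = {}                       # nonterminal -> the digram it expands to
    next_id = 0
    while True:
        digrams = Counter(zip(seq, seq[1:]))
        pair, count = digrams.most_common(1)[0] if digrams else (None, 0)
        if count < 2:
            break                    # every digram is now unique: stop
        nt = f'R{next_id}'
        next_id += 1
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):          # replace non-overlapping occurrences
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

start, rules = infer_grammar('abcabdabcabd')
print(start)   # ['R3', 'R3'] -- the sequence compressed to two nonterminals
print(rules)   # hierarchical rules, e.g. R0 -> ('a', 'b'), R1 -> ('R0', 'c'), ...
```

Because the sequence strictly shrinks on every pass, the loop terminates; the hierarchy of rules mirrors the nested repetition structure the abstract describes.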

