Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement

J. Kececioglu; D. Sankoff

doi:10.1007/bf01188586

A general heuristic for genome rearrangement problems

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720014500127 ◽

2014 ◽

Vol 12 (03) ◽

pp. 1450012 ◽

Cited By ~ 4

Author(s):

Ulisses Dias ◽

Gustavo Rodrigues Galvão ◽

Carla Négri Lintzmayer ◽

Zanoni Dias

Keyword(s):

Approximation Algorithms ◽

Genome Rearrangement ◽

Optimal Algorithm ◽

Approximation Ratio ◽

Practical Implementation ◽

Source Codes ◽

Sorting By Reversals ◽

New Approaches ◽

Classic Problem ◽

Special Case

In this paper, we present a general heuristic for several problems in the genome rearrangement field. Our heuristic does not solve any problem directly; rather, it is used to improve the solutions provided by any non-optimal algorithm that solves them. Therefore, we have implemented several algorithms described in the literature and several algorithms developed by ourselves. In total, we implemented 23 algorithms for 9 well-known problems in the genome rearrangement field. A total of 13 algorithms were implemented for problems that use the notions of prefix and suffix operations. In addition, we worked on 5 algorithms for the classic problem of sorting by transpositions, and we conclude the experiments by presenting results for 3 approximation algorithms for the sorting by reversals and transpositions problem and 2 approximation algorithms for the sorting by reversals problem. Another algorithm with a better approximation ratio can be found for the last genome rearrangement problem, but it is purely theoretical with no practical implementation. The algorithms we implemented, together with our heuristic, lead to the best practical results in each case. In particular, we were able to improve results on the sorting by transpositions problem, which is a very special case because many efforts have been made to generate algorithms with good results in practice, and some of these algorithms provide results that equal the optimum solutions in many cases. Our source codes and benchmarks are freely available upon request from the authors so that it will be easier to compare new approaches against our results.


Minimum Common String Partition Problem: Hardness and Approximations

The Electronic Journal of Combinatorics ◽

10.37236/1947 ◽

2005 ◽

Vol 12 (1) ◽

Cited By ~ 12

Author(s):

Avraham Goldstein ◽

Petr Kolman ◽

Jie Zheng

Keyword(s):

Genome Rearrangement ◽

Linear Time ◽

Fundamental Problem ◽

Text Processing ◽

Partition Problem ◽

Sorting By Reversals ◽

String Comparison ◽

Minimum Number ◽

Tight Connection ◽

Minimum Common String Partition

String comparison is a fundamental problem in computer science, with applications in areas such as computational biology, text processing and compression. In this paper we address the minimum common string partition problem, a string comparison problem with a tight connection to the problem of sorting by reversals with duplicates, a key problem in genome rearrangement. A partition of a string $A$ is a sequence ${\cal P} = (P_1,P_2,\dots,P_m)$ of strings, called the blocks, whose concatenation is equal to $A$. Given a partition ${\cal P}$ of a string $A$ and a partition ${\cal Q}$ of a string $B$, we say that the pair $\langle{{\cal P},{\cal Q}}\rangle$ is a common partition of $A$ and $B$ if ${\cal Q}$ is a permutation of ${\cal P}$. The minimum common string partition problem (MCSP) is to find a common partition of two strings $A$ and $B$ with the minimum number of blocks. The restricted version of MCSP in which each letter occurs at most $k$ times in each input string is denoted by $k$-MCSP. In this paper, we show that $2$-MCSP (and therefore MCSP) is NP-hard and, moreover, even APX-hard. We describe a $1.1037$-approximation for $2$-MCSP and a linear time $4$-approximation algorithm for $3$-MCSP. We are not aware of any better approximations.
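
The definition of a common partition given in this abstract can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the function names (`is_common_partition`, `can_tile`, `min_common_partition_size`) are hypothetical, and the exhaustive search is exponential, so it is usable only on very short strings. It is not the approximation algorithm from the paper.

```python
from collections import Counter
from itertools import combinations


def is_common_partition(P, Q, A, B):
    """Check that P partitions A, Q partitions B, and Q is a permutation of P."""
    return "".join(P) == A and "".join(Q) == B and Counter(P) == Counter(Q)


def can_tile(B, blocks):
    """Return True if B is a concatenation that uses every block in the
    multiset `blocks` (a Counter of non-empty strings) exactly once."""
    if not B:
        return sum(blocks.values()) == 0
    for blk in list(blocks):
        if blocks[blk] > 0 and B.startswith(blk):
            blocks[blk] -= 1
            ok = can_tile(B[len(blk):], blocks)
            blocks[blk] += 1  # undo the choice before trying the next block
            if ok:
                return True
    return False


def min_common_partition_size(A, B):
    """Brute-force MCSP: smallest number of blocks in a common partition,
    or None if A and B do not use the same multiset of letters."""
    if Counter(A) != Counter(B):
        return None
    n = len(A)
    if n == 0:
        return 0
    for m in range(1, n + 1):
        # Choose m - 1 cut positions inside A, giving m non-empty blocks.
        for cuts in combinations(range(1, n), m - 1):
            bounds = (0, *cuts, n)
            P = [A[bounds[i]:bounds[i + 1]] for i in range(m)]
            if can_tile(B, Counter(P)):
                return m
    return None
```

For example, `min_common_partition_size("abab", "baba")` finds the two blocks `a` and `bab`, which can be reordered as `bab` followed by `a`.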


Algorithms for sorting by reversals or transpositions, with application to genome rearrangement

10.47749/t/unicamp.2015.959756 ◽

2015 ◽

Author(s):

Gustavo Rodrigues Galvão

Keyword(s):

Genome Rearrangement ◽

Sorting By Reversals


Exact and Approximation Algorithms for Computing Reversal Distances in Genome Rearrangement

10.31979/etd.qm9e-d3gt ◽

2008 ◽

Author(s):

Euna Park

Keyword(s):

Approximation Algorithms ◽

Genome Rearrangement


Super short operations on both gene order and intergenic sizes

Algorithms for Molecular Biology ◽

10.1186/s13015-019-0156-5 ◽

2019 ◽

Vol 14 (1) ◽

Cited By ~ 1

Author(s):

Andre R. Oliveira ◽

Géraldine Jean ◽

Guillaume Fertin ◽

Ulisses Dias ◽

Zanoni Dias

Keyword(s):

Approximation Algorithms ◽

Gene Order ◽

Genome Rearrangement ◽

Unit Cost ◽

Genome Rearrangements ◽

Minimum Length ◽

Approximation Factor ◽

A Genome ◽

Number Of Genes ◽

Intergenic Regions

Background: The evolutionary distance between two genomes can be estimated by computing a minimum length sequence of operations, called genome rearrangements, that transform one genome into another. Usually, a genome is modeled as an ordered sequence of genes, and most of the studies in the genome rearrangement literature consist in shaping biological scenarios into mathematical models. Examples include allowing different genome rearrangement operations at the same time, adding constraints to these rearrangements (e.g., each rearrangement can affect at most a given number of genes), and considering that a rearrangement implies a cost depending on its length rather than a unit cost. Most of the works, however, have overlooked some important features inside genomes, such as the presence of sequences of nucleotides between genes, called intergenic regions. Results and conclusions: In this work, we investigate the problem of computing the distance between two genomes, taking into account both gene order and intergenic sizes. The genome rearrangement operations we consider here are constrained types of reversals and transpositions, called super short reversals (SSRs) and super short transpositions (SSTs), which affect up to two (consecutive) genes. We denote by super short operations (SSOs) any SSR or SST. We show 3-approximation algorithms when the orientation of the genes is not considered when we allow SSRs, SSTs, or SSOs, and 5-approximation algorithms when considering the orientation for either SSRs or SSOs. We also show that these algorithms improve their approximation factors when the input permutation has a higher number of inversions, where the approximation factor decreases from 3 to either 2 or 1.5, and from 5 to either 3 or 2.
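
The role of inversions mentioned in this abstract can be illustrated with a simplified sketch. It assumes unoriented genes and ignores intergenic sizes entirely, so it is not the model or the algorithm of the paper. Under that simplification, a super short reversal of two consecutive genes is a swap of adjacent elements, and the minimum number of adjacent swaps needed to sort a permutation equals its number of inversions. The function names are hypothetical.

```python
def count_inversions(perm):
    """Number of pairs (i, j) with i < j and perm[i] > perm[j]."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])


def sort_by_adjacent_swaps(perm):
    """Sort with adjacent swaps only (bubble sort).

    Returns the sorted list and the left index of every swap applied.
    Each swap removes exactly one inversion, so the number of swaps
    equals count_inversions(perm).
    """
    p = list(perm)
    ops = []
    changed = True
    while changed:
        changed = False
        for i in range(len(p) - 1):
            if p[i] > p[i + 1]:
                p[i], p[i + 1] = p[i + 1], p[i]
                ops.append(i)
                changed = True
    return p, ops
```

With intergenic sizes added, as in the paper, each operation must also redistribute nucleotides between genes, which is why the problem there is harder than counting inversions.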


Algorithms for Sorting by Reversals or Transpositions, with Application to Genome Rearrangement

10.5753/ctd.2016.9145 ◽

2020 ◽

Author(s):

Gustavo Rodrigues Galvão ◽

Zanoni Dias

Keyword(s):

Comparative Genomics ◽

Genome Rearrangement ◽

Heuristic Algorithms ◽

Combinatorial Problem ◽

Sorting By Reversals ◽

Sorting Problem ◽

Minimum Number ◽

Phd Thesis

The problem of finding the minimum sequence of rearrangements that transforms one genome into another is a well-studied problem that finds application in comparative genomics. When genomes are represented as permutations, in which genes appear as elements, the problem can be reduced to the combinatorial problem of sorting a permutation using a minimum number of rearrangements. This combinatorial problem varies according to the types of rearrangements considered. The PhD thesis summarized in this paper presents exact, approximation, and heuristic algorithms for solving variants of the permutation sorting problem involving two types of rearrangements: reversals and transpositions.
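
As a minimal sketch of the permutation sorting problem described here, the code below handles unsigned reversals only. It computes the exact reversal distance by breadth-first search, which is feasible only for very small permutations, and counts breakpoints, which give the standard lower bound that the distance is at least half the number of breakpoints (one reversal removes at most two). The function names are hypothetical, and this is not one of the algorithms from the thesis.

```python
from collections import deque


def reverse(perm, i, j):
    """Return perm with the segment perm[i..j] (inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def breakpoints(perm):
    """Count adjacent pairs that are not consecutive integers, after
    framing the permutation of 1..n with 0 and n + 1."""
    ext = (0, *perm, len(perm) + 1)
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def reversal_distance(perm):
    """Exact unsigned reversal distance to the identity, by BFS."""
    start = tuple(perm)
    target = tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        n = len(cur)
        for i in range(n):
            for j in range(i + 1, n):
                nxt = reverse(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return None
```

For the permutation `(3, 1, 2)` there are three breakpoints, so at least two reversals are needed, and the search confirms that two suffice.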


NN-approach to design of the optimal stochastic approximation algorithms

International Conference on Control '94 ◽

10.1049/cp:19940173 ◽

1994 ◽

Cited By ~ 1

Author(s):

A.V. Nazin

Keyword(s):

Approximation Algorithms ◽

Stochastic Approximation


Fast distributed approximation algorithms for vertex cover and set cover in anonymous networks

Proceedings of the 22nd ACM symposium on Parallelism in algorithms and architectures - SPAA '10 ◽

10.1145/1810479.1810533 ◽

2010 ◽

Cited By ~ 17

Author(s):

Matti Åstrand ◽

Jukka Suomela

Keyword(s):

Approximation Algorithms ◽

Vertex Cover ◽

Set Cover ◽

Anonymous Networks ◽

Distributed Approximation


Approximation Algorithms for Submodular Data Summarization with a Knapsack Constraint

Proceedings of the ACM on Measurement and Analysis of Computing Systems ◽

10.1145/3447383 ◽

2021 ◽

Vol 5 (1) ◽

pp. 1-31

Author(s):

Kai Han ◽

Shuang Cui ◽

Tianshuai Zhu ◽

Enpei Zhang ◽

Benwei Wu ◽

...

Keyword(s):

Approximation Algorithms ◽

Fundamental Problem ◽

Randomized Algorithm ◽

Deterministic Algorithm ◽

Submodular Function ◽

Approximation Ratio ◽

Performance Bounds ◽

Data Summarization ◽

Submodular Optimization ◽

Knapsack Constraint

Data summarization, i.e., selecting representative subsets of manageable size out of massive data, is often modeled as a submodular optimization problem. Although there exist extensive algorithms for submodular optimization, many of them incur large computational overheads and hence are not suitable for mining big data. In this work, we consider the fundamental problem of (non-monotone) submodular function maximization with a knapsack constraint, and propose simple yet effective and efficient algorithms for it. Specifically, we propose a deterministic algorithm with approximation ratio 6 and a randomized algorithm with approximation ratio 4, and show that both of them can be accelerated to achieve nearly linear running time at the cost of weakening the approximation ratio by an additive factor of ε. We then consider a more restrictive setting without full access to the whole dataset, and propose streaming algorithms with approximation ratios of 8+ε and 6+ε that make one pass and two passes over the data stream, respectively. As a by-product, we also propose a two-pass streaming algorithm with an approximation ratio of 2+ε when the considered submodular function is monotone. To the best of our knowledge, our algorithms achieve the best performance bounds compared to the state-of-the-art approximation algorithms with efficient implementation for the same problem. Finally, we evaluate our algorithms in two concrete submodular data summarization applications for revenue maximization in social networks and image summarization, and the empirical results show that our algorithms outperform the existing ones in terms of both effectiveness and efficiency.
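
To make the setting concrete, here is a small sketch of submodular maximization under a knapsack constraint. It is not any of the algorithms proposed in the paper: it assumes a monotone coverage function (the paper targets the harder non-monotone case) and uses the classic baseline of density greedy compared against the best single feasible item. All names, and the requirement that costs be positive, are assumptions made for illustration.

```python
def coverage(selected, sets):
    """Submodular objective: number of distinct elements covered."""
    covered = set()
    for name in selected:
        covered |= sets[name]
    return len(covered)


def density_greedy(sets, costs, budget):
    """Pick items by best marginal gain per unit cost while they fit the
    budget, then return the better of that solution and the best single
    feasible item. Costs are assumed to be positive."""
    chosen, spent = [], 0
    remaining = set(sets)
    while True:
        base = coverage(chosen, sets)
        best, best_ratio = None, 0.0
        for name in sorted(remaining):  # sorted for deterministic ties
            if spent + costs[name] > budget:
                continue
            gain = coverage(chosen + [name], sets) - base
            ratio = gain / costs[name]
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break
        chosen.append(best)
        spent += costs[best]
        remaining.remove(best)
    feasible = [name for name in sorted(sets) if costs[name] <= budget]
    if feasible:
        single = max(feasible, key=lambda name: coverage([name], sets))
        if coverage([single], sets) > coverage(chosen, sets):
            return [single]
    return chosen
```

The comparison with the best single item matters because density greedy alone can fill the budget with cheap items and miss one expensive item that covers far more.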


Approximation Algorithms for Maximin Fair Division

ACM Transactions on Economics and Computation ◽

10.1145/3381525 ◽

2020 ◽

Vol 8 (1) ◽

pp. 1-28

Author(s):

Siddharth Barman ◽

Sanath Kumar Krishnamurthy

Keyword(s):

Approximation Algorithms ◽

Fair Division
