Minimum Common String Partition Problem: Hardness and Approximations

10.37236/1947 ◽  
2005 ◽  
Vol 12 (1) ◽  
Author(s):  
Avraham Goldstein ◽  
Petr Kolman ◽  
Jie Zheng

String comparison is a fundamental problem in computer science, with applications in areas such as computational biology, text processing and compression. In this paper we address the minimum common string partition problem, a string comparison problem with a tight connection to the problem of sorting by reversals with duplicates, a key problem in genome rearrangement. A partition of a string $A$ is a sequence ${\cal P} = (P_1,P_2,\dots,P_m)$ of strings, called the blocks, whose concatenation is equal to $A$. Given a partition ${\cal P}$ of a string $A$ and a partition ${\cal Q}$ of a string $B$, we say that the pair $\langle{{\cal P},{\cal Q}}\rangle$ is a common partition of $A$ and $B$ if ${\cal Q}$ is a permutation of ${\cal P}$. The minimum common string partition problem (MCSP) is to find a common partition of two strings $A$ and $B$ with the minimum number of blocks. The restricted version of MCSP, in which each letter occurs at most $k$ times in each input string, is denoted by $k$-MCSP. In this paper, we show that $2$-MCSP (and therefore MCSP) is NP-hard and, moreover, even APX-hard. We describe a $1.1037$-approximation for $2$-MCSP and a linear time $4$-approximation algorithm for $3$-MCSP. We are not aware of any better approximations.
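To make the definition of a common partition concrete, here is a minimal Python sketch. The function names `is_common_partition` and `min_common_partition_bruteforce` are illustrative and not taken from the paper. The brute-force search runs in exponential time and is meant only for tiny, non-empty inputs; it is not the approximation algorithm the abstract describes.

```python
from collections import Counter
from itertools import combinations, permutations


def is_common_partition(P, Q, A, B):
    """Check that P partitions A, Q partitions B, and Q is a permutation of P."""
    return "".join(P) == A and "".join(Q) == B and Counter(P) == Counter(Q)


def min_common_partition_bruteforce(A, B):
    """Return a common partition (P, Q) with the fewest blocks, or None.

    Exhaustive search, assumed to be used only on very short non-empty strings.
    """
    if Counter(A) != Counter(B):
        return None  # strings with different letter counts have no common partition
    n = len(A)
    for m in range(1, n + 1):  # try block counts in increasing order
        for cuts in combinations(range(1, n), m - 1):
            bounds = (0, *cuts, n)
            P = tuple(A[i:j] for i, j in zip(bounds, bounds[1:]))
            # Try every distinct ordering of the blocks of A against B.
            for Q in set(permutations(P)):
                if "".join(Q) == B:
                    return P, Q
    return None


P, Q = min_common_partition_bruteforce("abcab", "ababc")
print(P, Q)
```

For `A = "abcab"` and `B = "ababc"` no single block works because the strings differ, so the minimum has two blocks: `("abc", "ab")` for `A`, reordered as `("ab", "abc")` for `B`.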

10.37236/968 ◽  
2007 ◽  
Vol 14 (1) ◽  
Author(s):  
Petr Kolman ◽  
Tomasz Waleń

In the last decade there has been an ongoing interest in string comparison problems; to a large extent the interest was stimulated by genome rearrangement problems in computational biology, but related problems appear in many other areas of computer science. Particular attention has been given to the problem of sorting by reversals (SBR): given two strings, $A$ and $B$, find the minimum number of reversals that transform the string $A$ into the string $B$ (a reversal $\rho(i,j)$, $i < j$, transforms a string $A=a_1\ldots a_n$ into a string $A'=a_1\ldots a_{i-1} a_{j} a_{j-1} \ldots a_{i} a_{j+1} \ldots a_n$). Closely related is the minimum common string partition problem (MCSP): given two strings, $A$ and $B$, find a minimum size partition of $A$ into substrings $P_1,\ldots,P_l$ (i.e., $A=P_1\ldots P_l$) and a partition of $B$ into substrings $Q_1,\ldots,Q_l$ such that $(Q_1,\ldots,Q_l)$ is a permutation of $(P_1,\ldots,P_l)$. Primarily the SBR problem has been studied for strings in which every symbol appears exactly once (that is, for permutations) and only recently attention has been given to the general case where duplicates of the symbols are allowed. In this paper we consider the problem $k$-SBR, a version of SBR in which each symbol is allowed to appear up to $k$ times in each string, for some $k\geq 1$. The main result of the paper is a $\Theta(k)$-approximation algorithm for $k$-SBR running in time $O(n)$; compared to the previously known algorithm for $k$-SBR, this is an improvement by a factor of $\Theta(k)$ in the approximation ratio, and by a factor of $\Theta(k)$ in the running time. We approach the $k$-SBR by finding an approximation for the $k$-MCSP first and then turning it into a solution for $k$-SBR. Crucial ingredients of our algorithm are the suffix tree data structure and a linear time algorithm for a special case of a disjoint set union problem.
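The reversal $\rho(i,j)$ defined above can be sketched in a few lines of Python. The names `reversal` and `reversal_distance_bruteforce` are illustrative assumptions. The breadth-first search is an exhaustive baseline for very short strings only, not the paper's linear-time approximation algorithm.

```python
from collections import deque


def reversal(A, i, j):
    """Apply rho(i, j) with 1-based, inclusive indices and i < j."""
    if not (1 <= i < j <= len(A)):
        raise ValueError("need 1 <= i < j <= len(A)")
    # a_1..a_{i-1}, then a_j..a_i reversed, then a_{j+1}..a_n
    return A[:i - 1] + A[i - 1:j][::-1] + A[j:]


def reversal_distance_bruteforce(A, B):
    """Minimum number of reversals turning A into B, or None if impossible.

    Plain BFS over all reachable strings; exponential, for tiny inputs only.
    """
    if sorted(A) != sorted(B):
        return None  # reversals never change the multiset of symbols
    seen = {A}
    queue = deque([(A, 0)])
    while queue:
        s, d = queue.popleft()
        if s == B:
            return d
        for i in range(1, len(s)):
            for j in range(i + 1, len(s) + 1):
                t = reversal(s, i, j)
                if t not in seen:
                    seen.add(t)
                    queue.append((t, d + 1))
    return None


print(reversal("abcdef", 2, 4))                    # adcbef
print(reversal_distance_bruteforce("abc", "bca"))  # 2
```

Here `reversal("abcdef", 2, 4)` reverses the block `bcd` in place. Turning `abc` into `bca` takes two reversals because none of the three single reversals of `abc` yields `bca`.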


2019 ◽  
Author(s):  
Md. Khaledur Rahman ◽  
M. Sohel Rahman

The genome rearrangement problem computes the minimum number of operations that are required to sort all elements of a permutation. A block-interchange operation exchanges two blocks of a permutation which are not necessarily adjacent and in a prefix block-interchange, one block is always the prefix of that permutation. In this paper, we focus on applying prefix block-interchanges on binary and ternary strings. We present upper bounds to group and sort a given binary/ternary string. We also provide upper bounds for a different version of the block-interchange operation which we refer to as the ‘restricted prefix block-interchange’. We observe that our obtained upper bound for restricted prefix block-interchange operations on binary strings is better than that of other genome rearrangement operations to group fully normalized binary strings. Consequently, we provide a linear-time algorithm to solve the problem of grouping binary normalized strings by restricted prefix block-interchanges. We also provide a polynomial time algorithm to group normalized ternary strings by prefix block-interchange operations. Finally, we provide a classification for ternary strings based on the required number of prefix block-interchange operations.
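As a rough illustration of the operation described above, the following Python sketch swaps a prefix block with a later block of a string. The function name `prefix_block_interchange` and the index convention (`i`, `j`, `k` as 0-based slice boundaries) are assumptions made for this example. The abstract does not spell out the 'restricted' variant or the notions of grouping and normalization, so they are not modelled here.

```python
def prefix_block_interchange(s, i, j, k):
    """Swap the prefix block s[:i] with the later block s[j:k].

    Assumed convention: 1 <= i <= j < k <= len(s), so both blocks are
    non-empty and do not overlap; s[i:j] is the (possibly empty) stretch
    between them and s[k:] is the untouched remainder.
    """
    if not (1 <= i <= j < k <= len(s)):
        raise ValueError("need 1 <= i <= j < k <= len(s)")
    prefix, middle, block, rest = s[:i], s[i:j], s[j:k], s[k:]
    return block + middle + prefix + rest


print(prefix_block_interchange("abcdef", 2, 3, 5))  # decabf
print(prefix_block_interchange("0101", 1, 1, 2))    # 1001
```

In the first call the prefix `ab` is exchanged with the block `de`, while `c` and `f` stay in place. When `i == j` the two blocks are adjacent.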


2020 ◽  
Author(s):  
Gustavo Rodrigues Galvão ◽  
Zanoni Dias

The problem of finding the minimum sequence of rearrangements that transforms one genome into another is a well-studied problem that finds application in comparative genomics. Representing genomes as permutations, in which genes appear as elements, that problem can be reduced to the combinatorial problem of sorting a permutation using a minimum number of rearrangements. This combinatorial problem varies according to the types of rearrangements considered. The PhD thesis summarized in this paper presents exact, approximation, and heuristic algorithms for solving variants of the permutation sorting problem involving two types of rearrangements: reversals and transpositions.


VLSI Design ◽  
2002 ◽  
Vol 15 (2) ◽  
pp. 485-489
Author(s):  
Youssef Saab

Partitioning is a fundamental problem in the design of VLSI circuits. In recent years, ratio-cut partitioning has received attention due to its tendency to partition circuits into their natural clusters. Node contraction has also been shown to enhance the performance of iterative partitioning algorithms. This paper describes a new simple ratio-cut partitioning algorithm using node contraction. This new algorithm combines iterative improvement with progressive cluster formation. Under suitably mild assumptions, the new algorithm runs in linear time. It is also shown that the new algorithm compares favorably with previous approaches.


2019 ◽  
Author(s):  
Momoko Hayamizu ◽  
Kazuhisa Makino

'Tree-based' phylogenetic networks provide a mathematically-tractable model for representing reticulate evolution in biology. Such networks consist of an underlying 'support tree' together with arcs between the edges of this tree. However, a tree-based network can have several such support trees, and this leads to a variety of algorithmic problems that are relevant to the analysis of biological data. Recently, Hayamizu (arXiv:1811.05849 [math.CO]) proved a structure theorem for tree-based phylogenetic networks and obtained linear-time and linear-delay algorithms for many basic problems on support trees, such as counting, optimisation, and enumeration. In the present paper, we consider the following fundamental problem in statistical data analysis: given a tree-based phylogenetic network $N$ whose arcs are associated with probabilities, create the top-$k$ support tree ranking for $N$ by their likelihood values. We provide a linear-delay (and hence optimal) algorithm for the problem and thus reveal the interesting property of tree-based phylogenetic networks that ranking top-$k$ support trees is as computationally easy as picking $k$ arbitrary support trees.


2017 ◽  
Vol 27 (03) ◽  
pp. 159-176
Author(s):  
Helmut Alt ◽  
Sergio Cabello ◽  
Panos Giannopoulos ◽  
Christian Knauer

We study the complexity of the following cell connection problems in segment arrangements. Given a set of straight-line segments in the plane and two points [Formula: see text] and [Formula: see text] in different cells of the induced arrangement: (i) compute the minimum number of segments one needs to remove so that there is a path connecting [Formula: see text] to [Formula: see text] that does not intersect any of the remaining segments; (ii) compute the minimum number of segments one needs to remove so that the arrangement induced by the remaining segments has a single cell. We show that problems (i) and (ii) are NP-hard and discuss some special, tractable cases. Most notably, we provide a near-linear-time algorithm for a variant of problem (i) where the path connecting [Formula: see text] to [Formula: see text] must stay inside a given polygon [Formula: see text] with a constant number of holes, the segments are contained in [Formula: see text], and the endpoints of the segments are on the boundary of [Formula: see text]. The approach for this latter result uses homotopy of paths to group the segments into clusters with the property that either all segments in a cluster or none participate in an optimal solution.


Web Mining ◽  
2011 ◽  
pp. 322-338 ◽  
Author(s):  
Zhixiang Chen ◽  
Richard H. Fowler ◽  
Ada Wai-Chee Fu ◽  
Chunyue Wang

A maximal forward reference of a Web user is a longest consecutive sequence of Web pages visited by the user in a session without revisiting some previously visited page in the sequence. Efficient mining of frequent traversal path patterns, that is, large reference sequences of maximal forward references, from very large Web logs is a fundamental problem in Web mining. This chapter aims at designing algorithms for this problem with the best possible efficiency. First, two optimal linear time algorithms are designed for finding maximal forward references from Web logs. Second, two algorithms for mining frequent traversal path patterns are devised with the help of a fast construction of shallow generalized suffix trees over a very large alphabet. These two algorithms have provable linear and sublinear time complexity, respectively, and their performance is analyzed in comparison with the Apriori-like algorithms and the Ukkonen algorithm. It is shown that these two new algorithms are substantially more efficient than the Apriori-like algorithms and the Ukkonen algorithm.
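The definition of a maximal forward reference can be illustrated with a short Python sketch. It uses a common stack-based idea: keep the current forward path, and whenever the user returns to a page already on the path, emit the path (if the previous move was forward) and cut it back to that page. The function name `maximal_forward_references` and the session encoding (a list of page identifiers) are assumptions for this example. This is not claimed to be either of the chapter's two algorithms.

```python
def maximal_forward_references(session):
    """Return the maximal forward references of one session, in order.

    `session` is assumed to be a sequence of hashable page identifiers.
    Each page is pushed and popped at most once per visit, so the total
    work is linear in the session length (amortized).
    """
    path = []        # current forward path
    position = {}    # page -> its index in `path`
    result = []
    moving_forward = False
    for page in session:
        if page in position:
            # Backward reference: the path so far is maximal if we were
            # moving forward just before this step.
            if moving_forward:
                result.append(tuple(path))
            idx = position[page]
            for dropped in path[idx + 1:]:
                del position[dropped]
            del path[idx + 1:]
            moving_forward = False
        else:
            position[page] = len(path)
            path.append(page)
            moving_forward = True
    if moving_forward:
        result.append(tuple(path))
    return result


for ref in maximal_forward_references(list("ABCDCBEGHGWAOUOV")):
    print("".join(ref))
```

For the session `A B C D C B E G H G W A O U O V` the sketch prints `ABCD`, `ABEGH`, `ABEGW`, `AOU` and `AOV`, one per line.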

