Upper bounds for the expected length of a longest common subsequence of two binary sequences

Longest common subsequences of two random sequences

Journal of Applied Probability ◽

10.2307/3212444 ◽

1975 ◽

Vol 12 (2) ◽

pp. 306-315 ◽

Cited By ~ 96

Author(s):

Václav Chvátal ◽

David Sankoff

Keyword(s):

Monte Carlo ◽

Molecular Evolution ◽

Lower Bounds ◽

Monte Carlo Methods ◽

Longest Common Subsequence ◽

Upper And Lower Bounds ◽

Longest Common Subsequences ◽

Expected Length ◽

Common Subsequence ◽

Limiting Behaviour

Summary: Given two random $k$-ary sequences of length $n$, what is $f(n,k)$, the expected length of their longest common subsequence? This problem arises in the study of molecular evolution. We calculate $f(n,k)$ for all $k$, where $n \le 5$, and $f(n,2)$ where $n \le 10$. We study the limiting behaviour of $n^{-1} f(n,k)$ and derive upper and lower bounds on these limits for all $k$. Finally we estimate by Monte-Carlo methods $f(100,k)$, $f(1000,2)$ and $f(5000,2)$.
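The two computations named in this summary, exact evaluation of f(n,k) for small n and Monte Carlo estimation for large n, can be illustrated with a minimal sketch. It is not the authors' procedure: the function names `lcs_length`, `exact_f` and `monte_carlo_f` are hypothetical, the exact value is obtained by brute-force enumeration of all pairs, and the sequences are assumed to be drawn uniformly and independently.

```python
import itertools
import random


def lcs_length(a, b):
    """Length of a longest common subsequence, by the standard row-by-row DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def exact_f(n, k):
    """Exact f(n, k): mean LCS length over all ordered pairs of k-ary sequences.

    Brute force over k**n * k**n pairs, so only feasible for small n and k.
    """
    seqs = list(itertools.product(range(k), repeat=n))
    total = sum(lcs_length(a, b) for a in seqs for b in seqs)
    return total / (len(seqs) ** 2)


def monte_carlo_f(n, k, trials, seed=0):
    """Monte Carlo estimate of f(n, k) from `trials` independent uniform pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(k) for _ in range(n)]
        b = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(a, b)
    return total / trials
```

For instance, `exact_f(2, 2)` averages over the 16 ordered pairs of binary sequences of length 2 and gives 18/16 = 1.125. Dividing `monte_carlo_f(n, k, trials)` by `n` gives a rough estimate of the normalized quantity studied in the summary; the number of trials needed for a given accuracy is not addressed here.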

On a Speculated Relation Between Chvátal–Sankoff Constants of Several Sequences

Combinatorics Probability Computing ◽

10.1017/s0963548309009900 ◽

2009 ◽

Vol 18 (4) ◽

pp. 517-532

Author(s):

M. Kiwi ◽

J. Soto

Keyword(s):

Lower Bounds ◽

Longest Common Subsequence ◽

Expected Length ◽

Common Subsequence

It is well known that, when normalized by $n$, the expected length of a longest common subsequence of $d$ sequences of length $n$ over an alphabet of size $\sigma$ converges to a constant $\gamma_{\sigma,d}$. We disprove a speculation by Steele regarding a possible relation between $\gamma_{2,d}$ and $\gamma_{2,2}$. In order to do that we also obtain some new lower bounds for $\gamma_{\sigma,d}$, when both $\sigma$ and $d$ are small integers.

Large deviations-based upper bounds on the expected relative length of longest common subsequences

Advances in Applied Probability ◽

10.1017/s0001867800001294 ◽

2006 ◽

Vol 38 (3) ◽

pp. 827-852 ◽

Cited By ~ 1

Author(s):

Raphael Hauser ◽

Servet Martínez ◽

Heinrich Matzinger

Keyword(s):

High Precision ◽

Upper Bound ◽

Large Deviation ◽

Relative Length ◽

Random Variable ◽

Upper Bounds ◽

Longest Common Subsequence ◽

Upper And Lower Bounds ◽

Finite Alphabet ◽

Common Subsequence

Consider the random variable $L_n$ defined as the length of a longest common subsequence of two random strings of length $n$ and whose random characters are independent and identically distributed over a finite alphabet. Chvátal and Sankoff showed that the limit $\gamma = \lim_{n\to\infty} E[L_n]/n$ is well defined. The exact value of this constant is not known, but various methods for the computation of upper and lower bounds have been discussed in the literature. Even so, high-precision bounds are hard to come by. In this paper we discuss how large deviation theory can be used to derive a consistent sequence of upper bounds, $(q_m)_{m\in\mathbb{N}}$, on $\gamma$, and how Monte Carlo simulation can be used in theory to compute estimates, $\hat{q}_m$, of the $q_m$ such that, for given $\Xi > 0$ and $\Lambda \in (0,1)$, we have $P[\gamma < \hat{q} < \gamma + \Xi] \ge \Lambda$. In other words, with high probability the result is an upper bound that approximates $\gamma$ to high precision. We establish $O((1-\Lambda)^{-1}\,\Xi^{-(4+\varepsilon)})$ as a theoretical upper bound on the complexity of computing $\hat{q}_m$ to the given level of accuracy and confidence. Finally, we discuss a practical heuristic based on our theoretical approach and discuss its empirical behavior.
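As background to the statement that the limit is well defined, the usual superadditivity argument can be sketched as follows. This is the standard textbook route, given here as an assumption about how the result is typically derived rather than as a quotation from the abstract. Here $L_n$ is the longest-common-subsequence length of two independent i.i.d. strings of length $n$.

```latex
% Split each string of length m+n into its first m and last n characters.
% Concatenating a common subsequence of the two prefixes with a common
% subsequence of the two suffixes yields a common subsequence of the full strings:
\[
  E[L_{m+n}] \;\ge\; E[L_m] + E[L_n] \qquad \text{for all } m, n \ge 1 .
\]
% Since 0 \le E[L_n] \le n, Fekete's lemma for superadditive sequences gives
\[
  \gamma \;=\; \lim_{n\to\infty} \frac{E[L_n]}{n} \;=\; \sup_{n \ge 1} \frac{E[L_n]}{n} \;\le\; 1 .
\]
```

One consequence is that $E[L_n]/n \le \gamma$ for every fixed $n$, so finite-$n$ expectations only ever bound $\gamma$ from below; upper bounds require a different approach, such as the one described in this abstract.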

Common Subsequences and Supersequences and their Expected Length

Combinatorics Probability Computing ◽

10.1017/s096354839800368x ◽

1998 ◽

Vol 7 (4) ◽

pp. 365-373 ◽

Cited By ~ 4

Author(s):

Vlado Dančík

Keyword(s):

Longest Common Subsequence ◽

Expected Length ◽

Common Subsequence ◽

Shortest Common Supersequence

Let $f(n, k, l)$ be the expected length of a longest common subsequence of $l$ sequences of length $n$ over an alphabet of size $k$. It is known that there are constants $\gamma_k^{(l)}$ such that $f(n, k, l) \to \gamma_k^{(l)} n$ as $n \to \infty$, and we show that $\gamma_k^{(l)} = \Theta(k^{1/l-1})$ as $k \to \infty$. Bounds for the corresponding constants for the expected length of a shortest common supersequence are also presented.
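To make the exponent concrete, the stated growth rate can be specialized to small numbers of sequences. The following is only a direct substitution into the formula from the abstract, with $\gamma_k^{(l)}$ denoting the limiting constant for $l$ sequences over an alphabet of size $k$.

```latex
\[
  \gamma_k^{(l)} = \Theta\!\left(k^{\frac{1}{l}-1}\right)
  \quad\Longrightarrow\quad
  \gamma_k^{(2)} = \Theta\!\left(k^{-1/2}\right),
  \qquad
  \gamma_k^{(3)} = \Theta\!\left(k^{-2/3}\right)
  \qquad (k \to \infty).
\]
```

For two sequences the constant therefore decays like $1/\sqrt{k}$, and the decay becomes faster as more sequences are required to share the subsequence.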

A Beam Search for the Longest Common Subsequence Problem Guided by a Novel Approximate Expected Length Calculation

Machine Learning, Optimization, and Data Science - Lecture Notes in Computer Science ◽

10.1007/978-3-030-37599-7_14 ◽

2019 ◽

pp. 154-167

Author(s):

Marko Djukanovic ◽

Günther R. Raidl ◽

Christian Blum

Keyword(s):

Longest Common Subsequence ◽

Beam Search ◽

Longest Common Subsequence Problem ◽

Expected Length ◽

Common Subsequence

Expected length of the longest common subsequence for large alphabets

Advances in Mathematics ◽

10.1016/j.aim.2004.10.012 ◽

2005 ◽

Vol 197 (2) ◽

pp. 480-498 ◽

Cited By ~ 22

Author(s):

Marcos Kiwi ◽

Martin Loebl ◽

Jiří Matoušek

Keyword(s):

Longest Common Subsequence ◽

Expected Length ◽

Common Subsequence

Large deviations-based upper bounds on the expected relative length of longest common subsequences

Advances in Applied Probability ◽

10.1239/aap/1158685004 ◽

2006 ◽

Vol 38 (3) ◽

pp. 827-852 ◽

Cited By ~ 5

Author(s):

Raphael Hauser ◽

Servet Martínez ◽

Heinrich Matzinger

Keyword(s):

High Precision ◽

Upper Bound ◽

Large Deviation ◽

Relative Length ◽

Random Variable ◽

Upper Bounds ◽

Longest Common Subsequence ◽

Upper And Lower Bounds ◽

Finite Alphabet ◽

Common Subsequence

Consider the random variable $L_n$ defined as the length of a longest common subsequence of two random strings of length $n$ and whose random characters are independent and identically distributed over a finite alphabet. Chvátal and Sankoff showed that the limit $\gamma = \lim_{n\to\infty} E[L_n]/n$ is well defined. The exact value of this constant is not known, but various methods for the computation of upper and lower bounds have been discussed in the literature. Even so, high-precision bounds are hard to come by. In this paper we discuss how large deviation theory can be used to derive a consistent sequence of upper bounds, $(q_m)_{m\in\mathbb{N}}$, on $\gamma$, and how Monte Carlo simulation can be used in theory to compute estimates, $\hat{q}_m$, of the $q_m$ such that, for given $\Xi > 0$ and $\Lambda \in (0,1)$, we have $P[\gamma < \hat{q} < \gamma + \Xi] \ge \Lambda$. In other words, with high probability the result is an upper bound that approximates $\gamma$ to high precision. We establish $O((1-\Lambda)^{-1}\,\Xi^{-(4+\varepsilon)})$ as a theoretical upper bound on the complexity of computing $\hat{q}_m$ to the given level of accuracy and confidence. Finally, we discuss a practical heuristic based on our theoretical approach and discuss its empirical behavior.