Upper bounds for the expected length of a longest common subsequence of two binary sequences

1995 ◽  
Vol 6 (4) ◽  
pp. 449-458 ◽  
Author(s):  
Vlado Dančík ◽  
Mike Paterson
1975 ◽  
Vol 12 (2) ◽  
pp. 306-315 ◽  
Author(s):  
Václav Chvátal ◽  
David Sankoff

Summary. Given two random $k$-ary sequences of length $n$, what is $f(n,k)$, the expected length of their longest common subsequence? This problem arises in the study of molecular evolution. We calculate $f(n,k)$ for all $k$ where $n \le 5$, and $f(n,2)$ where $n \le 10$. We study the limiting behaviour of $n^{-1} f(n,k)$ and derive upper and lower bounds on these limits for all $k$. Finally, we estimate $f(100,k)$, $f(1000,2)$ and $f(5000,2)$ by Monte Carlo methods.
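For very small $n$, exact values of $f(n,k)$ can be obtained by brute force: average the LCS length over all $k^{2n}$ ordered pairs of words. The following is a minimal Python sketch of that idea, assuming i.i.d. uniform characters; the function names are illustrative, and it is not claimed to be the procedure the authors used.

```python
from fractions import Fraction
from itertools import product


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y (standard O(|x|*|y|) DP)."""
    prev = [0] * (len(y) + 1)  # row for the empty prefix of x
    for a in x:
        cur = [0]  # LCS with the empty prefix of y is 0
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)  # extend a common subsequence by the shared symbol
            else:
                cur.append(max(prev[j], cur[j - 1]))  # drop the last symbol of x or of y
        prev = cur
    return prev[-1]


def exact_f(n, k):
    """Exact f(n, k): mean LCS length over all k**(2n) ordered pairs of k-ary words of length n."""
    words = list(product(range(k), repeat=n))
    total = sum(lcs_length(x, y) for x in words for y in words)
    return Fraction(total, len(words) ** 2)


if __name__ == "__main__":
    for n in range(1, 6):
        f = exact_f(n, 2)
        print(n, f, float(f) / n)  # n, f(n, 2) and the ratio f(n, 2) / n
```

For example, `exact_f(2, 2)` returns `Fraction(9, 8)`: of the 16 ordered pairs of binary words of length 2, the four identical pairs contribute 2 each, the pairs (00, 11) and (11, 00) contribute 0, and the remaining ten contribute 1 each, for a total of 18. The cost grows as $k^{2n}$ pair evaluations, so brute force is practical only for small $n$.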


2009 ◽  
Vol 18 (4) ◽  
pp. 517-532
Author(s):  
M. Kiwi ◽  
J. Soto

It is well known that, when normalized by $n$, the expected length of a longest common subsequence of $d$ sequences of length $n$ over an alphabet of size $\sigma$ converges to a constant $\gamma_{\sigma,d}$. We disprove a speculation by Steele regarding a possible relation between $\gamma_{2,d}$ and $\gamma_{2,2}$. To do so, we also obtain some new lower bounds for $\gamma_{\sigma,d}$ when both $\sigma$ and $d$ are small integers.
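A crude numerical feel for $\gamma_{\sigma,d}$ comes from simulation with the standard $d$-dimensional generalization of the two-sequence LCS recurrence. The following is a minimal Python sketch, assuming i.i.d. uniform characters; the function names and parameters are illustrative, and this plain simulation is not claimed to be the technique Kiwi and Soto used for their lower bounds. The table has $(n+1)^d$ cells, so only small $d$ and $n$ are feasible. Since concatenation gives superadditivity, $E[L_n]/n \le \gamma_{\sigma,d}$ for every $n$, so finite-$n$ estimates are biased low.

```python
import random
from itertools import product


def lcs_multi(seqs):
    """LCS length of d >= 2 sequences via the d-dimensional DP (O(prod(len + 1)) time and memory)."""
    shape = [len(s) + 1 for s in seqs]
    table = {}
    # product() yields index tuples in lexicographic order, so every predecessor of a cell
    # (one coordinate decremented, or all of them) is filled before the cell itself.
    for idx in product(*(range(m) for m in shape)):
        if 0 in idx:
            table[idx] = 0  # some prefix is empty
            continue
        last = [s[i - 1] for s, i in zip(seqs, idx)]
        if all(c == last[0] for c in last):
            # All last symbols agree: match them together.
            table[idx] = table[tuple(i - 1 for i in idx)] + 1
        else:
            # Otherwise drop the last symbol of one sequence and take the best choice.
            table[idx] = max(
                table[idx[:j] + (idx[j] - 1,) + idx[j + 1:]] for j in range(len(idx))
            )
    return table[tuple(m - 1 for m in shape)]


def estimate_gamma(sigma, d, n, trials, seed=0):
    """Monte Carlo estimate of E[L_n]/n for d i.i.d. uniform sigma-ary strings of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seqs = [[rng.randrange(sigma) for _ in range(n)] for _ in range(d)]
        total += lcs_multi(seqs)
    return total / (trials * n)


if __name__ == "__main__":
    print(estimate_gamma(sigma=2, d=3, n=20, trials=20))
```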


2006 ◽  
Vol 38 (3) ◽  
pp. 827-852 ◽  
Author(s):  
Raphael Hauser ◽  
Servet Martínez ◽  
Heinrich Matzinger

Consider the random variable $L_n$ defined as the length of a longest common subsequence of two random strings of length $n$ whose random characters are independent and identically distributed over a finite alphabet. Chvátal and Sankoff showed that the limit $\gamma = \lim_{n\to\infty} E[L_n]/n$ is well defined. The exact value of this constant is not known, but various methods for computing upper and lower bounds have been discussed in the literature. Even so, high-precision bounds are hard to come by. In this paper we discuss how large deviation theory can be used to derive a consistent sequence of upper bounds, $(q_m)_{m\in\mathbb{N}}$, on $\gamma$, and how Monte Carlo simulation can be used in theory to compute estimates $\hat{q}_m$ of the $q_m$ such that, for given $\Xi > 0$ and $\Lambda \in (0,1)$, we have $P[\gamma < \hat{q} < \gamma + \Xi] \ge \Lambda$. In other words, with high probability the result is an upper bound that approximates $\gamma$ to high precision. We establish $O\big((1-\Lambda)^{-1}\,\Xi^{-(4+\varepsilon)}\big)$ as a theoretical upper bound on the complexity of computing $\hat{q}_m$ to the given level of accuracy and confidence. Finally, we discuss a practical heuristic based on our theoretical approach and its empirical behavior.
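The idea of pairing a Monte Carlo estimate with an explicit confidence level can be illustrated by a much simpler, swapped-in argument than the paper's large-deviation approach. Changing any one of the $2n$ characters changes $L_n$ by at most 1, so the Efron-Stein inequality gives $\operatorname{Var}(L_n) \le n$; Chebyshev's inequality then shows that $N \ge 1/(n\,\epsilon^2 (1-\Lambda))$ samples estimate $E[L_n]/n$ to within $\epsilon$ with probability at least $\Lambda$. The target differs from the paper's: by superadditivity $E[L_n]/n \le \gamma$, so this sketch yields a lower-side estimate, not an upper bound like the $q_m$. The function names and the parameters `eps` and `confidence` are illustrative assumptions.

```python
import math
import random


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y (standard O(|x|*|y|) DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def chebyshev_sample_size(n, eps, confidence):
    """Samples N with P(|mean - E[L_n]/n| < eps) >= confidence.

    Uses Var(L_n / n) <= 1/n (from the Efron-Stein bound Var(L_n) <= n) and
    Chebyshev's inequality applied to the mean of N independent samples.
    """
    return math.ceil(1.0 / (n * eps ** 2 * (1.0 - confidence)))


def estimate_ratio(n, k, eps, confidence, seed=0):
    """Monte Carlo estimate of E[L_n]/n for uniform k-ary strings; returns (estimate, N)."""
    rng = random.Random(seed)
    trials = chebyshev_sample_size(n, eps, confidence)
    total = 0
    for _ in range(trials):
        x = [rng.randrange(k) for _ in range(n)]
        y = [rng.randrange(k) for _ in range(n)]
        total += lcs_length(x, y)
    return total / (trials * n), trials


if __name__ == "__main__":
    est, trials = estimate_ratio(n=100, k=2, eps=0.05, confidence=0.9)
    print(f"E[L_100]/100 ~ {est:.3f} from {trials} samples")
```

The Chebyshev bound is loose (the true variance of $L_n$ is typically far below $n$), so the sample count it prescribes is conservative; each sample also costs $O(n^2)$ for the dynamic program.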


1998 ◽  
Vol 7 (4) ◽  
pp. 365-373 ◽  
Author(s):  
Vlado Dančík

Let $f(n,k,l)$ be the expected length of a longest common subsequence of $l$ sequences of length $n$ over an alphabet of size $k$. It is known that there are constants $\gamma^{(l)}_k$ such that $f(n,k,l)/n \to \gamma^{(l)}_k$ as $n\to\infty$, and we show that $\gamma^{(l)}_k = \Theta(k^{1/l-1})$ as $k\to\infty$. Bounds for the corresponding constants for the expected length of a shortest common supersequence are also presented.
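For two sequences the LCS and the shortest common supersequence (SCS) are tied by $|\mathrm{SCS}(x,y)| = |x| + |y| - |\mathrm{LCS}(x,y)|$, so for $l = 2$ the expected SCS length of two length-$n$ strings is $2n - f(n,k,2)$ and the normalized SCS constant is $2 - \gamma^{(2)}_k$. The Python sketch below checks this identity numerically with the two textbook dynamic programs. It covers only $l = 2$ and says nothing about $l \ge 3$; the function names are illustrative.

```python
import random


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y (standard O(|x|*|y|) DP)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def scs_length(x, y):
    """Length of a shortest common supersequence of x and y (standard O(|x|*|y|) DP)."""
    prev = list(range(len(y) + 1))  # SCS of "" and y[:j] is y[:j] itself
    for i, a in enumerate(x, start=1):
        cur = [i]  # SCS of x[:i] and "" is x[:i] itself
        for j, b in enumerate(y, start=1):
            if a == b:
                cur.append(prev[j - 1] + 1)  # emit the shared symbol once
            else:
                cur.append(min(prev[j], cur[j - 1]) + 1)  # emit a (from x) or b (from y)
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(200):
        x = [rng.randrange(3) for _ in range(rng.randrange(9))]
        y = [rng.randrange(3) for _ in range(rng.randrange(9))]
        assert scs_length(x, y) == len(x) + len(y) - lcs_length(x, y)
    print("SCS identity holds on all sampled pairs")
```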


2005 ◽  
Vol 197 (2) ◽  
pp. 480-498 ◽  
Author(s):  
Marcos Kiwi ◽  
Martin Loebl ◽  
Jiří Matoušek
