FST and the Triangle Inequality for Biallelic Markers
Abstract
The population differentiation statistic FST, introduced by Sewall Wright, is often treated as a pairwise distance measure between populations. As was known to Wright, however, FST is not a true metric because allele frequencies exist for which it does not satisfy the triangle inequality. We prove that a stronger result holds: for biallelic markers whose allele frequencies differ across three populations, FST never satisfies the triangle inequality. We study the deviation from the triangle inequality as a function of the allele frequencies of three populations, identifying frequency vectors at which the deviation is maximal. We also examine the implications of the failure of the triangle inequality for the four-point condition for groups of four populations. Next, we examine the extent to which FST fails to satisfy the triangle inequality in genome-wide data from human populations, finding that some loci have frequencies that produce deviations near the maximum. We discuss the consequences of the theoretical results for various types of data analysis, including multidimensional scaling and inference of neighbor-joining trees from pairwise FST matrices.
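
As a rough numerical illustration of the main claim, the sketch below computes pairwise FST at a single biallelic locus and measures how far three populations fall from satisfying the triangle inequality. It assumes the heterozygosity-based definition FST = (HT - HS) / HT with two equally weighted populations, which for allele frequencies p and q reduces to (p - q)^2 / [(p + q)(2 - p - q)]; the abstract does not state the formula, so this choice is an assumption. The function names `fst`, `triangle_deviation`, and `all_triples_violate` are hypothetical and used only for illustration.

```python
from itertools import combinations


def fst(p: float, q: float) -> float:
    """Pairwise FST at one biallelic locus for two equally weighted populations.

    Assumed definition: FST = (HT - HS) / HT, which simplifies to
    (p - q)^2 / ((p + q) * (2 - p - q)).
    """
    denom = (p + q) * (2.0 - p - q)
    if denom == 0.0:
        # Both populations fixed for the same allele: HT = 0 and FST is
        # undefined; returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return (p - q) ** 2 / denom


def triangle_deviation(p1: float, p2: float, p3: float) -> float:
    """Largest pairwise FST minus the sum of the other two.

    A positive value means the triangle inequality is violated.
    """
    d = [fst(p1, p2), fst(p1, p3), fst(p2, p3)]
    longest = max(d)
    return longest - (sum(d) - longest)


def all_triples_violate(steps: int) -> bool:
    """Check every triple of distinct frequencies on the grid {0, 1/steps, ..., 1}."""
    grid = [i / steps for i in range(steps + 1)]
    return all(triangle_deviation(a, b, c) > 0.0
               for a, b, c in combinations(grid, 3))


if __name__ == "__main__":
    print(triangle_deviation(0.0, 0.5, 1.0))
    print(all_triples_violate(10))
```

Under this assumed definition, the frequencies (0, 0.5, 1) give pairwise values of 1/3, 1/3, and 1, so the largest value exceeds the sum of the other two by 1/3. The grid check is only a spot check on a coarse set of frequencies and does not replace the proof.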