Estimation of Hominoid Ancestral Population Sizes under Bayesian Coalescent Models Incorporating Mutation Rate Variation and Sequencing Errors

2008 ◽  
Vol 25 (9) ◽  
pp. 1979-1994 ◽  
Author(s):  
Ralph Burgess ◽  
Ziheng Yang
2016 ◽  
Author(s):  
Thomas C A Smith ◽  
Antony M Carr ◽  
Adam C Eyre-Walker

Across independent cancer genomes it has been observed that some sites have been recurrently hit by single nucleotide variants (SNVs). Such recurrently hit sites might be either i) drivers of cancer that are positively selected during oncogenesis, ii) due to mutation rate variation, or iii) due to sequencing and assembly errors. We have investigated the cause of recurrently hit sites in a dataset of >3 million SNVs from 507 complete cancer genome sequences. We find evidence that many sites have been hit significantly more often than one would expect by chance, even taking into account the effect of the adjacent nucleotides on the rate of mutation. We find that the density of these recurrently hit sites is higher in non-coding than coding DNA and hence conclude that most of them are unlikely to be drivers. We also find that most of them are found in parts of the genome that are not uniquely mappable and hence are likely to be due to mapping errors. In support of the error hypothesis, we find that recurrently hit sites are not randomly distributed across sequences from different laboratories. We fit a model to the data in which the rate of mutation is constant across sites but the rate of error varies. This model suggests that ~4% of all SNVs are errors in this dataset, but that the rate of error varies by thousands-of-fold.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2391 ◽  
Author(s):  
Thomas C.A. Smith ◽  
Antony M. Carr ◽  
Adam C. Eyre-Walker

Across independent cancer genomes it has been observed that some sites have been recurrently hit by single nucleotide variants (SNVs). Such recurrently hit sites might be either (i) drivers of cancer that are positively selected during oncogenesis, (ii) due to mutation rate variation, or (iii) due to sequencing and assembly errors. We have investigated the cause of recurrently hit sites in a dataset of >3 million SNVs from 507 complete cancer genome sequences. We find evidence that many sites have been hit significantly more often than one would expect by chance, even taking into account the effect of the adjacent nucleotides on the rate of mutation. We find that the density of these recurrently hit sites is higher in non-coding than coding DNA and hence conclude that most of them are unlikely to be drivers. We also find that most of them are found in parts of the genome that are not uniquely mappable and hence are likely to be due to mapping errors. In support of the error hypothesis, we find that recurrently hit sites are not randomly distributed across sequences from different laboratories. We fit a model to the data in which the rate of mutation is constant across sites but the rate of error varies. This model suggests that ∼4% of all SNVs are errors in this dataset, but that the rate of error varies by thousands-of-fold between sites.
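The "more often than one would expect by chance" comparison can be illustrated with a deliberately simplified null model. The sketch below is not the authors' method: it assumes every site has the same mutation rate (so it ignores the adjacent-nucleotide effect the abstract mentions), it treats the number of hits per site as Poisson, and it uses a genome length of 3e9 sites, which is an assumption and does not come from the abstract. The function name `expected_recurrent_sites` is hypothetical.

```python
import math


def expected_recurrent_sites(n_snvs, n_sites, min_hits=2):
    """Expected number of sites hit at least `min_hits` times by chance.

    Assumes (for illustration only) that SNVs fall uniformly at random
    across sites, so the hit count per site is approximately Poisson
    with mean lam = n_snvs / n_sites.
    """
    lam = n_snvs / n_sites
    # Probability that a site receives fewer than `min_hits` hits.
    p_fewer = sum(
        math.exp(-lam) * lam ** k / math.factorial(k) for k in range(min_hits)
    )
    return n_sites * (1.0 - p_fewer)


if __name__ == "__main__":
    # 3 million SNVs (from the abstract); 3e9 sites is an assumed genome length.
    print(expected_recurrent_sites(3_000_000, 3_000_000_000))
```

Under these assumptions the null expectation is on the order of 1,500 sites hit two or more times. An observed count well above such a baseline is what would then need to be explained by selection, mutation rate variation, or error.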


1997 ◽  
Vol 69 (2) ◽  
pp. 111-116 ◽  
Author(s):  
ZIHENG YANG

The theory developed by Takahata and colleagues for estimating the effective population size of ancestral species using homologous sequences from closely related extant species was extended to take account of variation of evolutionary rates among loci. Nuclear sequence data related to the evolution of modern humans were reanalysed and computer simulations were performed to examine the effect of rate variation on estimation of ancestral population sizes. It is found that the among-locus rate variation does not have a significant effect on estimation of the current population size when sequences from multiple loci are sampled from the same species, but does have a significant effect on estimation of the ancestral population size using sequences from different species. The effects of ancestral population size, species divergence time and among-locus rate variation are found to be highly correlated, and to achieve reliable estimates of the ancestral population size, effects of the other two factors should be estimated independently.
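The confounding described in this abstract can be made concrete with a simple moment-based sketch. It is not the method of the paper. It assumes that one sequence from each of two species is compared at several loci of equal length with equal mutation rates, and that the ancestral coalescent time is exponentially distributed. Under those assumptions the number of differences per locus has mean gamma + theta and variance gamma + theta + theta^2, where theta is the scaled ancestral population size and gamma is the scaled species divergence time. The function name `moment_estimates` and the symbols `theta` and `gamma` are chosen here for illustration.

```python
import math
import statistics


def moment_estimates(differences):
    """Method-of-moments estimates (theta, gamma) from per-locus differences.

    Assumes equal locus lengths and equal mutation rates across loci:
      mean     = gamma + theta
      variance = gamma + theta + theta**2
    so theta = sqrt(variance - mean) and gamma = mean - theta.
    """
    mean = statistics.mean(differences)
    var = statistics.variance(differences)  # sample variance (n - 1 denominator)
    excess = var - mean
    # If the counts are no more dispersed than Poisson, estimate theta as 0.
    theta = math.sqrt(excess) if excess > 0 else 0.0
    gamma = mean - theta
    return theta, gamma


if __name__ == "__main__":
    # Made-up counts of differences at five loci, for illustration only.
    print(moment_estimates([2, 4, 6, 8, 10]))
```

Because theta is estimated from the excess variance among loci, any among-locus rate variation also inflates that variance and is mistaken for a larger ancestral population, which is one way to see why the abstract reports the three effects as highly correlated.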


1994 ◽  
Vol 8 (2) ◽  
pp. 162-170 ◽  
Author(s):  
Darren G. Monckton ◽  
Rita Neumann ◽  
Tara Guram ◽  
Neale Fretwell ◽  
Keiji Tamaki ◽  
...  

2007 ◽  
Vol 8 (11) ◽  
pp. 902-902
Author(s):  
Charles F. Baer ◽  
Michael M. Miyamoto ◽  
Dee R. Denver
