Uneven Missing Data Skew Phylogenomic Relationships within the Lories and Lorikeets

2020 ◽  
Vol 12 (7) ◽  
pp. 1131-1147
Author(s):  
Brian Tilston Smith ◽  
William M Mauck ◽  
Brett W Benz ◽  
Michael J Andersen

Abstract The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense taxon sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded, which presents various challenges. In this study, we evaluated how the coverage at variant sites and missing data among historical and modern samples impacts phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low coverage characters had several clades where relationships appeared to be influenced by whether the sample came from historical or modern specimens, which were not observed when more stringent filtering was applied. To assess if the topologies were affected by missing data, we performed an outlier analysis of sites and loci, and a data reduction approach where we excluded sites based on data completeness. Depending on the outlier test, 0.15% of total sites or 38% of loci were driving the topological differences among trees, and at these sites, historical samples had 10.9× more missing data than modern ones. In contrast, 70% data completeness was necessary to avoid spurious relationships. Predictive modeling found that outlier analysis scores were correlated with parsimony informative sites in the clades whose topologies changed the most by filtering. After accounting for biased loci and understanding the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.
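The data reduction step described in the abstract above, excluding sites that fall below a data completeness threshold, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy alignment, and the choice to treat `-`, `?`, and `N` as missing are assumptions made for illustration. Only the 70% threshold comes from the abstract.

```python
# Hypothetical sketch: drop alignment columns (sites) whose completeness
# is below a threshold. Characters counted as missing are an assumption.
MISSING = set("-?Nn")


def site_completeness(alignment):
    """Return, for each column, the fraction of sequences with a called base."""
    seqs = list(alignment.values())
    n_taxa = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    return [
        sum(s[i] not in MISSING for s in seqs) / n_taxa
        for i in range(length)
    ]


def filter_sites(alignment, min_completeness=0.70):
    """Keep only columns whose completeness is >= min_completeness."""
    comp = site_completeness(alignment)
    keep = [i for i, c in enumerate(comp) if c >= min_completeness]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Toy alignment (invented data): historical samples carry more missing data.
toy = {
    "modern_1":     "ACGTAC",
    "modern_2":     "ACGTAC",
    "modern_3":     "ACGTTC",
    "historical_1": "A-G?AC",
    "historical_2": "AC-?AN",
}
completeness = site_completeness(toy)
filtered = filter_sites(toy, 0.70)
```

In the toy data, the fourth column is missing in both historical samples, so its completeness is 3/5 and it is the only column removed at the 70% threshold. Raising the threshold removes more columns, which is consistent with the trade-off the preprint version of the abstract reports between stringent filtering and tree resolution.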

2018 ◽  
Author(s):  
Brian Tilston Smith ◽  
William M. Mauck ◽  
Brett Benz ◽  
Michael J. Andersen

Abstract The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded and fragmented, which presents various challenges. In this study, we evaluated how the coverage at variant sites and missing data among historical and modern sample types impacts phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low coverage sites had several clades where historical or modern samples clustered together, which were not observed in trees with more stringent filtering. To assess if the aberrant relationships were affected by missing data, we performed a targeted outlier analysis of sites and loci and a more general data reduction approach where we excluded sites based on a percentage of data completeness. The outlier analyses showed that 6.6% of total sites were driving the topological differences among trees built with and without low coverage sites, and at these sites, historical samples had 7.5× more missing data than modern ones. An examination of subclades identified loci biased by missing data, and the exclusion of these loci shifted phylogenetic relationships. Predictive modeling found that outlier analysis scores were not correlated with summary statistics of locus alignments, indicating that outlier loci do not have characteristics differing from other loci. Excluding missing data by percentage completeness indicated that sites with 70% completeness were necessary to avoid spurious relationships, but more stringent conditions of data completeness produced less-resolved trees. After accounting for biased loci and understanding the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.


2011 ◽  
Vol 44 (4) ◽  
pp. 865-872 ◽  
Author(s):  
Ludmila Urzhumtseva ◽  
Alexandre Urzhumtsev

Crystallographic Fourier maps may contain barely interpretable or non-interpretable regions if these maps are calculated with an incomplete set of diffraction data. Even a small percentage of missing data may be crucial if these data are distributed non-uniformly and form connected regions of reciprocal space. Significant time and effort can be lost trying to interpret poor maps, in improving them by phase refinement or in fighting against artefacts, whilst the problem could in fact be solved by completing the data set. To characterize the distribution of missing reflections, several types of diagrams have been suggested in addition to the usual plots of completeness in resolution shells and cumulative data completeness. A computer program, FOBSCOM, has been developed to analyze the spatial distribution of unmeasured diffraction data, to search for connected regions of unmeasured reflections and to obtain numeric characteristics of these regions. By performing this analysis, the program could help to save time during structure solution for a number of projects. It can also provide information about a possible overestimation of the map quality and model-biased features when calculated values are used to replace unmeasured data.
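The idea of searching for connected regions of unmeasured reflections can be illustrated with a generic connected-component search over Miller indices. This is a hedged sketch, not FOBSCOM's algorithm: the function name, the toy index sets, and the definition of "connected" as differing by one in exactly one of h, k, or l are all assumptions made for illustration.

```python
from collections import deque

# Assumed neighbourhood: two reflections are adjacent if their Miller
# indices differ by 1 in exactly one of h, k, l.
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def connected_missing_regions(expected, measured):
    """Group unmeasured reflections into connected regions, largest first."""
    missing = set(expected) - set(measured)
    regions = []
    while missing:
        start = missing.pop()
        queue = deque([start])
        region = [start]
        # Breadth-first search over still-unvisited missing neighbours.
        while queue:
            h, k, l = queue.popleft()
            for dh, dk, dl in STEPS:
                nb = (h + dh, k + dk, l + dl)
                if nb in missing:
                    missing.remove(nb)
                    queue.append(nb)
                    region.append(nb)
        regions.append(sorted(region))
    return sorted(regions, key=len, reverse=True)


# Toy data set (invented): a 3 x 3 x 3 block of expected reflections
# with four of them unmeasured.
expected = {(h, k, l) for h in range(3) for k in range(3) for l in range(3)}
unmeasured = {(0, 0, 0), (0, 0, 1), (0, 1, 1), (2, 2, 2)}
measured = expected - unmeasured

regions = connected_missing_regions(expected, measured)
overall_completeness = len(measured) / len(expected)
```

In the toy data the overall completeness is 23/27, yet three of the four missing reflections form one connected region. This is the kind of non-uniform distribution the abstract warns about, which a single completeness figure does not reveal.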


2020 ◽  
Vol 173 (1) ◽  
pp. 21-33 ◽  
Author(s):  
Lu Yao ◽  
Kelsey Witt ◽  
Hongjie Li ◽  
Jonathan Rice ◽  
Nelson R. Salinas ◽  
...  

PLoS ONE ◽  
2014 ◽  
Vol 9 (5) ◽  
pp. e96793 ◽  
Author(s):  
Mandy Man-Ying Tin ◽  
Evan Philip Economo ◽  
Alexander Sergeyevich Mikheyev

Author(s):  
Jimmy A McGuire ◽  
Darko D Cotoras ◽  
Brendan O'Connell ◽  
Shobi Z S Lawalata ◽  
Cynthia Y Wang-Claypool ◽  
...  

We used Massively Parallel High-Throughput Sequencing to obtain genetic data from a 145-year old holotype specimen of the flying lizard, Draco cristatellus. Obtaining genetic data from this holotype was necessary to resolve an otherwise intractable taxonomic problem involving the status of this species relative to closely related sympatric Draco species that cannot otherwise be distinguished from one another on the basis of museum specimens. Initial analyses suggested that the DNA present in the holotype sample was so degraded as to be unusable for sequencing. However, we used a specialized extraction procedure developed for highly degraded ancient DNA samples and MiSeq shotgun sequencing to obtain just enough low-coverage mitochondrial DNA (547 base pairs) to conclusively resolve the species status of the holotype as well as a second known specimen of this species. The holotype was prepared before the advent of formalin-fixation and therefore was most likely originally fixed with ethanol and never exposed to formalin. Whereas conventional wisdom suggests that formalin-fixed samples should be the most challenging for DNA sequencing, we propose that evaporation during long-term alcohol storage and consequent water-exposure may subject older ethanol-fixed museum specimens to hydrolytic damage. If so, this may pose an even greater challenge for sequencing efforts involving historical samples.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4470 ◽  
Author(s):  
Jimmy A. McGuire ◽  
Darko D. Cotoras ◽  
Brendan O’Connell ◽  
Shobi Z.S. Lawalata ◽  
Cynthia Y. Wang-Claypool ◽  
...  

We used Massively Parallel High-Throughput Sequencing to obtain genetic data from a 145-year old holotype specimen of the flying lizard, Draco cristatellus. Obtaining genetic data from this holotype was necessary to resolve an otherwise intractable taxonomic problem involving the status of this species relative to closely related sympatric Draco species that cannot otherwise be distinguished from one another on the basis of museum specimens. Initial analyses suggested that the DNA present in the holotype sample was so degraded as to be unusable for sequencing. However, we used a specialized extraction procedure developed for highly degraded ancient DNA samples and MiSeq shotgun sequencing to obtain just enough low-coverage mitochondrial DNA (721 base pairs) to conclusively resolve the species status of the holotype as well as a second known specimen of this species. The holotype was prepared before the advent of formalin-fixation and therefore was most likely originally fixed with ethanol and never exposed to formalin. Whereas conventional wisdom suggests that formalin-fixed samples should be the most challenging for DNA sequencing, we propose that evaporation during long-term alcohol storage and consequent water-exposure may subject older ethanol-fixed museum specimens to hydrolytic damage. If so, this may pose an even greater challenge for sequencing efforts involving historical samples.


2015 ◽  
Author(s):  
Logan Kistler ◽  
Oliver Smith ◽  
Roselyn Ware ◽  
Garry Momber ◽  
Richard Bates ◽  
...  

Recently, the finding of 8,000 year old wheat DNA from submerged marine sediments (1) was challenged on the basis of a lack of signal of cytosine deamination relative to three other data sets generated from young samples of herbarium and museum specimens, and a 7,000 year old human skeleton preserved in a cave environment (2). The study used a new approach for low coverage data sets to which tools such as mapDamage cannot be applied to infer chemical damage patterns. Here we show from the analysis of 148 palaeogenomic data sets that the rate of cytosine deamination is a thermally correlated process, and that organellar DNA generally shows higher rates of deamination than nuclear DNA in comparable environments. We categorize four clusters of deamination rates (alpha, beta, gamma, epsilon) that are associated with cold stable environments, cool but thermally fluctuating environments, and progressively warmer environments. These correlations show that the expected level of deamination in the sedaDNA would be extremely low. The low coverage approach to detect DNA damage by Weiss et al. (2) fails to identify damaged samples from the cold class of deamination rates. Finally, different enzymes used in library preparation processes exhibit varying capability in reporting cytosine deamination damage in the 5 prime region of fragments. The PCR enzyme used in the sedaDNA study would not have had the capability to report 5 prime cytosine deamination, as it does not read over uracil residues, and signatures of damage would have better been sought at the 3 prime end. The 8,000 year old sedaDNA matches both the thermal age prediction of fragmentation, and the expected level of cytosine deamination for the preservation environment. Given these facts and the use of rigorous controls these data meet the criteria of authentic ancient DNA to an extremely stringent level.


mSystems ◽  
2016 ◽  
Vol 1 (3) ◽  
Author(s):  
Richard Allen White ◽  
Eric M. Bottos ◽  
Taniya Roy Chowdhury ◽  
Jeremy D. Zucker ◽  
Colin J. Brislawn ◽  
...  

ABSTRACT Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their functional roles in ecosystem stability and responses to environmental perturbations. This knowledge gap is largely due to the difficulty in culturing the majority of soil microbes. Thus, use of culture-independent approaches, such as metagenomics, promises the direct assessment of the functional potential of soil microbiomes. Soil is, however, a challenge for metagenomic assembly due to its high microbial diversity and variable evenness, resulting in low coverage and uneven sampling of microbial genomes. Despite increasingly large soil metagenome data volumes (>200 Gbp), the majority of the data do not assemble. Here, we used the cutting-edge approach of synthetic long-read sequencing technology (Moleculo) to assemble soil metagenome sequence data into long contigs and used the assemblies for binning of genomes. Soil metagenomics has been touted as the “grand challenge” for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600 reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. 
We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of “Candidatus Pseudomonas sp. strain JKJ-1” from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundance Acidobacteria were highly transcriptionally active, whereas bins corresponding to high-relative-abundance Verrucomicrobia were not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities. IMPORTANCE Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their functional roles in ecosystem stability and responses to environmental perturbations. This knowledge gap is largely due to the difficulty in culturing the majority of soil microbes. Thus, use of culture-independent approaches, such as metagenomics, promises the direct assessment of the functional potential of soil microbiomes. Soil is, however, a challenge for metagenomic assembly due to its high microbial diversity and variable evenness, resulting in low coverage and uneven sampling of microbial genomes. Despite increasingly large soil metagenome data volumes (>200 Gbp), the majority of the data do not assemble. Here, we used the cutting-edge approach of synthetic long-read sequencing technology (Moleculo) to assemble soil metagenome sequence data into long contigs and used the assemblies for binning of genomes. Author Video: An author video summary of this article is available.


2018 ◽  
Vol 5 (1) ◽  
pp. 171089 ◽  
Author(s):  
Madeline S. Tiee ◽  
Ryan J. Harrigan ◽  
Henri A. Thomassen ◽  
Thomas B. Smith

Infectious diseases that originate from multiple wildlife hosts can be complex and problematic to manage. A full understanding is further limited by large temporal and spatial gaps in sampling. However, these limitations can be overcome, in part, by using historical samples, such as those derived from museum collections. Here, we screened over 1000 museum specimens collected over the past 120 years to examine the historical distribution and prevalence of monkeypox virus (MPXV) in five species of African rope squirrel (Funisciurus sp.) collected across Central Africa. We found evidence of MPXV infections in host species as early as 1899, half a century earlier than the first recognized case of MPXV in 1958, supporting the suggestion that historic pox-like outbreaks in humans and non-human primates may have been caused by MPXV rather than smallpox as originally thought. MPX viral DNA was found in 93 of 1038 (9.0%) specimens from five Funisciurus species (F. anerythrus, F. carruthersi, F. congicus, F. lemniscatus and F. pyrropus), of which F. carruthersi and F. pyrropus had not previously been identified as potential MPXV hosts. We additionally documented relative prevalence rates of infection in museum specimens of Funisciurus and examined the spatial and temporal distribution of MPXV in these potential host species across nearly a hundred years (1899–1993).

