International Barcode of Life: Focus on big biodiversity in South Africa

Genome, 2017, Vol 60 (11), pp. 875-879
Author(s): Sarah J. Adamowicz, Peter M. Hollingsworth, Sujeevan Ratnasingham, Michelle van der Bank

Participants in the 7th International Barcode of Life Conference (Kruger National Park, South Africa, 20–24 November 2017) share the latest findings in DNA barcoding research and its increasingly diversified applications. Here, we review prevailing trends synthesized from among 429 invited and contributed abstracts, which are collated in this open-access special issue of Genome. Hosted for the first time on the African continent, the 7th Conference places special emphasis on the evolutionary origins, biogeography, and conservation of African flora and fauna. Within Africa and elsewhere, DNA barcoding and related techniques are being increasingly used for wildlife forensics and for the validation of commercial products, such as medicinal plants and seafood species. A striking trend of the conference is the dramatic rise of studies on environmental DNA (eDNA) and on diverse uses of high-throughput sequencing techniques. Emerging techniques in these areas are opening new avenues for environmental biomonitoring, managing species-at-risk and invasive species, and revealing species interaction networks in unprecedented detail. Contributors call for the development of validated community standards for high-throughput sequence data generation and analysis, to enable the full potential of these methods to be realized for understanding and managing biodiversity on a global scale.

2018, Vol 50 (3), pp. 299-312
Author(s): Steven D. LEAVITT, Paul M. KIRIKA, Guillermo AMO DE PAZ, Jen-Pan HUANG, Jae-Seoun HUR, ...

Abstract
Species richness is not evenly distributed across the tree of life, and a limited number of lineages comprise an extraordinarily large number of species. In lichen-forming fungi, only two genera are known to be ‘ultradiverse’ (>500 species), the most diverse of which, Xanthoparmelia, comprises c. 820 species. While Australia and South Africa are known to be current centres of diversity for Xanthoparmelia, it is not well known when and where this massive diversity arose. To better understand the geographical and temporal context of diversification in this genus, we sampled 191 Xanthoparmelia specimens representing c. 124 species/species-level lineages from populations worldwide. From these specimens, we generated a multi-locus sequence data set using Sanger and high-throughput sequencing to reconstruct evolutionary relationships in Xanthoparmelia, estimate divergence times and reconstruct biogeographical histories in maximum likelihood and Bayesian frameworks. This study corroborated the placement within Xanthoparmelia of several morphologically or chemically diverse taxa, such as Almbornia, Chondropsis, Karoowia, Namakwa, Neofuscelia, Omphalodiella, Paraparmelia, Placoparmelia and Xanthomaculina; it also improved phylogenetic resolution and reconstructed previously unsampled lineages within Xanthoparmelia. Our data indicate that Xanthoparmelia most likely originated in Africa during the early Miocene, coinciding with global aridification and the development of open habitats. Reconstructed biogeographical histories of Xanthoparmelia reveal diversification within continents, with infrequent intercontinental exchange by long-distance dispersal. While the mechanisms by which Xanthoparmelia attained strikingly high species richness in Australia and South Africa remain uncertain, this study provides a framework for ongoing research into diverse lineages of lichen-forming fungi.
Finally, our study highlights a novel approach for generating locus-specific molecular sequence data sets from high-throughput metagenomic reads.
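The abstract does not say how the locus-specific data sets were pulled out of the metagenomic reads. One common way to do this is k-mer "baiting": a read is assigned to a locus when it shares enough k-mers with a reference sequence for that locus. The sketch below shows only that generic technique, under that assumption; the function names, the `k` and `min_shared` values and the toy sequences are hypothetical and are not taken from the study.

```python
# Hypothetical sketch of k-mer baiting; not the study's actual method.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait_reads(reads, references, k=21, min_shared=2):
    """Assign each read to every locus it shares >= min_shared k-mers with (either strand)."""
    # Index every reference k-mer by the loci it occurs in.
    index = {}
    for locus, ref in references.items():
        for km in kmers(ref, k):
            index.setdefault(km, set()).add(locus)

    baited = {locus: [] for locus in references}
    for read in reads:
        shared = {}
        # Pool forward and reverse-complement k-mers so read orientation does not matter.
        for km in kmers(read, k) | kmers(reverse_complement(read), k):
            for locus in index.get(km, ()):
                shared[locus] = shared.get(locus, 0) + 1
        for locus, n in shared.items():
            if n >= min_shared:
                baited[locus].append(read)
    return baited
```

With real reads, `k` would normally be larger than in a toy example, and reads that bait several loci may need to be resolved or discarded before assembly.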


2020, Vol 20 (1)
Author(s): Yuka Torii, Kazuhiro Horiba, Satoshi Hayano, Taichi Kato, Takako Suzuki, ...

Abstract
Background: Kawasaki disease (KD) is an idiopathic systemic vasculitis that predominantly damages the coronary arteries in children. Various pathogens have been investigated as triggers for KD, but no definitive causative pathogen has been identified. Because KD is diagnosed from its symptoms, diagnosis takes several days, so by the time KD is diagnosed the triggering pathogen may already have diminished. The aim of this study was to comprehensively explore pathogens in the sera of patients at the acute stage of KD using high-throughput sequencing (HTS).
Methods: Sera from 12 patients at an extremely early stage of KD and from 12 controls were investigated. DNA and RNA were sequenced separately using HTS. Sequence data were imported into PATHDET, an in-house metagenomic analysis pipeline, to identify pathogen sequences.
Results: No RNA virus reads were detected in any KD case except those of equine infectious anemia virus, a known contaminant of commercial reverse transcriptase. Among DNA viruses, human herpesvirus 6B (HHV-6B, two cases) and Anelloviridae (eight cases) were detected in KD cases as well as in controls. Multiple bacterial reads were obtained from both KD cases and controls. Bacteria of the genera Acinetobacter, Pseudomonas, Delftia and Roseomonas, and of the family Rhodocyclaceae, appeared to be more common in KD sera than in controls.
Conclusion: No single pathogen was identified in serum samples from patients in the acute phase of KD. Because multiple bacteria were detected in the serum samples, contamination is difficult to exclude; however, it is possible that these bacteria stimulate the immune system and induce KD.
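PATHDET is described only as an in-house pipeline, so its internals are not reproduced here. The case/control comparison reported in the Results can, however, be sketched: drop taxa known to be reagent contaminants (the abstract names equine infectious anemia virus as one), then compute the fraction of case and control sera in which each remaining taxon was detected. The input format (one dict of taxon read counts per serum sample), the `min_reads` threshold and the function names are assumptions made for illustration.

```python
# Hypothetical sketch of a detection-prevalence comparison; not part of PATHDET.
KNOWN_CONTAMINANTS = {"Equine infectious anemia virus"}  # named in the abstract; extend as needed


def detection_prevalence(samples, min_reads=1, contaminants=KNOWN_CONTAMINANTS):
    """samples: non-empty list of {taxon: read count}, one dict per serum sample.
    Returns {taxon: fraction of samples with >= min_reads reads for that taxon}."""
    detected = {}
    for counts in samples:
        for taxon, reads in counts.items():
            if taxon in contaminants or reads < min_reads:
                continue  # skip known contaminants and sub-threshold detections
            detected[taxon] = detected.get(taxon, 0) + 1
    return {taxon: n / len(samples) for taxon, n in detected.items()}


def compare_groups(cases, controls, **kwargs):
    """Return {taxon: (prevalence in cases, prevalence in controls)} for every detected taxon."""
    case_prev = detection_prevalence(cases, **kwargs)
    control_prev = detection_prevalence(controls, **kwargs)
    taxa = sorted(set(case_prev) | set(control_prev))
    return {t: (case_prev.get(t, 0.0), control_prev.get(t, 0.0)) for t in taxa}
```

With 12 cases and 12 controls, such prevalence differences are descriptive only; as the authors note, contamination has to be ruled out separately (for example with extraction blanks) before any taxon is treated as a candidate trigger.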


2017
Author(s): Gregory L. Owens, Marco Todesco, Emily B. M. Drummond, Sam Yeaman, Loren H. Rieseberg

Abstract
High-throughput sequencing on the Illumina HiSeq platform is a pervasive and critical resource in molecular ecology and has provided the data underlying many recent advances. A recent study has suggested that ‘index switching’, in which reads are misattributed to the wrong sample, may be more frequent on newer versions of the HiSeq platform. This could invalidate both published and in-progress work across the field. Here, we test for evidence of index switching in an exemplar whole-genome shotgun dataset sequenced on both the Illumina HiSeq 2500, which should not have the problem, and the Illumina HiSeq X, which may. We leverage unbalanced heterozygotes, which index switching may produce, and ask whether the under-sequenced allele is more likely to be found in other samples in the same lane than expected from its allele frequency. Although simulations confirm the sensitivity of this method, we find no evidence of index switching on either the HiSeq 2500 or the HiSeq X. This suggests that, thankfully, index switching may not be a ubiquitous problem in HiSeq X sequence data. Lastly, we provide scripts for applying our method so that index switching can be tested for in other datasets.
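The authors distribute their own scripts; the following is not that code but a minimal sketch of the logic described, under simplifying assumptions: per-allele read counts are already available for each sample and site in one lane, an "unbalanced" heterozygote is one whose minor allele makes up at most 20% of reads (`max_minor_frac`, a hypothetical cutoff), and the chance that a lane-mate carries an allele of population frequency p follows Hardy-Weinberg proportions, `1 - (1 - p)**2`. An observed carrier rate well above the expected rate would point to index switching.

```python
# Hypothetical sketch of the unbalanced-heterozygote test; not the authors' scripts.


def carrier_probability(p):
    """Probability that a diploid sample carries >= 1 copy of an allele at frequency p (HWE)."""
    return 1.0 - (1.0 - p) ** 2


def index_switch_signal(lane, alt_freq, max_minor_frac=0.2, min_depth=10):
    """lane: {sample: {site: (ref_reads, alt_reads)}} for samples sequenced in one lane.
    alt_freq: {site: population frequency of the alternative allele}.
    Returns (observed, expected) rates at which lane-mates carry the under-sequenced
    allele of an unbalanced heterozygote."""
    observed = expected = 0.0
    comparisons = 0
    for sample, sites in lane.items():
        for site, (ref, alt) in sites.items():
            depth = ref + alt
            if depth < min_depth or ref == 0 or alt == 0:
                continue  # not a heterozygote with enough reads
            if min(ref, alt) / depth > max_minor_frac:
                continue  # balanced heterozygote: not informative
            minor_is_alt = alt < ref
            p = alt_freq[site] if minor_is_alt else 1.0 - alt_freq[site]
            # Does each other sample in the lane carry the under-sequenced allele?
            for other, other_sites in lane.items():
                if other == sample or site not in other_sites:
                    continue
                o_ref, o_alt = other_sites[site]
                carries = (o_alt if minor_is_alt else o_ref) > 0
                observed += 1.0 if carries else 0.0
                expected += carrier_probability(p)
                comparisons += 1
    if comparisons == 0:
        return 0.0, 0.0
    return observed / comparisons, expected / comparisons
```

In practice the two rates would be accumulated over many sites and compared with a formal test, with the HiSeq 2500 data serving as a baseline.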


2021
Author(s): Rafaela S. Fontenele, Simona Kraberger, James Hadfield, Erin M. Driver, Devin Bowes, ...

Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged from a zoonotic spill-over event and has led to a global pandemic. The public health response has been informed predominantly by surveillance of symptomatic individuals and by contact tracing, with quarantine and other preventive measures then applied to mitigate further spread. Non-traditional surveillance methods, such as genomic epidemiology and wastewater-based epidemiology (WBE), have also been leveraged during this pandemic. Genomic epidemiology uses high-throughput sequencing of SARS-CoV-2 genomes to track local and international transmission events and the diversity of circulating variants. WBE uses wastewater to analyse community spread, because SARS-CoV-2 is known to be shed in bodily excretions. Since both symptomatic and asymptomatic individuals contribute to wastewater, we hypothesized that the resulting pooled sample of population-wide excreta can provide a more comprehensive picture of the SARS-CoV-2 genomic diversity circulating in a community than clinical testing and sequencing alone. In this study, we analysed 91 wastewater samples from 11 states in the USA, most of which came from Maricopa County, Arizona (USA). To assess viral diversity at a population scale, we undertook a single-nucleotide variant (SNV) analysis of the 52 samples whose sequence reads covered >90% of the SARS-CoV-2 genome, and compared these SNVs with those detected in genomes sequenced from clinical patients. We identified 7973 SNVs, of which 5680 were “novel” SNVs that had not been identified in the global clinical-derived data as of 17th June 2020 (the day after our last wastewater sampling date). However, almost half of the SNVs were subsequently detected in clinical-derived data between 17th June 2020 and 20th November 2020.
Using the combination of SNVs present in each sample, we identified the more probable lineages present in that sample and compared them to lineages observed in North America prior to our sampling dates. The wastewater-derived SARS-CoV-2 sequence data indicate that more lineages were circulating across the sampled communities than were represented in the clinical-derived data. Principal coordinate analyses identified patterns in population structure based on genetic variation within the sequenced samples, with clear trends associated with increased diversity, likely due to a higher number of infected individuals relative to the sampling dates. We demonstrate that genetic correlation analysis combined with SNV analysis of wastewater samples can provide a comprehensive snapshot of the SARS-CoV-2 genetic population structure circulating within a community, which might not be observed if relying solely on clinical cases.
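The coverage filter and the comparison with clinical data described above can be sketched as follows. This is an illustration only: the per-position pileup format, the depth and allele-frequency cutoffs (`min_depth`, `min_freq`) and the SNV notation (reference base, 1-based position, alternative base, e.g. `C241T`) are assumptions rather than the study's parameters. Only the >90% genome-coverage requirement comes from the abstract.

```python
# Hypothetical sketch of coverage filtering, SNV calling and "novel" SNV detection.


def genome_coverage(pileup, min_depth=10):
    """Fraction of reference positions covered by at least min_depth reads."""
    return sum(sum(counts.values()) >= min_depth for counts in pileup) / len(pileup)


def call_snvs(pileup, reference, min_depth=10, min_freq=0.1):
    """pileup[i] is {base: read count} at 0-based reference position i.
    Returns SNVs as strings like 'C241T' (1-based position)."""
    snvs = set()
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too shallow to call
        ref_base = reference[pos]
        for base, n in counts.items():
            if base != ref_base and n / depth >= min_freq:
                snvs.add(f"{ref_base}{pos + 1}{base}")
    return snvs


def novel_snvs(pileup, reference, clinical_snvs, min_coverage=0.9):
    """SNVs in a wastewater sample that are absent from the clinical data,
    or None if the sample does not exceed the genome-coverage threshold."""
    if genome_coverage(pileup) <= min_coverage:
        return None
    return call_snvs(pileup, reference) - set(clinical_snvs)
```

Because wastewater is a pooled sample, a low `min_freq` retains minority variants from many individuals. The cost is that sequencing errors are more likely to pass as SNVs, which is one reason the choice of cutoff matters when counting "novel" SNVs.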


2021, Vol 21 (1)
Author(s): Adetayo Emmanuel Obasa, Anoop T. Ambikan, Soham Gupta, Ujjwal Neogi, Graeme Brendon Jacobs

Abstract
Background: HIV-1 subtype C (HIV-1C) has been shown to carry a greater risk of virological failure and reduced susceptibility to boosted protease inhibitors (bPIs), a component of second-line combination antiretroviral therapy (cART) in South Africa. This study evaluated HIV-1 drug resistance-associated mutations (RAMs) in minor viral populations using high-throughput sequencing genotypic resistance testing (HTS-GRT) in patients receiving bPIs on the South African national second-line cART regimen.
Methods: During 2017 and 2018, 67 patient samples were sequenced using high-throughput sequencing (HTS); 56 of these were included in the final analysis because the patients' treatment regimens were available at the time of sampling. All patients were receiving bPIs as part of their cART. Viral RNA was extracted, and complete pol genes were amplified and sequenced on an Illumina HiSeq 2500, followed by bioinformatics analysis to quantify RAMs according to the Stanford HIV Drug Resistance Database.
Results: In minor viral quasispecies, PI RAMs were significantly more frequent (25%; 14/56) than non-nucleoside reverse transcriptase inhibitor RAMs (9%; 5/56; p = 0.042) and integrase inhibitor RAMs (4%; 2/56; p = 0.002). The drug resistance mutations most often observed in minor viral quasispecies were V82A (n = 13) in protease and K65R (n = 5), K103N (n = 7) and M184V (n = 5) in reverse transcriptase.
Conclusions: Compared with conventional Sanger-based GRT, which uses a ≥20% detection threshold, HTS-GRT improved the identification of PI and reverse transcriptase inhibitor (RTI) RAMs in second-line cART patients from South Africa. Several RTI RAMs, such as K65R, M184V and K103N, and the PI RAM V82A were identified in < 20% of the viral population. Deep sequencing could be of greater value in detecting acquired resistance mutations early.
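The gain of HTS-GRT over Sanger-based testing comes down to a detection threshold. A minimal sketch: a RAM present in ≥20% of the viral population (the Sanger threshold cited in the abstract) would be seen by both methods, while one between an HTS cutoff and 20% would be reported only by HTS. The 1% HTS cutoff, the input format and the function names are assumptions; the abstract does not state the cutoff used in the study.

```python
# Hypothetical sketch of threshold-based RAM classification.
SANGER_THRESHOLD = 0.20  # conventional Sanger-based GRT, as cited in the abstract
HTS_THRESHOLD = 0.01     # assumed minimum variant frequency for HTS calls (not from the study)


def classify_ram(freq, sanger=SANGER_THRESHOLD, hts=HTS_THRESHOLD):
    """Classify a RAM by its frequency (0-1) in the patient's viral population."""
    if freq >= sanger:
        return "majority"      # detectable by both Sanger and HTS
    if freq >= hts:
        return "minor"         # detectable only by HTS
    return "below_cutoff"      # reported by neither


def hts_only_rams(patient_rams):
    """patient_rams: {mutation: frequency}. Return RAMs that only HTS-GRT would report."""
    return sorted(m for m, f in patient_rams.items() if classify_ram(f) == "minor")
```

For example, a patient with V82A at 8% and M184V at 65% would have V82A flagged as a minor-population RAM that conventional GRT would likely miss.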


2018
Author(s): Johanna B. Holm, Michael S. Humphrys, Courtney K. Robinson, Matthew L. Settles, Sandra Ott, ...

Abstract
Amplification, sequencing and analysis of the 16S rRNA gene enable characterization of microbial community composition. As this tool has become more popular and amplicon-sequencing studies have grown in sample number, greater sample multiplexing is becoming necessary, while high sequence quality and sequencing depth must be maintained. Here, we describe modifications to the Illumina HiSeq 2500 platform that provide greater multiplexing capabilities and 300 bp paired-end reads of higher quality than those produced by the current Illumina MiSeq platform. To improve the feasibility and flexibility of this method, we also describe a 2-Step PCR amplification protocol that allows different amplicon regions to be targeted and improves amplification success from samples with low bacterial bioburden.
Importance
Amplicon sequencing has become a popular and widespread tool for surveying microbial communities. Lower overall costs associated with high-throughput sequencing have made it a widely adopted approach, especially for projects that require sample multiplexing to eliminate batch effects and reduce the time needed to acquire data. The method described here for amplicon sequencing on the Illumina HiSeq 2500 platform provides improved multiplexing capabilities while producing higher-quality sequence data at a lower per-sample cost than the Illumina MiSeq platform, without sacrificing amplicon length. To make this method more flexible across amplicon target regions and to improve amplification from low-biomass samples, we also present and validate a 2-Step PCR library preparation method.
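Multiplexing many samples on one lane relies on assigning each read back to its sample by its index sequences. The abstract does not describe the index design used, so the following is only a generic sketch of dual-index demultiplexing that tolerates up to one mismatch per index. The index sequences, sample names and mismatch tolerance are hypothetical. Because each sample is identified by a pair of indexes, a few i7 and i5 indexes can be combined into many unique pairs (for example, 12 × 8 = 96).

```python
# Hypothetical sketch of dual-index demultiplexing; index design is assumed, not from the study.


def hamming(a, b):
    """Number of mismatched positions between two equal-length index reads."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, sample_indexes, max_mismatch=1):
    """reads: iterable of (i7, i5, sequence); sample_indexes: {sample: (i7, i5)}.
    A read goes to the single sample whose two indexes each match within max_mismatch;
    reads matching no sample, or more than one, are returned as undetermined."""
    assigned = {sample: [] for sample in sample_indexes}
    undetermined = []
    for i7, i5, seq in reads:
        matches = [sample for sample, (a, b) in sample_indexes.items()
                   if hamming(i7, a) <= max_mismatch and hamming(i5, b) <= max_mismatch]
        if len(matches) == 1:
            assigned[matches[0]].append(seq)
        else:
            undetermined.append(seq)
    return assigned, undetermined
```

Allowing a mismatch recovers reads with index sequencing errors, but only if the indexes in a pool differ from one another at enough positions that a single error cannot make one sample's index look like another's.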


2018
Author(s): Devika Ganesamoorthy, Minh Duc Cao, Tania Duarte, Wenhan Chen, Lachlan Coin

Abstract
Background: Tandem repeats make up a significant proportion of the human genome, including coding and regulatory regions. Because of their repetitive and unstable nature, they are highly prone to repeat-number variation and nucleotide mutation, making them a major source of genomic variation between individuals. Despite recent advances in high-throughput sequencing, analysis of tandem repeats in the context of complex diseases is still hindered by technical limitations.
Methods: We report a novel targeted sequencing approach that allows simultaneous analysis of hundreds of repeats. We developed a Bayesian algorithm, GtTR, which combines information from a reference long-read dataset with a short-read counting approach to genotype tandem repeats at population scale. PCR sizing analysis was used for validation.
Results: We used a PacBio long-read sequenced sample to generate a reference tandem repeat genotype dataset with an average absolute deviation of 13% from PCR sizing results. Using this reference dataset, GtTR generated estimates of variable number tandem repeat (VNTR) copy number with accuracy within 95% high posterior density (HPD) intervals of 68% and 83% for capture sequence data and 200X WGS data, respectively, improving to 87% and 94% with the use of a PCR reference. We show that genotype resolution increases as a function of depth, such that the median 95% HPD interval lies within 25%, 14%, 12% and 8% of its midpoint copy number value for 30X WGS, 200X WGS, 395X capture and 800X capture sequence data, respectively. We validated nine targets by PCR sizing analysis, and genotype estimates from the sequencing results correlated well with the PCR results.
Conclusions: The novel genotyping approach described here offers a new, cost-effective way to explore a previously unrecognized class of repeat variation in genome-wide association studies (GWAS) of complex diseases at the population level. Accuracy could be improved further by improving the accuracy of the reference dataset.
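Beyond combining a long-read reference with short-read counts, the abstract does not specify GtTR's model. The idea of a posterior over copy number, and of a 95% HPD interval that narrows as depth increases, can still be illustrated with a deliberately simple stand-in model. Here the number of reads falling in a repeat is Poisson-distributed with mean `reads_per_copy * c` for copy number c, under a uniform prior; a prior derived from a reference dataset could be passed in instead. All names, the Poisson likelihood and the prior handling are assumptions, not GtTR.

```python
# Hypothetical stand-in model for Bayesian copy-number genotyping; not GtTR.
import math


def copy_number_posterior(read_count, reads_per_copy, max_copies=50, prior=None):
    """Posterior P(c | read_count) for c = 1..max_copies under a Poisson read-count model.
    prior: optional {c: weight}; copy numbers with non-positive weight are excluded."""
    log_post = {}
    for c in range(1, max_copies + 1):
        lam = reads_per_copy * c
        log_lik = read_count * math.log(lam) - lam - math.lgamma(read_count + 1)
        weight = 1.0 if prior is None else prior.get(c, 0.0)
        if weight > 0:
            log_post[c] = log_lik + math.log(weight)
    # Normalize in log space for numerical stability.
    top = max(log_post.values())
    unnorm = {c: math.exp(v - top) for c, v in log_post.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def hpd_set(posterior, mass=0.95):
    """Smallest set of copy numbers whose total posterior probability reaches `mass`."""
    chosen, cumulative = [], 0.0
    for c, p in sorted(posterior.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(c)
        cumulative += p
        if cumulative >= mass:
            break
    return sorted(chosen)
```

With 300 reads at 30 reads per copy, the posterior peaks at 10 copies but the 95% HPD set spans several values. With ten times the depth (3000 reads at 300 per copy), it collapses to a single copy number. This mirrors, qualitatively, the depth dependence reported above.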


2017
Author(s): Darrell O. Ricke, Anna Shcherbina, Adam Michaleas, Philip Fremont-Smith

Abstract
High-throughput DNA sequencing technologies enable improved characterization of forensic DNA samples, providing greater insight into the DNA contributor(s). Current DNA forensics techniques rely on allele sizing of short tandem repeats by capillary electrophoresis. High-throughput sequencing (HTS) enables forensic samples to be characterized at large numbers of single nucleotide polymorphism (SNP) loci. The slowest computational component of the DNA forensics analysis pipeline is the characterization of raw sequence data. This paper optimizes the SNP calling module of the DNA analysis pipeline so that runtime scales linearly with the number of HTS sequences (patent pending)[1]. GrigoraSNPs can analyze 100 million reads in less than 5 minutes using 3 threads on a 4.0 GHz Intel i7-6700K laptop CPU.
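The abstract gives no algorithmic detail for GrigoraSNPs beyond linear scaling in the number of reads. One simple way to get linear-time calling for a targeted SNP panel is sketched below, under simplifying assumptions: each read begins with a fixed-length prefix that identifies its locus, the SNP sits at a known offset within the read, and genotypes are called from allele fractions using illustrative cutoffs. Each read costs a single dictionary lookup, so total work grows linearly with the number of reads. The panel format, cutoffs and locus names are hypothetical.

```python
# Hypothetical sketch of linear-time targeted SNP calling; not GrigoraSNPs itself.
from collections import Counter, defaultdict


def call_targeted_snps(reads, panel, prefix_len=12, min_depth=10, hom_frac=0.8, het_frac=0.2):
    """panel: {locus prefix: (locus name, SNP offset within the read)}.
    Returns {locus: genotype}, e.g. 'AG' (heterozygous), 'CC' (homozygous) or 'N' (ambiguous)."""
    allele_counts = defaultdict(Counter)
    for read in reads:  # one O(1) dictionary lookup per read, so linear in read count
        entry = panel.get(read[:prefix_len])
        if entry is None:
            continue  # read does not start with a panel prefix
        locus, offset = entry
        if offset < len(read):
            allele_counts[locus][read[offset]] += 1

    calls = {}
    for locus, counts in allele_counts.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to call
        top = counts.most_common(2)
        allele1, n1 = top[0]
        if n1 / depth >= hom_frac:
            calls[locus] = allele1 * 2
        elif len(top) > 1 and top[1][1] / depth >= het_frac:
            calls[locus] = "".join(sorted(allele1 + top[1][0]))
        else:
            calls[locus] = "N"  # e.g. a possible mixture of contributors
    return calls
```

A production caller would also have to handle sequencing errors in the prefix, reads on the reverse strand and mixtures of contributors, which is where most of the forensic interpretation effort lies.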

