MuClone: Somatic mutation detection and classification through probabilistic integration of clonal population structure

2017
Author(s): Fatemeh Dorri, Sean Jewell, Alexandre Bouchard-Côté, Sohrab P. Shah

Abstract: Accurate detection and classification of somatic single nucleotide variants (SNVs) are important in defining the clonal composition of human cancers. Existing tools are prone to miss low-prevalence mutations, and methods for classifying mutations into clonal groups across the whole genome are underdeveloped. Increasing interest in deciphering clonal population dynamics over multiple samples in time or anatomic space from the same patient is resulting in whole genome sequence (WGS) data from phylogenetically related samples. With access to these data, we posited that injecting clonal structure information into the inference of mutations from multiple samples would improve mutation detection. We developed MuClone: a novel statistical framework for simultaneous detection and classification of mutations across multiple tumour samples of a patient from whole genome or exome sequencing data. The key advance lies in incorporating prior knowledge about the cellular prevalences of clones to improve the performance of detecting mutations, particularly low-prevalence mutations. We evaluated MuClone on synthetic data and on real data from spatially sampled ovarian cancers. Results support the hypothesis that clonal information improves sensitivity in detecting somatic mutations without compromising specificity. In addition, MuClone classifies mutations across whole genomes of multiple samples into biologically meaningful groups, providing additional phylogenetic insights and enhancing the study of WGS-derived clonal dynamics.
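As a rough illustration of the idea in this abstract, the following Python sketch shows how prior knowledge of clone cellular prevalences could sharpen a simple binomial variant call. It is not MuClone's model, which the abstract does not specify. The sequencing error rate, the assumption of a heterozygous mutation in a diploid region with pure tumour content, the 0.5 prior probability of mutation, the read counts, and all function names are assumptions made for the example.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def posterior_mutation(var_reads, depth, clone_prevalences,
                       error_rate=0.01, prior_mutation=0.5):
    """Posterior probability that a site is mutated (toy model, see assumptions).

    clone_prevalences: assumed cellular prevalences of known clones, each
    given equal prior weight as the clone carrying the mutation.
    """
    like_mut = 0.0
    for phi in clone_prevalences:
        # Expected variant allele fraction: half the mutated cells' alleles
        # carry the variant, and reads are flipped at the error rate.
        vaf = (phi / 2) * (1 - error_rate) + (1 - phi / 2) * error_rate
        like_mut += binom_pmf(var_reads, depth, vaf) / len(clone_prevalences)
    # Without a mutation, variant reads arise only from sequencing error.
    like_ref = binom_pmf(var_reads, depth, error_rate)
    numerator = prior_mutation * like_mut
    return numerator / (numerator + (1 - prior_mutation) * like_ref)

# Hypothetical site: 3 variant reads out of 60.
informed = posterior_mutation(3, 60, clone_prevalences=[0.1, 0.6])
mismatched = posterior_mutation(3, 60, clone_prevalences=[1.0])
print(f"prior includes a low-prevalence clone: {informed:.3f}")
print(f"prior assumes a fully clonal mutation: {mismatched:.3g}")
```

In this toy setting, the same three variant reads are strong evidence when the prior includes a clone at 10% prevalence and negligible evidence when only fully clonal mutations are expected, which is the intuition behind using clonal structure to recover low-prevalence mutations.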

2019, Vol 9 (1)
Author(s): Matthew J. Meier, Marc A. Beal, Andrew Schoenrock, Carole L. Yauk, Francesco Marchetti

Abstract: The MutaMouse transgenic rodent model is widely used for assessing in vivo mutagenicity. Here, we report the characterization of MutaMouse’s whole genome sequence and its genetic variants compared to the C57BL/6 reference genome. High coverage (>50X) next-generation sequencing (NGS) of whole genomes from multiple MutaMouse animals from the Health Canada (HC) colony showed ~5 million SNVs per genome, ~20% of which are putatively novel. Sequencing of two animals from a geographically separated colony at Covance indicated that, over the course of 23 years, the colonies accumulated 47,847 (HC) and 17,677 (Covance) non-parental homozygous single nucleotide variants. We found no novel nonsense or missense mutations that impair the MutaMouse response to genotoxic agents. Pairing sequencing data with array comparative genomic hybridization (aCGH) improved the accuracy and resolution of copy number variant (CNV) calls and identified 300 genomic regions with CNVs. We also used long-read sequencing technology (PacBio) to show that the transgene integration site involved a large deletion event with multiple inversions and rearrangements near a retrotransposon. The MutaMouse genome gives important genetic context to studies using this model, offers insight into the mechanisms of structural variant formation, and contributes a framework to analyze aCGH results alongside NGS data.


Author(s): Viola Kurm, Ilse Houwers, Claudia E. Coipan, Peter Bonants, Cees Waalwijk, ...

Abstract: Identification and classification of members of the Ralstonia solanacearum species complex (RSSC) is challenging due to the heterogeneity of this complex. Whole genome sequence data of 225 strains were used to classify strains based on average nucleotide identity (ANI) and multilocus sequence analysis (MLSA). Based on the ANI score (>95%), 191 out of 192 (99.5%) RSSC strains could be grouped into the three species R. solanacearum, R. pseudosolanacearum, and R. syzygii, and into the four phylotypes within the RSSC (I, II, III, and IV). R. solanacearum phylotype II could be split into two groups (IIA and IIB), of which IIB clustered into three subgroups (IIBa, IIBb and IIBc). This division by ANI was in accordance with MLSA. The IIB subgroups found by ANI and MLSA also differed in the number of SNPs in the primer and probe sites of various assays. An in-silico analysis of eight TaqMan and 11 conventional PCR assays was performed using the whole genome sequences. Based on this analysis, several cases of potential false positives or false negatives can be expected when these assays are used for their intended target organisms. Two TaqMan assays and two PCR assays targeting the 16S rDNA sequence should be able to detect all phylotypes of the RSSC. We conclude that the increasing availability of whole genome sequences is not only useful for classification of strains, but also shows potential for selection and evaluation of clade-specific nucleic acid-based amplification methods within the RSSC.
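The ANI-based grouping described above can be sketched as a simple threshold rule. This is a minimal Python illustration that assumes pairwise ANI values against one reference genome per species have already been computed. The function name and the ANI values are hypothetical, and only the >95% cut-off comes from the abstract.

```python
def assign_species(ani_to_references, threshold=95.0):
    """Return the reference species with the highest ANI if it exceeds the
    threshold (in percent), otherwise None."""
    best_species, best_ani = max(ani_to_references.items(), key=lambda kv: kv[1])
    return best_species if best_ani > threshold else None

# Hypothetical ANI values (percent) for two query strains.
strain_a = {"R. solanacearum": 98.7, "R. pseudosolanacearum": 92.1, "R. syzygii": 91.4}
strain_b = {"R. solanacearum": 93.0, "R. pseudosolanacearum": 94.2, "R. syzygii": 92.8}

print(assign_species(strain_a))  # R. solanacearum
print(assign_species(strain_b))  # None (no reference exceeds 95%)
```

A strain such as the hypothetical `strain_b`, which exceeds the threshold against no reference, would be left unassigned, analogous to the one RSSC strain in the abstract that could not be grouped.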


2019, Vol 96 (2), pp. 106-109
Author(s): Jayshree Dave, John Paul, Thomas Joshua Pasvol, Andy Williams, Fiona Warburton, ...

Objective: We aimed to characterise gonorrhoea transmission patterns in a diverse urban population by linking genomic, epidemiological and antimicrobial susceptibility data. Methods: Neisseria gonorrhoeae isolates from patients attending sexual health clinics at Barts Health NHS Trust, London, UK, during an 11-month period underwent whole-genome sequencing and antimicrobial susceptibility testing. We combined laboratory and patient data to investigate the transmission network structure. Results: One hundred and fifty-eight isolates from 158 patients were available with associated descriptive data. One hundred and twenty-nine (82%) patients identified as male and 25 (16%) as female; four (3%) records lacked gender information. Self-described ethnicities were: 51 (32%) English/Welsh/Scottish; 33 (21%) white, other; 23 (15%) black British/black African/black, other; 12 (8%) Caribbean; 9 (6%) South Asian; 6 (4%) mixed ethnicity; and 10 (6%) other; data were missing for 14 (9%). Self-reported sexual orientations were 82 (52%) men who have sex with men (MSM); 49 (31%) heterosexual; 2 (1%) bisexual; data were missing for 25 individuals. Twenty-two (14%) patients were HIV positive. Whole-genome sequence data were generated for 151 isolates, which linked 75 (50%) patients to at least one other case. Using sequencing data, we found no evidence of transmission networks related to specific ethnic groups (p=0.64) or of HIV serosorting (p=0.35). Of 82 MSM/bisexual patients with sequencing data, 45 (55%) belonged to clusters of ≥2 cases, compared with 16/44 (36%) heterosexuals with sequencing data (p=0.06). Conclusion: We demonstrate links between 50% of patients in transmission networks using a relatively small sample in a large cosmopolitan city. We found no evidence of HIV serosorting. Our results do not support assortative selectivity as an explanation for differences in gonorrhoea incidence between ethnic groups.
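The comparison of cluster membership between groups (45/82 versus 16/44) is a 2×2 contingency problem. The abstract does not say which test produced the reported p-value, so the sketch below uses a two-sided Fisher exact test as one plausible choice; the choice of test and the function name are assumptions, and only the four counts come from the abstract.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    p_value = 0.0
    # Sum over every table that is no more probable than the observed one.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= observed * (1 + 1e-9):  # small tolerance for float ties
            p_value += p
    return p_value

# Rows: MSM/bisexual, heterosexual. Columns: in a cluster, not in a cluster.
p_value = fisher_exact_two_sided(45, 82 - 45, 16, 44 - 16)
print(f"clustered: {45 / 82:.0%} vs {16 / 44:.0%}, Fisher exact p = {p_value:.3f}")
```

A chi-square test with or without continuity correction would be an equally reasonable choice here and would give a somewhat different p-value.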


PeerJ, 2018, Vol 6, pp. e5895
Author(s): Thomas Andreas Kohl, Christian Utpatel, Viola Schleusener, Maria Rosaria De Filippo, Patrick Beckert, ...

Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance at the highest resolution, up to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference-mapping-based workflow, MTBseq reports detected variant positions annotated with known associations to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable, expandable and can be used on a desktop computer or laptop without any internet connection, ensuring mobile usage and data security. MTBseq and accompanying documentation are available from https://github.com/ngs-fzb/MTBseq_source.


Heredity, 2020, Vol 124 (5), pp. 658-674
Author(s): Mahmoud Amiri Roudbar, Mohammad Reza Mohammadabadi, Ahmad Ayatollahi Mehrgardi, Rostam Abdollahi-Arpanahi, Mehdi Momen, ...

2015, Vol 3 (6)
Author(s): Phuong N. Tran, Nicholas E. H. Tan, Yin Peng Lee, Han Ming Gan, Steven J. Polter, ...

Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members from other genera were found in the interior vine tissue of poison ivy.


2013, Vol 63 (Pt_7), pp. 2742-2751
Author(s): Henryk Urbanczyk, Yoshitoshi Ogura, Tetsuya Hayashi

Use of inadequate methods for classification of bacteria in the so-called Harveyi clade (family Vibrionaceae, Gammaproteobacteria) has led to incorrect assignment of strains and proliferation of synonymous species. In order to resolve taxonomic ambiguities within the Harveyi clade and to test usefulness of whole genome sequence data for classification of Vibrionaceae, draft genome sequences of 12 strains were determined and analysed. The sequencing included type strains of seven species: Vibrio sagamiensis NBRC 104589T, Vibrio azureus NBRC 104587T, Vibrio harveyi NBRC 15634T, Vibrio rotiferianus LMG 21460T, Vibrio campbellii NBRC 15631T, Vibrio jasicida LMG 25398T, and Vibrio owensii LMG 25443T. Draft genome sequences of strain LMG 25430, previously designated the type strain of [Vibrio communis], and two strains (MWB 21 and 090810c) from the ‘beijerinckii’ lineage were also determined. Whole genomes of two additional strains (ATCC 25919 and 200612B) that previously could not be assigned to any Harveyi clade species were also sequenced. Analysis of the genome sequence data revealed a clear case of synonymy between V. owensii and [V. communis], confirming an earlier proposal to synonymize both species. Both strains from the ‘beijerinckii’ lineage were classified as V. jasicida, while the strains ATCC 25919 and 200612B were classified as V. owensii and V. campbellii, respectively. We also found that two strains, AND4 and Ex25, are closely related to Harveyi clade bacteria, but could not be assigned to any species of the family Vibrionaceae. The use of whole genome sequence data for the taxonomic classification of the Harveyi clade bacteria and other members of the family Vibrionaceae is also discussed.


2021
Author(s): Jiru Han, Jacob E Munro, Anthony Kocoski, Alyssa E Barry, Melanie Bahlo

Short tandem repeats (STRs) are highly informative genetic markers that have been used extensively in population genetics analysis. They are an important source of genetic diversity and can also have functional impact. Despite the availability of bioinformatic methods that permit large-scale genome-wide genotyping of STRs from whole genome sequencing data, they have not previously been applied to sequencing data from large collections of malaria parasite field samples. Here, we have genotyped STRs using HipSTR in published whole-genome sequence data from more than 3,000 Plasmodium falciparum and 174 Plasmodium vivax samples collected across the globe. High levels of noise and variability in the resultant callset necessitated the development of a novel method for quality control of STR genotype calls. A set of high-quality STR loci (6,768 from P. falciparum and 3,496 from P. vivax) was used to study Plasmodium genetic diversity, population structures and genomic signatures of selection, and these were compared to genome-wide single nucleotide polymorphism (SNP) genotyping data. In addition, the genome-wide information about genetic variation and other characteristics of STRs in P. falciparum and P. vivax has been made available in an interactive web-based R Shiny application, PlasmoSTR (https://github.com/bahlolab/PlasmoSTR).


2021
Author(s): Katherine M. D'Amico-Willman, Wilberforce Z. Ouma, Tea Meulia, Gina M. Sideli, Thomas M. Gradziel, ...

Almond (Prunus dulcis [Mill.] D.A. Webb) is an economically important, specialty nut crop grown almost exclusively in the United States. Breeding and improvement efforts worldwide have led to the development of key, productive cultivars, including Nonpareil, which is the most widely grown almond cultivar. Thus far, genomic resources for this species have been limited, and a whole-genome assembly for Nonpareil is not currently available despite its economic importance and use in almond breeding worldwide. We generated a 615.89X coverage genome sequence using Illumina, PacBio, and optical mapping technologies. Gene prediction revealed 27,487 genes using MinION Oxford nanopore and Illumina RNA sequencing, and genome annotation found that 68% of predicted models are associated with at least one biological function. Further, epigenetic signatures of almond, namely DNA cytosine methylation, have been implicated in a variety of phenotypes including self-compatibility, bud dormancy, and development of non-infectious bud failure. In addition to the genome sequence and annotation, this report also provides the complete methylome of several key almond tissues, including leaf, flower, endocarp, mesocarp, fruit skin, and seed coat. Comparisons between methylation profiles in these tissues revealed differences in genome-wide weighted percent methylation and chromosome-level methylation enrichment. The raw sequencing data are available on NCBI Sequence Read Archive, and the complete genome sequence and annotation files are available on NCBI Genbank. All data can be used without restriction.


2021
Author(s): Masako Ichikawa, Norio Kato, Erika Toda, Masakazu Kashihara, Yuji Ishida, ...

Abstract: Somaclonal variation was studied by whole-genome sequencing in rice plants (Oryza sativa L., ‘Nipponbare’) regenerated from the zygotes, mature embryos, and immature embryos of a single mother plant. The mother plant and its seed-propagated progeny were also sequenced. A total of 338 variants relative to the mother plant sequence were detected in the progeny, and mean values ranged from 9.0 in the seed-propagated plants to 37.4 in regenerants from mature embryos. The ratio of single nucleotide variants among the variants was 74.3%, and the natural mutation rate calculated using the variants in the seed-propagated plants was 1.2 × 10⁻⁸. The percentage and the mutation rate were consistent with the values reported previously. Plants regenerated from mature embryos had significantly more variants than the other progeny types. Therefore, using zygotes and immature embryos can reduce somaclonal variation during the genetic manipulation of rice.

