Control of artefactual variation in reported inter-sample relatedness during clinical use of a Mycobacterium tuberculosis sequencing pipeline

10.1101/252460 ◽

2018 ◽

Author(s):

David H Wyllie ◽

Nicholas Sanderson ◽

Richard Myers ◽

Tim Peto ◽

Esther Robinson ◽

...

Keyword(s):

Consensus Sequence ◽

Read Depth ◽

Pairwise Distance ◽

Contact Tracing ◽

Clinical Samples ◽

Bacterial Dna ◽

Consensus Sequences ◽

Minor Variant ◽

Validation Set ◽

Genomic Regions

ABSTRACT Contact tracing requires reliable identification of closely related bacterial isolates. When we noticed the reporting of artefactual variation between M. tuberculosis isolates during routine next generation sequencing of Mycobacterium spp., we investigated its basis in 2,018 consecutive M. tuberculosis isolates. In the routine process used, clinical samples were decontaminated and inoculated into broth cultures; from positive broth cultures DNA was extracted, sequenced, reads mapped, and consensus sequences determined. We investigated the process of consensus sequence determination, which selects the most common nucleotide at each position. Having determined the high-quality read depth and depth of minor variants across 8,006 M. tuberculosis genomic regions, we quantified the relationship between the minor variant depth and the amount of non-Mycobacterial bacterial DNA, which originates from commensal microbes killed during sample decontamination. In the presence of non-Mycobacterial bacterial DNA, we found significant increases in minor variant frequencies of more than 1.5 fold in 242 regions covering 5.1% of the M. tuberculosis genome. Included within these were four high variation regions strongly influenced by the amount of non-Mycobacterial bacterial DNA. Excluding these four regions from pairwise distance comparisons reduced biologically implausible variation from 5.2% to 0% in an independent validation set derived from 226 individuals. Thus, we have demonstrated an approach identifying critical genomic regions contributing to clinically relevant artefactual variation in bacterial similarity searches. The approach described monitors the outputs of the complex multi-step laboratory and bioinformatics process, allows periodic process adjustments, and will have application to quality control of routine bacterial genomics.
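The consensus step described in this abstract (take the most common nucleotide at each position) and the minor variant depth it refers to can be illustrated with a small sketch. This is not the authors' pipeline: the function name `consensus_and_minor_fraction`, the toy `pileup` input format (one string of high-quality base calls per position), and the use of "N" for uncovered positions are all assumptions made for illustration.

```python
from collections import Counter


def consensus_and_minor_fraction(pileup):
    """Majority-vote consensus plus per-position minor-variant fraction.

    pileup: list of strings; each string holds the high-quality base calls
    observed at one genome position (hypothetical input format).
    """
    consensus = []
    minor_fractions = []
    for bases in pileup:
        counts = Counter(bases)
        if not counts:
            # No reads cover this position: call it "N" (assumed convention).
            consensus.append("N")
            minor_fractions.append(0.0)
            continue
        major_base, major_depth = counts.most_common(1)[0]
        depth = sum(counts.values())
        consensus.append(major_base)
        # Everything that is not the most common base counts as minor variant depth.
        minor_fractions.append((depth - major_depth) / depth)
    return "".join(consensus), minor_fractions


# Toy data: four positions, the last one with no coverage.
pileup = ["AAAA", "CCCT", "GGGGGGTTTT", ""]
consensus, minor = consensus_and_minor_fraction(pileup)
print(consensus, minor)
```

A position such as the third one, where 40% of reads disagree with the consensus call, is the kind of site where contaminating DNA could plausibly flip the majority base; the abstract's analysis works at the level of genomic regions rather than single positions.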


Control of Artifactual Variation in Reported Intersample Relatedness during Clinical Use of a Mycobacterium tuberculosis Sequencing Pipeline

Journal of Clinical Microbiology ◽

10.1128/jcm.00104-18 ◽

2018 ◽

Vol 56 (8) ◽

Cited By ~ 7

Author(s):

David H. Wyllie ◽

Nicholas Sanderson ◽

Richard Myers ◽

Tim Peto ◽

Esther Robinson ◽

...

Keyword(s):

Mycobacterium Tuberculosis ◽

Consensus Sequence ◽

Read Depth ◽

Pairwise Distance ◽

Contact Tracing ◽

Clinical Samples ◽

Bacterial Dna ◽

Content Type ◽

Minor Variant ◽

Genomic Regions

ABSTRACT Contact tracing requires reliable identification of closely related bacterial isolates. When we noticed the reporting of artifactual variation between Mycobacterium tuberculosis isolates during routine next-generation sequencing of Mycobacterium spp., we investigated its basis in 2,018 consecutive M. tuberculosis isolates. In the routine process used, clinical samples were decontaminated and inoculated into broth cultures; from positive broth cultures DNA was extracted and sequenced, reads were mapped, and consensus sequences were determined. We investigated the process of consensus sequence determination, which selects the most common nucleotide at each position. Having determined the high-quality read depth and depth of minor variants across 8,006 M. tuberculosis genomic regions, we quantified the relationship between the minor variant depth and the amount of nonmycobacterial bacterial DNA, which originates from commensal microbes killed during sample decontamination. In the presence of nonmycobacterial bacterial DNA, we found significant increases in minor variant frequencies, of more than 1.5-fold, in 242 regions covering 5.1% of the M. tuberculosis genome. Included within these were four high-variation regions strongly influenced by the amount of nonmycobacterial bacterial DNA. Excluding these four regions from pairwise distance comparisons reduced biologically implausible variation from 5.2% to 0% in an independent validation set derived from 226 individuals. Thus, we demonstrated an approach identifying critical genomic regions contributing to clinically relevant artifactual variation in bacterial similarity searches. The approach described monitors the outputs of the complex multistep laboratory and bioinformatics process, allows periodic process adjustments, and will have application to quality control of routine bacterial genomics.
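Excluding problem regions from pairwise distance comparisons, as the abstract describes, amounts to masking positions before counting differences. The following is a minimal sketch under stated assumptions: sequences are equal-length consensus strings, masked regions are given as 0-based half-open `(start, end)` intervals, and positions called "N" in either sequence are ignored. The function name and region coordinates are hypothetical and are not taken from the paper.

```python
def masked_distance(seq_a, seq_b, masked_regions=()):
    """Count differing positions between two aligned consensus sequences,
    skipping masked regions and positions called 'N' in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    masked = set()
    for start, end in masked_regions:  # 0-based, half-open (assumed convention)
        masked.update(range(start, end))
    return sum(
        1
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if i not in masked and a != b and a != "N" and b != "N"
    )


a = "ACGTACGTAC"
b = "ACGAACCTAN"
unmasked = masked_distance(a, b)                # differences at indices 3 and 6
masked = masked_distance(a, b, [(5, 8)])        # index 6 falls inside the mask
print(unmasked, masked)
```

In this toy case the mask removes one of two differences, which mirrors (at a much smaller scale) how excluding artefact-prone regions can bring an implausibly large distance between related isolates back down.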


HAPHPIPE: Haplotype Reconstruction and Phylodynamics for Deep Sequencing of Intra-Host Viral Populations

Molecular Biology and Evolution ◽

10.1093/molbev/msaa315 ◽

2020 ◽

Author(s):

Matthew L Bendall ◽

Keylie M Gibson ◽

Margaret C Steiner ◽

Uzma Rentia ◽

Marcos Pérez-Losada ◽

...

Keyword(s):

Deep Sequencing ◽

De Novo ◽

Consensus Sequence ◽

Haplotype Reconstruction ◽

Consensus Sequences ◽

Genome Wide ◽

Genomic Regions ◽

Next Generation Sequencing Ngs ◽

Ngs Data ◽

Generation Sequencing

Abstract Deep sequencing of viral populations using next generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about the intra-host viral molecular epidemiology. Furthermore, existing analytical pipelines may only analyze genomic regions involved in drug resistance, thus are not suited for full viral genome analysis. Here we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated from NGS platforms and provide quality output properly formatted for downstream evolutionary analyses.


Unique k-mers as Strain-Specific Barcodes for Phylogenetic Analysis and Natural Microbiome Profiling

International Journal of Molecular Sciences ◽

10.3390/ijms21030944 ◽

2020 ◽

Vol 21 (3) ◽

pp. 944 ◽

Cited By ~ 1

Author(s):

Valery V. Panyukov ◽

Sergey S. Kiselev ◽

Olga N. Ozoline

Keyword(s):

Distance Matrix ◽

Pairwise Distance ◽

Clinical Samples ◽

E Coli ◽

Taxonomic Profiling ◽

New Methods ◽

Natural Microflora ◽

Microbiome Profiling ◽

Pairwise Distance Matrix

The need for a comparative analysis of natural metagenomes stimulated the development of new methods for their taxonomic profiling. Alignment-free approaches based on the search for marker k-mers turned out to be capable of identifying not only species, but also strains of microorganisms with known genomes. Here, we evaluated the ability of genus-specific k-mers to distinguish eight phylogroups of Escherichia coli (A, B1, C, E, D, F, G, B2) and assessed the presence of their unique 22-mers in clinical samples from microbiomes of four healthy people and four patients with Crohn’s disease. We found that a phylogenetic tree inferred from the pairwise distance matrix for unique 18-mers and 22-mers of 124 genomes was fully consistent with the topology of the tree, obtained with concatenated aligned sequences of orthologous genes. Therefore, we propose strain-specific “barcodes” for rapid phylotyping. Using unique 22-mers for taxonomic analysis, we detected microbes of all groups in human microbiomes; however, their presence in the five samples was significantly different. Pointing to the intraspecies heterogeneity of E. coli in the natural microflora, this also indicates the feasibility of further studies of the role of this heterogeneity in maintaining population homeostasis.
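The idea of unique k-mers as strain "barcodes" can be sketched in a few lines: collect every k-mer of each genome, keep those found in no other genome, and then count how many of each strain's barcodes appear in a read. This is a simplified illustration, not the authors' method: it uses toy sequences with k = 3 instead of 18-mers or 22-mers, ignores reverse complements, and the names `unique_kmers` and `barcode_hits` are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(genomes, k):
    """Map each genome name to the k-mers that occur in no other genome."""
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    unique = {}
    for name, own in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        unique[name] = own - others
    return unique


def barcode_hits(read, barcodes, k):
    """Count, per strain, how many of its barcode k-mers occur in the read."""
    read_kmers = kmers(read, k)
    return {name: len(read_kmers & marks) for name, marks in barcodes.items()}


# Toy genomes (hypothetical); real use would involve full genomes and k around 22.
genomes = {"strainA": "ACGTAC", "strainB": "ACGTTT"}
barcodes = unique_kmers(genomes, k=3)
hits = barcode_hits("GGTTTA", barcodes, k=3)
print(barcodes, hits)
```

Shared k-mers such as ACG and CGT drop out because both toy genomes contain them; only the strain-specific remainder is used to assign the read.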


Shear stress induces hepatocyte PAI-1 gene expression through cooperative Sp1/Ets-1 activation of transcription

AJP Gastrointestinal and Liver Physiology ◽

10.1152/ajpgi.00467.2005 ◽

2006 ◽

Vol 291 (1) ◽

pp. G26-G34 ◽

Cited By ~ 30

Author(s):

Hideki Nakatsuka ◽

Takaaki Sokabe ◽

Kimiko Yamamoto ◽

Yoshinobu Sato ◽

Katsuyoshi Hatakeyama ◽

...

Keyword(s):

Gene Expression ◽

Shear Stress ◽

Consensus Sequence ◽

Early Gene ◽

Immediate Early ◽

Mrna Levels ◽

Consensus Sequences ◽

Static Conditions ◽

Stress Dependent ◽

Pai 1

Partial hepatectomy causes hemodynamic changes that increase portal blood flow in the remaining lobe, where the expression of immediate-early genes, including plasminogen activator inhibitor-1 (PAI-1), is induced. We hypothesized that a hyperdynamic circulatory state occurring in the remaining lobe induces immediate-early gene expression. In this study, we investigated whether the mechanical force generated by flowing blood, shear stress, induces PAI-1 expression in hepatocytes. When cultured rat hepatocytes were exposed to flow, PAI-1 mRNA levels began to increase within 3 h, peaked at levels significantly higher than the static control levels, and then gradually decreased. The flow-induced PAI-1 expression was shear stress dependent rather than shear rate dependent and accompanied by increased hepatocyte production of PAI-1 protein. Shear stress increased PAI-1 transcription but did not affect PAI-1 mRNA stability. Functional analysis of the 2.1-kb PAI-1 5′-promoter indicated that a 278-bp segment containing transcription factor Sp1 and Ets-1 consensus sequences was critical to the shear stress-dependent increase of PAI-1 transcription. Mutations of both the Sp1 and Ets-1 consensus sequences, but not of either one alone, markedly prevented basal PAI-1 transcription and abolished the response of the PAI-1 promoter to shear stress. EMSA and chromatin immunoprecipitation assays showed binding of Sp1 and Ets-1 to each consensus sequence under static conditions, which increased in response to shear stress. In conclusion, hepatocyte PAI-1 expression is flow sensitive and transcriptionally regulated by shear stress via cooperative interactions between Sp1 and Ets-1.


The integration preference of Sleeping Beauty at non-TA site is related to the transposon end sequences

10.21203/rs.2.19101/v1 ◽

2019 ◽

Author(s):

Yiting Zhou ◽

Guangwei Ma ◽

Jiawen Yang ◽

Yabin Guo

Keyword(s):

Site Selection ◽

Genomic Dna ◽

Consensus Sequence ◽

Mouse Cell ◽

Sleeping Beauty ◽

Consensus Sequences ◽

Target Site Selection ◽

Target Sites ◽

End Sequences ◽

Selection Of

Abstract Background: The Sleeping Beauty (SB) transposon had been thought to integrate strictly into TA dinucleotides. Recently, we found that SB also integrates into non-TA sites at a lower frequency. Here we studied the non-TA integration of SB further. Results: 1) SB can integrate into non-TA sites in HEK293T cells as well as in mouse cell lines. 2) Both the hyperactive transposase SB100X and the traditional SB11 catalyze integrations at non-TA sites. 3) The consensus sequence of the non-TA target sites occurs only at the opposite side of the sequenced junction between the transposon end and the genomic sequences, indicating that the integrations at non-TA sites are mainly aberrant integrations. 4) The consensus sequence of the non-TA target sites corresponds to the transposon end sequence. When the transposon end sequence is mutated, the consensus sequences change too. Conclusion: The interaction between the SB transposon end and genomic DNA may be involved in the target site selection of the SB integrations at non-TA sites.


Rapid antibiotic-resistance predictions from genome sequence data for Staphylococcus aureus and Mycobacterium tuberculosis

Nature Communications ◽

10.1038/ncomms10063 ◽

2015 ◽

Vol 6 (1) ◽

Cited By ~ 281

Author(s):

Phelim Bradley ◽

N. Claire Gordon ◽

Timothy M. Walker ◽

Laura Dunn ◽

Simon Heys ◽

...

Keyword(s):

Staphylococcus Aureus ◽

Mycobacterium Tuberculosis ◽

Sequence Data ◽

Error Rates ◽

Graph Representation ◽

Clinical Samples ◽

Resistant Bacteria ◽

Independent Validation ◽

Validation Set ◽

Sensitivity Specificity

Abstract The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package (‘Mykrobe predictor’) that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes.
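For readers unfamiliar with the de Bruijn graph representation mentioned in this abstract, the sketch below shows the basic construction: every k-mer in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix, so sequence variants appear as branches. This is a generic textbook construction with toy reads and k = 3, not the Mykrobe predictor implementation; the function name `de_bruijn` is hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two toy reads that agree up to "ACG" and then differ (T versus A).
graph = de_bruijn(["ACGTC", "ACGAC"], k=3)
print(graph)
```

The node "CG" has two outgoing edges, one per allele, which is how a graph of this kind can hold several known variants (for example resistance-associated and wild-type alleles) in one structure.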


N6-methyladenosine residues in an intron-specific region of prolactin pre-mRNA

Molecular and Cellular Biology ◽

10.1128/mcb.10.9.4456-4465.1990 ◽

1990 ◽

Vol 10 (9) ◽

pp. 4456-4465

Author(s):

S M Carroll ◽

P Narayan ◽

F M Rottman

Keyword(s):

Consensus Sequence ◽

Specific Sequence ◽

Free System ◽

Consensus Sequences ◽

Neplanocin A ◽

A Cell ◽

Steady State Levels ◽

Precursor Rna

N6-methyladenosine (m6A) residues occur at internal positions in most cellular and viral RNAs; both heterogeneous nuclear RNA and mRNA are involved. This modification arises by enzymatic transfer of a methyl group from S-adenosylmethionine to the central adenosine residue in the canonical sequence G/AAC. Thus far, m6A has been mapped to specific locations in eucaryotic mRNA and viral genomic RNA. We have now examined an intron-specific sequence of a modified bovine prolactin precursor RNA for the presence of this methylated nucleotide by using both transfected-cell systems and a cell-free system capable of methylating mRNA transcripts in vitro. The results indicate the final intron-specific sequence (intron D) of a prolactin RNA molecule does indeed possess m6A residues. When mapped to specific T1 oligonucleotides, the predominant site of methylation was found to be within the consensus sequence AGm6ACU. The level of m6A at this site is nonstoichiometric; approximately 24% of the molecules are modified in vivo. Methylation was detected at markedly reduced levels at other consensus sites within the intron but not in T1 oligonucleotides which do not contain either AAC or GAC consensus sequences. In an attempt to correlate mRNA methylation with processing, stably transfected CHO cells expressing augmented levels of bovine prolactin were treated with neplanocin A, an inhibitor of methylation. Under these conditions, the relative steady-state levels of the intron-containing nuclear precursor increased four to six times that found in control cells.
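The canonical sequence G/AAC (that is, GAC or AAC, with the central adenosine as the potential methylation target) is easy to scan for. The sketch below lists candidate positions only; it says nothing about whether a site is actually methylated, which the abstract reports to be nonstoichiometric. The function name and the 0-based indexing are assumptions for illustration.

```python
import re

# GAC or AAC; the middle A is the residue that may carry the methyl group.
M6A_MOTIF = re.compile(r"[GA]AC")


def m6a_candidate_sites(rna):
    """Return 0-based indices of the central A in each GAC/AAC motif."""
    return [m.start() + 1 for m in M6A_MOTIF.finditer(rna.upper())]


# Toy RNA containing AGACU (the predominant site context named in the abstract)
# and one AAC motif.
sites = m6a_candidate_sites("UUAGACUGGAACA")
print(sites)
```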


Improving PSI-BLAST’s Fold Recognition Performance through Combining Consensus Sequences and Support Vector Machine

Interdisciplinary Research and Applications in Bioinformatics, Computational Biology, and Environmental Sciences - Advances in Bioinformatics and Biomedical Engineering ◽

10.4018/978-1-60960-064-8.ch005 ◽

2011 ◽

pp. 51-59

Author(s):

Ren-Xiang Yan ◽

Jing Liu ◽

Yi-Min Tao

Keyword(s):

Support Vector Machine ◽

Sequence Alignment ◽

Recognition Performance ◽

Consensus Sequence ◽

Early Time ◽

Fold Recognition ◽

Support Vector ◽

Sequence Information ◽

Consensus Sequences ◽

Profile Alignment

Profile-profile alignment may be the most sensitive and useful computational resource for identifying remote homologies and recognizing protein folds. However, profile-profile alignment is usually much more complex and slower than sequence-sequence or profile-sequence alignment. The profile, or PSSM (position-specific scoring matrix), represents the mutational variability at each sequence position of a protein as a vector of amino acid substitution frequencies, and is a much richer encoding of a protein sequence. The consensus sequence, which can be considered a simplified profile, was used in early work to improve sequence alignment accuracy. Recently, several studies were carried out to improve PSI-BLAST’s fold recognition performance by using consensus sequence information. There are several ways to compute a consensus sequence. Based on these considerations, we propose in this chapter a method that combines the information of different types of consensus sequences with the assistance of support vector machine learning. Benchmark results suggest that our method can further improve PSI-BLAST’s fold recognition performance.


PCR Analysis of Nasal Polyps, Chronic Sinusitis, and Hypertrophied Turbinates for DNA Encoding Bacterial 16S rRNA

American Journal of Rhinology ◽

10.1177/194589240201600309 ◽

2002 ◽

Vol 16 (3) ◽

pp. 169-173 ◽

Cited By ~ 21

Author(s):

Gerald A Bucholtz ◽

Sherry A. Salzman ◽

Fernando B. Bersalona ◽

Timothy R. Boyle ◽

Victor S. Ejercito ◽

...

Keyword(s):

16S Rrna ◽

16S Rrna Gene ◽

Nasal Polyps ◽

Chronic Sinusitis ◽

Consensus Sequence ◽

Rrna Gene ◽

Dna Encoding ◽

Bacterial Dna ◽

Two Samples ◽

The 16S Rrna Gene

Background Nasal polyps are considered to result from chronic inflammation, but the initial or persisting stimulus for the inflammation is not known. A variety of bacteria and fungi have been cultured from nasal polyps, but ∼35% have sterile cultures. Previously, Mycoplasma pneumoniae–specific DNA was detected in human nasal polyps using polymerase chain reaction (PCR) techniques, suggesting M. pneumoniae as a causative agent in the etiology of nasal polyps. Methods In this study, we tested for the presence of bacterial DNA in nasal polyps resected from 40 patients, in nasal mucosa membrane from 9 patients undergoing turbinectomy for hypertrophy, and in sinus mucosa membrane from 6 patients undergoing endoscopic surgery for chronic sinusitis. Tissue DNA was extracted and analyzed by PCR using M. pneumoniae specific primers for DNA that encode the 16S rRNA gene in 41 specimens (31 polyps, 6 turbinates, and 4 sinus), and by consensus sequence-based PCR using broad range primers for most eubacterial DNA encoding the 16S rRNA gene in 38 specimens (26 polyps, 7 turbinates, and 5 sinuses). Results Only two samples were positive for bacterial DNA encoding 16S rRNA: Streptococcus sp. DNA was isolated from one polyp specimen and Pseudomonas aeruginosa DNA was isolated in one maxillary sinusitis specimen. No evidence of M. pneumoniae–specific DNA encoding 16S rRNA was found in any of the tissues. Conclusions This study suggests that chronic bacterial infection is not a major component of nasal polyp etiology.


The Rich World of p53 DNA Binding Targets: The Role of DNA Structure

International Journal of Molecular Sciences ◽

10.3390/ijms20225605 ◽

2019 ◽

Vol 20 (22) ◽

pp. 5605 ◽

Cited By ~ 10

Author(s):

Václav Brázda ◽

Miroslav Fojta

Keyword(s):

Dna Binding ◽

Target Genes ◽

Consensus Sequence ◽

Holliday Junctions ◽

Quadruplex Dna ◽

Consensus Sequences ◽

P53 Response ◽

Functional Consequences ◽

The Rich ◽

Regulatory Functions

The tumor suppressor functions of p53 and its roles in regulating the cell cycle, apoptosis, senescence, and metabolism are accomplished mainly by its interactions with DNA. p53 works as a transcription factor for a significant number of genes. Most p53 target genes contain so-called p53 response elements in their promoters, consisting of 20 bp long canonical consensus sequences. Compared to other transcription factors, which usually bind to one concrete and clearly defined DNA target, the p53 consensus sequence is not strict, but contains two repeats of a 5′RRRCWWGYYY3′ sequence; therefore it varies remarkably among target genes. Moreover, p53 binds also to DNA fragments that at least partially and often completely lack this consensus sequence. p53 also binds with high affinity to a variety of non-B DNA structures including Holliday junctions, cruciform structures, quadruplex DNA, triplex DNA, DNA loops, bulged DNA, and hemicatenane DNA. In this review, we summarize information of the interactions of p53 with various DNA targets and discuss the functional consequences of the rich world of p53 DNA binding targets for its complex regulatory functions.
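The canonical 20 bp response element described above, two repeats of 5′RRRCWWGYYY3′, can be written as a regular expression by expanding the IUPAC codes (R = A or G, W = A or T, Y = C or T). The sketch below matches only the strict form with no spacer between the half-sites, which is an assumption; as the abstract stresses, real p53 targets often deviate from this consensus, so a strict scan will miss many genuine sites. The function name and test sequences are hypothetical.

```python
import re

# IUPAC ambiguity codes used by the consensus.
IUPAC = {"R": "[AG]", "W": "[AT]", "Y": "[CT]", "C": "C", "G": "G"}

HALF_SITE = "RRRCWWGYYY"
half_regex = "".join(IUPAC[base] for base in HALF_SITE)
# Lookahead so that overlapping matches are also reported.
FULL_SITE = re.compile(f"(?=({half_regex}{half_regex}))")


def find_p53_sites(dna):
    """Return 0-based start indices of strict 20 bp consensus matches."""
    return [m.start() for m in FULL_SITE.finditer(dna.upper())]


# Toy sequence: two flanking bases, two perfect half-sites, two flanking bases.
perfect = find_p53_sites("TTAGACATGCCTGGGCAAGTTCAA")
# Same sequence with the first base of the second half-site changed to C (not R).
imperfect = find_p53_sites("TTAGACATGCCTCGGCAAGTTCAA")
print(perfect, imperfect)
```

A single mismatch is enough to lose the strict match, which is one reason position weight matrices or mismatch-tolerant scans are commonly preferred over exact consensus matching.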
