Control of artefactual variation in reported inter-sample relatedness during clinical use of a Mycobacterium tuberculosis sequencing pipeline

2018 ◽  
Author(s):  
David H Wyllie ◽  
Nicholas Sanderson ◽  
Richard Myers ◽  
Tim Peto ◽  
Esther Robinson ◽  
...  

ABSTRACT Contact tracing requires reliable identification of closely related bacterial isolates. When we noticed the reporting of artefactual variation between M. tuberculosis isolates during routine next generation sequencing of Mycobacterium spp, we investigated its basis in 2,018 consecutive M. tuberculosis isolates. In the routine process used, clinical samples were decontaminated and inoculated into broth cultures; from positive broth cultures DNA was extracted, sequenced, reads mapped, and consensus sequences determined. We investigated the process of consensus sequence determination, which selects the most common nucleotide at each position. Having determined the high-quality read depth and depth of minor variants across 8,006 M. tuberculosis genomic regions, we quantified the relationship between the minor variant depth and the amount of non-Mycobacterial bacterial DNA, which originates from commensal microbes killed during sample decontamination. In the presence of non-Mycobacterial bacterial DNA, we found significant increases in minor variant frequencies of more than 1.5 fold in 242 regions covering 5.1% of the M. tuberculosis genome. Included within these were four high variation regions strongly influenced by the amount of non-Mycobacterial bacterial DNA. Excluding these four regions from pairwise distance comparisons reduced biologically implausible variation from 5.2% to 0% in an independent validation set derived from 226 individuals. Thus, we have demonstrated an approach identifying critical genomic regions contributing to clinically relevant artefactual variation in bacterial similarity searches. The approach described monitors the outputs of the complex multi-step laboratory and bioinformatics process, allows periodic process adjustments, and will have application to quality control of routine bacterial genomics.

2018 ◽  
Vol 56 (8) ◽  
Author(s):  
David H. Wyllie ◽  
Nicholas Sanderson ◽  
Richard Myers ◽  
Tim Peto ◽  
Esther Robinson ◽  
...  

ABSTRACT Contact tracing requires reliable identification of closely related bacterial isolates. When we noticed the reporting of artifactual variation between Mycobacterium tuberculosis isolates during routine next-generation sequencing of Mycobacterium spp., we investigated its basis in 2,018 consecutive M. tuberculosis isolates. In the routine process used, clinical samples were decontaminated and inoculated into broth cultures; from positive broth cultures DNA was extracted and sequenced, reads were mapped, and consensus sequences were determined. We investigated the process of consensus sequence determination, which selects the most common nucleotide at each position. Having determined the high-quality read depth and depth of minor variants across 8,006 M. tuberculosis genomic regions, we quantified the relationship between the minor variant depth and the amount of nonmycobacterial bacterial DNA, which originates from commensal microbes killed during sample decontamination. In the presence of nonmycobacterial bacterial DNA, we found significant increases in minor variant frequencies, of more than 1.5-fold, in 242 regions covering 5.1% of the M. tuberculosis genome. Included within these were four high-variation regions strongly influenced by the amount of nonmycobacterial bacterial DNA. Excluding these four regions from pairwise distance comparisons reduced biologically implausible variation from 5.2% to 0% in an independent validation set derived from 226 individuals. Thus, we demonstrated an approach identifying critical genomic regions contributing to clinically relevant artifactual variation in bacterial similarity searches. The approach described monitors the outputs of the complex multistep laboratory and bioinformatics process, allows periodic process adjustments, and will have application to quality control of routine bacterial genomics.
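
The consensus step and the masked distance comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the per-position pileup strings, the function names `consensus_and_minor_fraction` and `masked_distance`, and the half-open `(start, end)` mask intervals are all assumptions made for illustration.

```python
from collections import Counter


def consensus_and_minor_fraction(pileup):
    """Call a consensus base per position and report the minor-variant fraction.

    pileup: list of strings; each string holds the high-quality bases
    observed at one genome position (a hypothetical input format).
    """
    consensus = []
    minor_fractions = []
    for bases in pileup:
        counts = Counter(bases)
        if not counts:
            # No reads at this position: emit an uncalled base.
            consensus.append("N")
            minor_fractions.append(0.0)
            continue
        top_base, top_depth = counts.most_common(1)[0]
        depth = sum(counts.values())
        consensus.append(top_base)
        # Fraction of reads that disagree with the most common base.
        minor_fractions.append((depth - top_depth) / depth)
    return "".join(consensus), minor_fractions


def masked_distance(seq_a, seq_b, masked_regions=()):
    """Count differing positions, skipping masked intervals and uncalled bases."""
    masked = set()
    for start, end in masked_regions:  # half-open intervals, 0-based
        masked.update(range(start, end))
    return sum(
        1
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if i not in masked and a != b and "N" not in (a, b)
    )
```

In this toy form, a position where foreign reads inflate the minor-variant fraction can be added to `masked_regions`, so it no longer contributes to the pairwise distance between two consensus sequences.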


Author(s):  
Matthew L Bendall ◽  
Keylie M Gibson ◽  
Margaret C Steiner ◽  
Uzma Rentia ◽  
Marcos Pérez-Losada ◽  
...  

Abstract Deep sequencing of viral populations using next generation sequencing (NGS) offers opportunities to understand and investigate evolution, transmission dynamics, and population genetics. Currently, the standard practice for processing NGS data to study viral populations is to summarize all the observed sequences from a sample as a single consensus sequence, thus discarding valuable information about the intra-host viral molecular epidemiology. Furthermore, existing analytical pipelines may only analyze genomic regions involved in drug resistance, thus are not suited for full viral genome analysis. Here we present HAPHPIPE, a HAplotype and PHylodynamics PIPEline for genome-wide assembly of viral consensus sequences and haplotypes. The HAPHPIPE protocol includes modules for quality trimming, error correction, de novo assembly, alignment, and haplotype reconstruction. The resulting consensus sequences, haplotypes, and alignments can be further analyzed using a variety of phylogenetic and population genetic software. HAPHPIPE is designed to provide users with a single pipeline to rapidly analyze sequences from viral populations generated from NGS platforms and provide quality output properly formatted for downstream evolutionary analyses.


2020 ◽  
Vol 21 (3) ◽  
pp. 944 ◽  
Author(s):  
Valery V. Panyukov ◽  
Sergey S. Kiselev ◽  
Olga N. Ozoline

The need for a comparative analysis of natural metagenomes stimulated the development of new methods for their taxonomic profiling. Alignment-free approaches based on the search for marker k-mers turned out to be capable of identifying not only species, but also strains of microorganisms with known genomes. Here, we evaluated the ability of genus-specific k-mers to distinguish eight phylogroups of Escherichia coli (A, B1, C, E, D, F, G, B2) and assessed the presence of their unique 22-mers in clinical samples from microbiomes of four healthy people and four patients with Crohn’s disease. We found that a phylogenetic tree inferred from the pairwise distance matrix for unique 18-mers and 22-mers of 124 genomes was fully consistent with the topology of the tree, obtained with concatenated aligned sequences of orthologous genes. Therefore, we propose strain-specific “barcodes” for rapid phylotyping. Using unique 22-mers for taxonomic analysis, we detected microbes of all groups in human microbiomes; however, their presence in the five samples was significantly different. Pointing to the intraspecies heterogeneity of E. coli in the natural microflora, this also indicates the feasibility of further studies of the role of this heterogeneity in maintaining population homeostasis.
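
The idea of k-mers unique to one genome can be sketched as below. This is an illustrative toy under stated assumptions, not the authors' method: the function names are hypothetical, the example uses very short sequences with k = 3 rather than 18-mers or 22-mers, and reverse complements are ignored.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(genomes, k):
    """Map each genome label to the k-mers that occur in no other genome.

    genomes: dict of label -> sequence string (hypothetical input format).
    """
    sets = {label: kmers(seq, k) for label, seq in genomes.items()}
    unique = {}
    for label, own in sets.items():
        # Union of the k-mer sets of every other genome.
        others = set().union(*(s for other, s in sets.items() if other != label))
        unique[label] = own - others
    return unique
```

For example, `unique_kmers({"A": "AAAC", "B": "AACG"}, 3)` keeps `AAA` for `A` and `ACG` for `B`, because `AAC` is shared by both sequences.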


2006 ◽  
Vol 291 (1) ◽  
pp. G26-G34 ◽  
Author(s):  
Hideki Nakatsuka ◽  
Takaaki Sokabe ◽  
Kimiko Yamamoto ◽  
Yoshinobu Sato ◽  
Katsuyoshi Hatakeyama ◽  
...  

Partial hepatectomy causes hemodynamic changes that increase portal blood flow in the remaining lobe, where the expression of immediate-early genes, including plasminogen activator inhibitor-1 (PAI-1), is induced. We hypothesized that a hyperdynamic circulatory state occurring in the remaining lobe induces immediate-early gene expression. In this study, we investigated whether the mechanical force generated by flowing blood, shear stress, induces PAI-1 expression in hepatocytes. When cultured rat hepatocytes were exposed to flow, PAI-1 mRNA levels began to increase within 3 h, peaked at levels significantly higher than the static control levels, and then gradually decreased. The flow-induced PAI-1 expression was shear stress dependent rather than shear rate dependent and accompanied by increased hepatocyte production of PAI-1 protein. Shear stress increased PAI-1 transcription but did not affect PAI-1 mRNA stability. Functional analysis of the 2.1-kb PAI-1 5′-promoter indicated that a 278-bp segment containing transcription factor Sp1 and Ets-1 consensus sequences was critical to the shear stress-dependent increase of PAI-1 transcription. Mutations of both the Sp1 and Ets-1 consensus sequences, but not of either one alone, markedly prevented basal PAI-1 transcription and abolished the response of the PAI-1 promoter to shear stress. EMSA and chromatin immunoprecipitation assays showed binding of Sp1 and Ets-1 to each consensus sequence under static conditions, which increased in response to shear stress. In conclusion, hepatocyte PAI-1 expression is flow sensitive and transcriptionally regulated by shear stress via cooperative interactions between Sp1 and Ets-1.


2019 ◽  
Author(s):  
Yiting Zhou ◽  
Guangwei Ma ◽  
Jiawen Yang ◽  
Yabin Guo

Abstract Background: The Sleeping Beauty (SB) transposon was thought to integrate strictly into TA dinucleotides. Recently, we found that SB also integrates into non-TA sites at a lower frequency. Here we further studied the non-TA integration of SB. Results: 1) SB can integrate into non-TA sites in HEK293T cells as well as in mouse cell lines. 2) Both the hyperactive transposase SB100X and the traditional SB11 catalyze integrations at non-TA sites. 3) The consensus sequence of the non-TA target sites occurs only on the opposite side of the sequenced junction between the transposon end and the genomic sequence, indicating that the integrations at non-TA sites are mainly aberrant integrations. 4) The consensus sequence of the non-TA target sites corresponds to the transposon end sequence. When the transposon end sequence is mutated, the consensus sequence changes too. Conclusion: The interaction between the SB transposon end and genomic DNA may be involved in the target site selection of the SB integrations at non-TA sites.


2015 ◽  
Vol 6 (1) ◽  
Author(s):  
Phelim Bradley ◽  
N. Claire Gordon ◽  
Timothy M. Walker ◽  
Laura Dunn ◽  
Simon Heys ◽  
...  

Abstract The rise of antibiotic-resistant bacteria has led to an urgent need for rapid detection of drug resistance in clinical samples, and improvements in global surveillance. Here we show how de Bruijn graph representation of bacterial diversity can be used to identify species and resistance profiles of clinical isolates. We implement this method for Staphylococcus aureus and Mycobacterium tuberculosis in a software package (‘Mykrobe predictor’) that takes raw sequence data as input, and generates a clinician-friendly report within 3 minutes on a laptop. For S. aureus, the error rates of our method are comparable to gold-standard phenotypic methods, with sensitivity/specificity of 99.1%/99.6% across 12 antibiotics (using an independent validation set, n=470). For M. tuberculosis, our method predicts resistance with sensitivity/specificity of 82.6%/98.5% (independent validation set, n=1,609); sensitivity is lower here, probably because of limited understanding of the underlying genetic mechanisms. We give evidence that minor alleles improve detection of extremely drug-resistant strains, and demonstrate feasibility of the use of emerging single-molecule nanopore sequencing techniques for these purposes.


1990 ◽  
Vol 10 (9) ◽  
pp. 4456-4465
Author(s):  
S M Carroll ◽  
P Narayan ◽  
F M Rottman

N6-methyladenosine (m6A) residues occur at internal positions in most cellular and viral RNAs; both heterogeneous nuclear RNA and mRNA are involved. This modification arises by enzymatic transfer of a methyl group from S-adenosylmethionine to the central adenosine residue in the canonical sequence G/AAC. Thus far, m6A has been mapped to specific locations in eucaryotic mRNA and viral genomic RNA. We have now examined an intron-specific sequence of a modified bovine prolactin precursor RNA for the presence of this methylated nucleotide by using both transfected-cell systems and a cell-free system capable of methylating mRNA transcripts in vitro. The results indicate the final intron-specific sequence (intron D) of a prolactin RNA molecule does indeed possess m6A residues. When mapped to specific T1 oligonucleotides, the predominant site of methylation was found to be within the consensus sequence AGm6ACU. The level of m6A at this site is nonstoichiometric; approximately 24% of the molecules are modified in vivo. Methylation was detected at markedly reduced levels at other consensus sites within the intron but not in T1 oligonucleotides which do not contain either AAC or GAC consensus sequences. In an attempt to correlate mRNA methylation with processing, stably transfected CHO cells expressing augmented levels of bovine prolactin were treated with neplanocin A, an inhibitor of methylation. Under these conditions, the relative steady-state levels of the intron-containing nuclear precursor increased four to six times that found in control cells.


Author(s):  
Ren-Xiang Yan ◽  
Jing Liu ◽  
Yi-Min Tao

Profile-profile alignment may be the most sensitive and useful computational resource for identifying remote homologies and recognizing protein folds. However, profile-profile alignment is usually much more complex and slower than sequence-sequence or profile-sequence alignment. The profile or PSSM (position-specific scoring matrix) represents the mutational variability at each sequence position of a protein as a vector of amino acid substitution frequencies, making it a much richer encoding of a protein sequence. The consensus sequence, which can be considered a simplified profile, was used in early work to improve sequence alignment accuracy. Recently, several studies were carried out to improve PSI-BLAST’s fold recognition performance by using consensus sequence information. There are several ways to compute a consensus sequence. Based on these considerations, in this chapter we propose a method that combines the information from different types of consensus sequences using support vector machine learning. Benchmark results suggest that our method can further improve PSI-BLAST’s fold recognition performance.


2002 ◽  
Vol 16 (3) ◽  
pp. 169-173 ◽  
Author(s):  
Gerald A Bucholtz ◽  
Sherry A. Salzman ◽  
Fernando B. Bersalona ◽  
Timothy R. Boyle ◽  
Victor S. Ejercito ◽  
...  

Background: Nasal polyps are considered to result from chronic inflammation, but the initial or persisting stimulus for the inflammation is not known. A variety of bacteria and fungi have been cultured from nasal polyps, but ∼35% have sterile cultures. Previously, Mycoplasma pneumoniae–specific DNA was detected in human nasal polyps using polymerase chain reaction (PCR) techniques, suggesting M. pneumoniae as a causative agent in the etiology of nasal polyps. Methods: In this study, we tested for the presence of bacterial DNA in nasal polyps resected from 40 patients, in nasal mucosa membrane from 9 patients undergoing turbinectomy for hypertrophy, and in sinus mucosa membrane from 6 patients undergoing endoscopic surgery for chronic sinusitis. Tissue DNA was extracted and analyzed by PCR using M. pneumoniae specific primers for DNA that encode the 16S rRNA gene in 41 specimens (31 polyps, 6 turbinates, and 4 sinus), and by consensus sequence-based PCR using broad range primers for most eubacterial DNA encoding the 16S rRNA gene in 38 specimens (26 polyps, 7 turbinates, and 5 sinuses). Results: Only two samples were positive for bacterial DNA encoding 16S rRNA: Streptococcus sp. DNA was isolated from one polyp specimen and Pseudomonas aeruginosa DNA was isolated in one maxillary sinusitis specimen. No evidence of M. pneumoniae–specific DNA encoding 16S rRNA was found in any of the tissues. Conclusions: This study suggests that chronic bacterial infection is not a major component of nasal polyp etiology.


2019 ◽  
Vol 20 (22) ◽  
pp. 5605 ◽  
Author(s):  
Václav Brázda ◽  
Miroslav Fojta

The tumor suppressor functions of p53 and its roles in regulating the cell cycle, apoptosis, senescence, and metabolism are accomplished mainly by its interactions with DNA. p53 works as a transcription factor for a significant number of genes. Most p53 target genes contain so-called p53 response elements in their promoters, consisting of 20 bp long canonical consensus sequences. Compared to other transcription factors, which usually bind to one concrete and clearly defined DNA target, the p53 consensus sequence is not strict, but contains two repeats of a 5′RRRCWWGYYY3′ sequence; therefore it varies remarkably among target genes. Moreover, p53 also binds to DNA fragments that at least partially and often completely lack this consensus sequence. p53 also binds with high affinity to a variety of non-B DNA structures including Holliday junctions, cruciform structures, quadruplex DNA, triplex DNA, DNA loops, bulged DNA, and hemicatenane DNA. In this review, we summarize information on the interactions of p53 with various DNA targets and discuss the functional consequences of the rich world of p53 DNA binding targets for its complex regulatory functions.
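
The degenerate motif 5′RRRCWWGYYY3′ uses standard IUPAC nucleotide codes (R = A or G, W = A or T, Y = C or T), so a strict match can be sketched as a regular expression. The Python example below is only an illustration: the names `iupac_to_regex` and `find_full_sites` are hypothetical, the two repeats are assumed to be directly adjacent (giving the 20 bp element), and only the given strand is scanned. As the abstract notes, real p53 targets often deviate from this consensus, so a strict scan like this would miss many functional sites.

```python
import re

# IUPAC codes needed for the motif; plain bases map to themselves.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",  # purine
    "Y": "[CT]",  # pyrimidine
    "W": "[AT]",  # weak pair
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif string into a regular-expression fragment."""
    return "".join(IUPAC[symbol] for symbol in motif)


HALF_SITE = iupac_to_regex("RRRCWWGYYY")
# Lookahead so that overlapping candidate sites are all reported.
FULL_SITE = re.compile(f"(?=({HALF_SITE}{HALF_SITE}))")


def find_full_sites(seq):
    """Return (start, matched_sequence) for each strict 20 bp site in seq."""
    return [(m.start(), m.group(1)) for m in FULL_SITE.finditer(seq.upper())]
```

Applied to a synthetic sequence such as `"TTAGACATGCCTGGGCTTGTCCAA"`, the scan reports one site starting at offset 2.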

