circular consensus sequencing
Recently Published Documents


Total documents: 14 (five years: 1)
H-index: 8 (five years: 0)

2021 ◽  
Author(s):  
Gunjan Baid ◽  
Daniel E Cook ◽  
Kishwar Shafin ◽  
Taedong Yun ◽  
Felipe Llinares-Lopez ◽  
...  

Pacific Biosciences (PacBio) circular consensus sequencing (CCS) generates long (10-25 kb), accurate "HiFi" reads by combining serial observations of a DNA molecule into a consensus sequence. The standard approach to consensus generation uses a hidden Markov model (pbccs). Here, we introduce DeepConsensus, which uses a unique alignment-based loss to train a gap-aware transformer-encoder (GATE) for sequence correction. Compared to pbccs, DeepConsensus reduces read errors in the same dataset by 42%. This increases the yield of PacBio HiFi reads at Q20 by 9%, at Q30 by 27%, and at Q40 by 90%. With two SMRT Cells of HG003, reads from DeepConsensus improve hifiasm assembly contiguity (NG50 4.9 Mb to 17.2 Mb), increase gene completeness (94% to 97%), reduce false gene duplication rate (1.1% to 0.5%), improve assembly base accuracy (Q43 to Q45), and also reduce variant calling errors by 24%.
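
The Q20, Q30, and Q40 thresholds in this abstract are Phred-scaled quality values, where Q = -10 * log10(p) and p is the error probability. The following is a minimal sketch of that conversion; the function names are illustrative and are not taken from pbccs or DeepConsensus.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert an accuracy in [0, 1) to a Phred score Q = -10 * log10(1 - accuracy)."""
    if not 0 <= accuracy < 1:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1 - accuracy)


if __name__ == "__main__":
    # Thresholds mentioned in the abstract above.
    for q in (20, 30, 40):
        p = phred_to_error_prob(q)
        print(f"Q{q}: error probability {p:.4%}, accuracy {1 - p:.4%}")
```

Under this definition, each step of 10 in Q corresponds to a tenfold drop in error probability, so Q20, Q30, and Q40 correspond to 99%, 99.9%, and 99.99% accuracy.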


2020 ◽  
Vol 52 (10) ◽  
pp. 1018-1023 ◽  
Author(s):  
Qian Zhou ◽  
Dié Tang ◽  
Wu Huang ◽  
Zhongmin Yang ◽  
Yu Zhang ◽  
...  

Potato (Solanum tuberosum L.) is the most important tuber crop worldwide. Efforts are underway to transform the crop from a clonally propagated tetraploid into a seed-propagated, inbred-line-based hybrid, but this process requires a better understanding of the potato genome. Here, we report the 1.67-Gb haplotype-resolved assembly of a diploid potato, RH89-039-16, using a combination of multiple sequencing strategies, including circular consensus sequencing. Comparison of the two haplotypes revealed ~2.1% intragenomic diversity, including 22,134 predicted deleterious mutations in 10,642 annotated genes. In 20,583 pairs of allelic genes, 16.6% and 30.8% exhibited differential expression and methylation between alleles, respectively. Deleterious mutations and differentially expressed alleles were dispersed throughout both haplotypes, complicating strategies to eradicate deleterious alleles or stack beneficial alleles via meiotic recombination. This study offers a holistic view of the genome organization of a clonally propagated diploid species and provides insights into technological evolution in resolving complex genomes.


2020 ◽  
Author(s):  
Xiao Du ◽  
Lili Li ◽  
Fan Liang ◽  
Sanyang Liu ◽  
Wenxin Zhang ◽  
...  

The importance of structural variants (SVs) for phenotypes and human diseases is now recognized. Although a variety of SV detection platforms and strategies that vary in sensitivity and specificity have been developed, few benchmarking procedures are available to confidently assess their performance in biological and clinical research. To facilitate the validation and application of those approaches, our work established an Asian reference material comprising identified benchmark regions and high-confidence SV calls. We established a high-confidence SV callset with 8,938 SVs in an EBV-immortalized B lymphocyte line, by integrating four alignment-based SV callers [from 109× PacBio continuous long read (CLR), 22× PacBio circular consensus sequencing (CCS) reads, 104× Oxford Nanopore long reads, and 114× optical mapping platform (Bionano)] and one de novo assembly-based SV caller using CCS reads. A total of 544 randomly selected SVs were validated by PCR and Sanger sequencing, demonstrating the robustness of our SV calls. Combining trio-binning-based haplotype assemblies, we established an SV benchmark for identification of false negatives and false positives by constructing the continuous high-confidence regions (CHCRs), which cover 1.46 Gb and 6,882 SVs supported by at least one diploid haplotype assembly. Establishing high-confidence SV calls for a benchmark sample that has been characterized by multiple technologies provides a valuable resource for investigating SVs in human biology, disease, and clinical diagnosis.


2020 ◽  
Author(s):  
Susan M. Hiatt ◽  
James M.J. Lawlor ◽  
Lori H. Handley ◽  
Ryne C. Ramaker ◽  
Brianne B. Rogers ◽  
...  

Purpose: Exome and genome sequencing have proven to be effective tools for the diagnosis of neurodevelopmental disorders (NDDs), but large fractions of NDDs cannot be attributed to currently detectable genetic variation. This is likely, at least in part, a result of the fact that many genetic variants are difficult or impossible to detect through typical short-read sequencing approaches.

Methods: Here, we describe a genomic analysis using Pacific Biosciences circular consensus sequencing (CCS) reads, which are both long (>10 kb) and accurate (>99% bp accuracy). We used CCS on six proband-parent trios with NDDs that were unexplained despite extensive testing, including genome sequencing with short reads.

Results: We identified variants and created de novo assemblies in each trio, with global metrics indicating these data sets are more accurate and comprehensive than those provided by short-read data. In one proband, we identified a likely pathogenic (LP), de novo L1-mediated insertion in CDKL5 that results in duplication of exon 3, leading to a frameshift. In a second proband, we identified multiple large de novo structural variants, including insertion-translocations affecting DGKB and MLLT3, which we show disrupt MLLT3 transcript levels. We consider this extensive structural variation likely pathogenic.

Conclusion: The breadth and quality of variant detection, coupled to finding variants of clinical and research interest in two of six probands with unexplained NDDs, strongly support the value of long-read genome sequencing for understanding rare disease.


2019 ◽  
Author(s):  
Aaron M. Wenger ◽  
Paul Peluso ◽  
William J. Rowell ◽  
Pi-Chuan Chang ◽  
Richard J. Hall ◽  
...  

The major DNA sequencing technologies in use today produce either highly accurate short reads or noisy long reads. We developed a protocol based on single-molecule, circular consensus sequencing (CCS) to generate highly accurate (99.8%) long reads averaging 13.5 kb and applied it to sequence the well-characterized human HG002/NA24385. We optimized existing tools to comprehensively detect variants, achieving precision and recall above 99.91% for SNVs, 95.98% for indels, and 95.99% for structural variants. We estimate that 2,434 discordances are correctable mistakes in the high-quality Genome in a Bottle benchmark. Nearly all (99.64%) variants are phased into haplotypes, which further improves variant detection. De novo assembly produces a highly contiguous and accurate genome with contig N50 above 15 Mb and concordance of 99.998%. CCS reads match short reads for small variant detection, while enabling structural variant detection and de novo assembly at similar contiguity and markedly higher concordance than noisy long reads.
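
The contig N50 quoted in this abstract is a standard contiguity statistic: the length of the contig at which the contigs, sorted longest first, have cumulatively covered half of the total assembly length. The sketch below shows one common way to compute it; the function name and the optional `genome_size` argument (which turns the statistic into NG50) are illustrative assumptions, not code from the paper.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], genome_size: Optional[int] = None) -> Optional[int]:
    """Return N50 of the contig lengths, or NG50 if genome_size is given.

    N50 uses half of the total assembly length as the target;
    NG50 uses half of an externally supplied genome size instead.
    Returns None when the contigs never reach the target (possible for NG50).
    """
    contigs = sorted(lengths, reverse=True)
    if not contigs:
        raise ValueError("at least one contig length is required")
    total = genome_size if genome_size is not None else sum(contigs)
    target = total / 2
    running = 0
    for length in contigs:
        running += length
        if running >= target:
            return length
    return None


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6]  # toy lengths, not real data
    print("N50:", n50(toy_contigs))
    print("NG50 (assumed genome size 30):", n50(toy_contigs, genome_size=30))
```

Because NG50 is normalized to a fixed genome size rather than to the assembly's own total length, it is the more comparable figure when assemblies of different total size are set side by side.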


2018 ◽  
Author(s):  
Satoshi Hiraoka ◽  
Yusuke Okazaki ◽  
Mizue Anda ◽  
Atsushi Toyoda ◽  
Shin-ichi Nakano ◽  
...  

DNA methylation plays important roles in prokaryotes, such as in defense mechanisms against phage infection, and the corresponding genomic landscapes—prokaryotic epigenomes—have recently begun to be disclosed. However, our knowledge of prokaryote methylation systems has been severely limited to those of culturable prokaryotes, whereas environmental communities are in fact dominated by uncultured members that must harbor much more diverse DNA methyltransferases. Here, using single-molecule real-time and circular consensus sequencing techniques, we revealed the ‘metaepigenomes’ of an environmental prokaryotic community in the largest lake in Japan, Lake Biwa. A total of 19 draft genomes from phylogenetically diverse groups, most of which are yet to be cultured, were successfully reconstructed. The analysis of DNA chemical modifications identified 29 methylated motifs in those genomes, among which 14 motifs were novel. Furthermore, we searched for the methyltransferase genes responsible for the methylation of the detected novel motifs and confirmed their catalytic specificities via transformation experiments involving artificially synthesized genes. Finally, we found that genomes without DNA methylation tended to exhibit higher phage infection levels than those with methylation. In summary, this study proves that metaepigenomics is a powerful approach for revealing the vast unexplored variety of prokaryotic DNA methylation systems in nature.


2016 ◽  
Vol 91 (6) ◽  
Author(s):  
Cynthia K. Y. Ho ◽  
Jayna Raghwani ◽  
Sylvie Koekkoek ◽  
Richard H. Liang ◽  
Jan T. M. Van der Meer ◽  
...  

ABSTRACT: In contrast to other available next-generation sequencing platforms, PacBio single-molecule, real-time (SMRT) sequencing has the advantage of generating long reads albeit with a relatively higher error rate in unprocessed data. Using this platform, we longitudinally sampled and sequenced the hepatitis C virus (HCV) envelope genome region (1,680 nucleotides [nt]) from individuals belonging to a cluster of sexually transmitted cases. All five subjects were coinfected with HIV-1 and a closely related strain of HCV genotype 4d. In total, 50 samples were analyzed by using SMRT sequencing. By using 7 passes of circular consensus sequencing, the error rate was reduced to 0.37%, and the median number of sequences was 612 per sample. A further reduction of insertions was achieved by alignment against a sample-specific reference sequence. However, in vitro recombination during PCR amplification could not be excluded. Phylogenetic analysis supported close relationships among HCV sequences from the four male subjects and subsequent transmission from one subject to his female partner. Transmission was characterized by a strong genetic bottleneck. Viral genetic diversity was low during acute infection and increased upon progression to chronicity but subsequently fluctuated during chronic infection, caused by the alternate detection of distinct coexisting lineages. SMRT sequencing combines long reads with sufficient depth for many phylogenetic analyses and can therefore provide insights into within-host HCV evolutionary dynamics without the need for haplotype reconstruction using statistical algorithms.

IMPORTANCE: Next-generation sequencing has revolutionized the study of genetically variable RNA virus populations, but for phylogenetic and evolutionary analyses, longer sequences than those generated by most available platforms, while minimizing the intrinsic error rate, are desired. Here, we demonstrate for the first time that PacBio SMRT sequencing technology can be used to generate full-length HCV envelope sequences at the single-molecule level, providing a data set with large sequencing depth for the characterization of intrahost viral dynamics. The selection of consensus reads derived from at least 7 full circular consensus sequencing rounds significantly reduced the intrinsic high error rate of this method. We used this method to genetically characterize a unique transmission cluster of sexually transmitted HCV infections, providing insight into the distinct evolutionary pathways in each patient over time and identifying the transmission-associated genetic bottleneck as well as fluctuations in viral genetic diversity over time, accompanied by dynamic shifts in viral subpopulations.
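
The effect of requiring several passes can be illustrated with a deliberately simplified model. The sketch below assumes that each pass miscalls a given base independently with the same probability and that the consensus is a plain majority vote in which all wrong calls agree with each other (a pessimistic assumption). Real CCS consensus callers use more elaborate models, and this toy calculation is not expected to reproduce the 0.37% figure reported above; the function name and the 10% per-pass error rate are hypothetical.

```python
from math import comb


def majority_error(per_pass_error: float, passes: int) -> float:
    """Probability that a strict majority of independent passes is wrong.

    Toy binomial model: each pass is wrong with probability per_pass_error,
    and the consensus is wrong only if more than half of the passes are wrong.
    With an even number of passes, ties are counted as correct.
    """
    if not 0 <= per_pass_error <= 1:
        raise ValueError("per_pass_error must be in [0, 1]")
    if passes < 1:
        raise ValueError("passes must be at least 1")
    e = per_pass_error
    k_min = passes // 2 + 1  # smallest count of wrong passes that forms a majority
    return sum(
        comb(passes, k) * e**k * (1 - e) ** (passes - k)
        for k in range(k_min, passes + 1)
    )


if __name__ == "__main__":
    assumed_error = 0.10  # hypothetical per-pass error rate, for illustration only
    for n in (1, 3, 5, 7):
        print(f"{n} passes: consensus error {majority_error(assumed_error, n):.6f}")
```

Even in this crude model the consensus error falls quickly as odd pass counts increase, which is the intuition behind filtering for reads with a minimum number of full passes.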


2015 ◽  
Vol 76 (12) ◽  
pp. 891-896 ◽  
Author(s):  
Catherine J. Westbrook ◽  
Julie A. Karl ◽  
Roger W. Wiseman ◽  
Suzanne Mate ◽  
Galina Koroleva ◽  
...  

2014 ◽  
Vol 204 (4) ◽  
pp. 1041-1049 ◽  
Author(s):  
Qiushi Li ◽  
Ying Li ◽  
Jingyuan Song ◽  
Haibin Xu ◽  
Jiang Xu ◽  
...  
