HySA: A Hybrid Structural variant Assembly approach using next generation and single-molecule sequencing technologies

Mapping Intimacies ◽

10.1101/069815 ◽

2016 ◽

Cited By ~ 2

Author(s):

Xian Fan ◽

Mark Chaisson ◽

Luay Nakhleh ◽

Ken Chen

Keyword(s):

Human Genome ◽

Single Molecule ◽

Clustering Algorithm ◽

Hydatidiform Mole ◽

Cost Effective ◽

Next Generation ◽

Structural Variations ◽

Single Molecule Sequencing ◽

Structural Variant ◽

Sequencing Technologies

Abstract: Achieving complete, accurate and cost-effective assembly of the human genome is of great importance for realizing the promises of precision medicine. The abundance of repeats and genetic variations in the human genome and the limitations of existing sequencing technologies call for the development of novel assembly methods that can leverage the complementary strengths of multiple technologies. We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next generation sequencing (NGS) and single-molecule sequencing (SMS) technologies to accurately assemble and detect structural variations (SVs) in the human genome. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance assembly of structurally altered regions in the human genome. In testing our approach using data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878), we found that, compared with existing approaches, our approach substantially improved the detection of many types of SVs, particularly novel large insertions, small INDELs (10-50 bp) and short tandem repeat expansions and contractions, with a low false discovery rate. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery.
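The idea of splitting one whole-genome problem into independent per-SV problems can be illustrated with a minimal sketch. It is not the HySA implementation: it assumes that an edge exists whenever an NGS read aligns to an SMS read, and it assumes that a cluster is simply a connected component of the resulting bipartite graph. The function name `cluster_reads` and the read identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_reads(edges):
    """Group reads into connected components of a bipartite graph.

    edges: iterable of (ngs_read_id, sms_read_id) pairs, one per alignment
    between an NGS read and an SMS read (hypothetical input format).
    Returns a list of clusters, each a dict with "ngs" and "sms" read-id sets.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag each id with its side so NGS and SMS ids cannot collide.
    for ngs_id, sms_id in edges:
        union(("ngs", ngs_id), ("sms", sms_id))

    clusters = defaultdict(lambda: {"ngs": set(), "sms": set()})
    for node in list(parent):
        side, read_id = node
        clusters[find(node)][side].add(read_id)
    return list(clusters.values())


if __name__ == "__main__":
    hits = [("n1", "s1"), ("n2", "s1"), ("n3", "s2")]
    for cluster in cluster_reads(hits):
        print(sorted(cluster["ngs"]), sorted(cluster["sms"]))
```

Each returned cluster could then be handed to a local assembler on its own, which is the sense in which the sub-problems are independent in this sketch.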

HySA: a Hybrid Structural variant Assembly approach using next-generation and single-molecule sequencing technologies

Genome Research ◽

10.1101/gr.214767.116 ◽

2017 ◽

Vol 27 (5) ◽

pp. 793-800 ◽

Cited By ~ 20

Author(s):

Xian Fan ◽

Mark Chaisson ◽

Luay Nakhleh ◽

Ken Chen

Keyword(s):

Single Molecule ◽

Next Generation ◽

Single Molecule Sequencing ◽

Structural Variant ◽

Sequencing Technologies

Direct-to-Consumer Genetic Testing

Genomics and Bioethics ◽

10.4018/978-1-61692-883-4.ch005 ◽

2011 ◽

pp. 51-84 ◽

Cited By ~ 1

Author(s):

Richard A. Stein

Keyword(s):

Human Genome ◽

Human Genome Project ◽

Cost Effective ◽

Helical Structure ◽

Genome Project ◽

Next Generation ◽

Double Helical Structure ◽

Sequencing Technologies ◽

Human Genome Sequencing ◽

The Human Genome Project

The 1953 discovery of the DNA double-helical structure by James Watson, Francis Crick, Maurice Wilkins, and Rosalind Franklin represented one of the most significant advances in the biomedical world (Watson and Crick, 1953; Maddox, 2003). Almost half a century after this landmark event, in February 2001, the initial draft sequences of the human genome were published (Lander et al., 2001; Venter et al., 2001) and, in April 2003, the International Human Genome Sequencing Consortium reported the completion of the Human Genome Project, a massive international collaborative endeavor that started in 1990 and is thought to represent the most ambitious undertaking in the history of biology (Collins et al., 2003; Thangadurai, 2004; National Human Genome Research Institute). The Human Genome Project provided a plethora of genetic and genomic information that significantly changed our perspectives on biomedical and social sciences. The sequencing of the first human genome was a 13-year, 2.7-billion-dollar effort that relied on the automated Sanger (dideoxy or chain termination) method, which was developed in 1977, around the same time as Maxam-Gilbert (chemical) sequencing, and subsequently became the most frequently used approach for several decades (Sanger et al., 1975; Maxam & Gilbert, 1977; Sanger et al., 1977). The new generations of DNA sequencing technologies, known as next-generation (second generation) and next-next-generation (third generation) sequencing, which started to be commercialized in 2005, enabled the cost-effective sequencing of large chromosomal regions within progressively shorter time frames, and opened the possibility for new applications, such as the sequencing of single-cell genomes (Service, 2006; Blow, 2008; Morozova and Marra, 2008; Metzker, 2010).

Faculty Opinions recommendation of Single-molecule sequencing of an individual human genome.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1164671.626475 ◽

2009 ◽

Author(s):

Charles Auffray

Keyword(s):

Human Genome ◽

Single Molecule ◽

Single Molecule Sequencing ◽

Individual Human

Genome assembly of the maize inbred line A188 provides a new reference genome for functional genomics

10.1101/2021.03.15.435372 ◽

2021 ◽

Author(s):

Fei Ge ◽

Jingtao Qu ◽

Peng Liu ◽

Lang Pan ◽

Chaoying Zou ◽

...

Keyword(s):

Single Molecule ◽

Inbred Line ◽

Genome Mapping ◽

Maize Inbred Line ◽

Sequencing Data ◽

Structural Variations ◽

Single Molecule Sequencing ◽

Maize Genetic ◽

Induction Ratio ◽

Phenotypic Variations

Little is known so far about the mechanism underlying the genotype-dependence of embryonic callus (EC) induction, which has severely inhibited the development of maize genetic engineering. Here, we report the genome sequence and annotation of A188, a maize inbred line with a high EC induction ratio, assembled from single-molecule sequencing and optical genome mapping. We assembled a 2,210 Mb (million bases) genome with a scaffold N50 size of 11.61 Mb, compared to 9.73 Mb for B73 and 10.2 Mb for Mo17. Comparative analysis revealed that ~30% of the predicted A188 genes had large structural variations relative to the B73, Mo17 and W22 genomes, which caused considerable protein divergence and might lead to phenotypic variations between the four inbred lines. Combining our new A188 genome, previously reported QTLs and RNA sequencing data, we reveal 8 genes with large structural variations and 4 differentially expressed genes that potentially play roles in EC induction.
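For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal sketch shows the standard definition: the length L such that scaffolds of length at least L together cover at least half of the total assembly. The function name `n50` and the toy scaffold lengths are illustrative and are not taken from the A188 assembly.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold (or contig) lengths.

    Scaffolds are sorted from longest to shortest and accumulated until
    the running total reaches half of the summed assembly length; the
    length of the scaffold that crosses that threshold is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    toy_scaffolds = [80, 70, 50, 40, 30, 20]  # illustrative lengths only
    print(n50(toy_scaffolds))
```

A larger N50 indicates that more of the assembly sits in long scaffolds, which is why the abstract compares the A188 value against B73 and Mo17.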

2. How to read the book of life

Genomics: A Very Short Introduction ◽

10.1093/actrade/9780198786207.003.0002 ◽

2018 ◽

pp. 12-27

Author(s):

John Archibald

Keyword(s):

Dna Sequencing ◽

Human Genome ◽

Single Molecule ◽

Human Genome Project ◽

Genome Project ◽

Chain Termination ◽

Single Molecule Sequencing ◽

Semiconductor Sequencing ◽

The Cost ◽

The Human Genome Project

For all its biological importance, DNA is a fragile molecule, so extracting it is a difficult process. ‘How to read the book of life’ explains the techniques required to sequence DNA. It begins by explaining the techniques developed for protein and RNA sequencing by Frederick Sanger, Robert Holley, and Carl Woese that were then developed further for DNA sequencing. Following the success of the Human Genome Project, the next generation of DNA sequencing was developed in the mid-2000s. Pyrosequencing was capable of generating orders of magnitude more data at a fraction of the cost, but was superseded within a decade by semiconductor sequencing, reversible chain-termination sequencing, and single-molecule sequencing.

Next-Generation Sequencing Technologies in Blood Group Typing

Transfusion Medicine and Hemotherapy ◽

10.1159/000504765 ◽

2019 ◽

Vol 47 (1) ◽

pp. 4-13 ◽

Cited By ~ 1

Author(s):

Daniel Fürst ◽

Chrysanthi Tsamadou ◽

Christine Neuchel ◽

Hubert Schrezenmeier ◽

Joannis Mytilineos ◽

...

Keyword(s):

Next Generation Sequencing ◽

Blood Group ◽

Large Scale ◽

Cost Effective ◽

Molecular Testing ◽

Blood Group Antigens ◽

Next Generation ◽

Sequencing Technologies ◽

Blood Group Typing ◽

Generation Sequencing

Sequencing of the human genome has led to the definition of the genes for most of the relevant blood group systems, and the polymorphisms responsible for most of the clinically relevant blood group antigens are characterized. Molecular blood group typing is used in situations where erythrocytes are not available or where serological testing was inconclusive or not possible due to the lack of antisera. Also, molecular testing may be more cost-effective in certain situations. Molecular typing approaches are mostly based on either PCR with specific primers, DNA hybridization, or DNA sequencing. Particularly the transition of sequencing techniques from Sanger-based sequencing to next-generation sequencing (NGS) technologies has led to exciting new possibilities in blood group genotyping. We describe briefly the currently available NGS platforms and their specifications, depict the genetic background of blood group polymorphisms, and discuss applications for NGS approaches in immunohematology. As an example, we delineate a protocol for large-scale donor blood group screening established and in use at our institution. Furthermore, we discuss technical challenges and limitations as well as the prospect for future developments, including long-read sequencing technologies.

A new full-length virus genome sequencing method reveals that antiviral RNAi changes geminivirus populations in field-grown cassava

10.1101/168724 ◽

2017 ◽

Cited By ~ 1

Author(s):

Devang Mehta ◽

Matthias Hirsch-Hoffmann ◽

Mariam Were ◽

Andrea Patrignani ◽

Hassan Were ◽

...

Keyword(s):

Single Molecule ◽

Deep Sequencing ◽

Cost Effective ◽

Virus Genome ◽

Full Length ◽

Dna Viruses ◽

Circular Dna ◽

Sequencing Technologies ◽

Virus Genomes ◽

And Control

Abstract: Deep sequencing of virus isolates using short-read sequencing technologies is problematic since viruses are often present in complexes sharing a high degree of sequence identity. The full-length genomes of such highly similar viruses cannot be assembled accurately from short sequencing reads. We present a new method, CIDER-Seq (Circular DNA Enrichment Sequencing), which successfully generates accurate full-length virus genomes from individual sequencing reads with no sequence assembly required. CIDER-Seq operates by combining a PCR-free, circular DNA enrichment protocol with Single Molecule Real Time sequencing and a new sequence deconcatenation algorithm. We apply our technique to produce more than 1,200 full-length, highly accurate geminivirus genomes from RNAi-transgenic and control plants in a field trial in Kenya. Using CIDER-Seq, we demonstrate for the first time that the expression of antiviral double-stranded RNA (dsRNA) in transgenic plants causes a consistent shift in virus populations towards species sharing low homology to the transgene-derived dsRNA. Our results show that CIDER-Seq is a powerful, cost-effective tool for accurately sequencing circular DNA viruses, with future applications in deep sequencing of other forms of circular DNA such as transposons and plasmids.

The Evolution of High-Throughput Sequencing Technologies: From Sanger to Single-Molecule Sequencing

Next Generation Sequencing in Cancer Research ◽

10.1007/978-1-4614-7645-0_1 ◽

2013 ◽

pp. 1-30

Author(s):

Chee-Seng Ku ◽

Yudi Pawitan ◽

Mengchu Wu ◽

Dimitrios H. Roukos ◽

David N. Cooper

Keyword(s):

High Throughput ◽

Single Molecule ◽

High Throughput Sequencing ◽

Single Molecule Sequencing ◽

Sequencing Technologies

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

10.1101/635037 ◽

2019 ◽

Cited By ~ 7

Author(s):

Mitchell R. Vollger ◽

Glennis A. Logsdon ◽

Peter A. Audano ◽

Arvis Sulovari ◽

David Porubsky ◽

...

Keyword(s):

Human Genome ◽

Single Molecule ◽

Tandem Repeats ◽

De Novo ◽

Sequence Data ◽

Gene Annotation ◽

Hydatidiform Mole ◽

High Fidelity ◽

Human Genomes ◽

Long Read

Abstract: The sequencing and assembly of human genomes using long-read sequencing technologies have revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective stand-alone technology for de novo assembly of human genomes.
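The NG50 figures quoted in this abstract can be sanity-checked with a short sketch. It uses the general definition of NG50 (like N50, but the threshold is half of an assumed genome or region size rather than half of the assembled length); how the authors restricted the calculation to within 1 Mbp of the centromere is not described here, so the helper `ng50` and its toy inputs are illustrative only. The only values taken from the abstract are 480.6 kbp and 191.5 kbp.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of contig lengths for a given genome (or region) size.

    Contigs are accumulated from longest to shortest until they cover half
    of genome_size. Returns None if the assembly never reaches that point.
    """
    half_genome = genome_size / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_genome:
            return length
    return None


if __name__ == "__main__":
    # Toy contig lengths and genome size, for illustration only.
    print(ng50([80, 70, 50, 40, 30, 20], genome_size=400))

    # Fold change between the two NG50 values reported in the abstract.
    hifi_kbp, clr_kbp = 480.6, 191.5
    print(round(hifi_kbp / clr_kbp, 2))
```

The ratio of the two reported values is about 2.51, consistent with the 2.5-fold increase stated in the abstract.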

Critical assessment of bioinformatics methods for the characterization of pathological repeat expansions with single-molecule sequencing data

Briefings in Bioinformatics ◽

10.1093/bib/bbz099 ◽

2019 ◽

Vol 21 (6) ◽

pp. 1971-1986 ◽

Cited By ~ 1

Author(s):

Matteo Chiara ◽

Federico Zambelli ◽

Ernesto Picardi ◽

David S Horner ◽

Graziano Pesole

Keyword(s):

Single Molecule ◽

Tandem Repeats ◽

Simulated Data ◽

Detailed Comparison ◽

Sequencing Data ◽

Single Molecule Sequencing ◽

Sequencing Technologies ◽

Repeat Expansions

Abstract: Over the last 5 years, a number of studies have reported the successful application of single-molecule sequencing technologies to the determination of the size and sequence of pathological expanded microsatellite repeats. However, different custom bioinformatics pipelines were employed in each study, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for the determination of repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest, but statistically significant, increase in the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods tested here, irrespective of the strategy used for the analysis of the data (based on either the alignment or the assembly of the reads), show high levels of sensitivity in both the detection of expanded tandem repeats and the estimation of the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.
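To make the notion of "repeat length" concrete, here is a deliberately naive sketch that counts the longest uninterrupted run of a motif in a single read. It is not one of the tools compared in the review: it assumes exact motif matches, whereas practical methods must tolerate the sequencing errors the abstract mentions. The function name `longest_repeat_run` and the example sequence are hypothetical.

```python
import re


def longest_repeat_run(sequence, motif):
    """Return the largest number of consecutive exact copies of motif.

    Only perfect, uninterrupted copies are counted; a single mismatch or
    indel inside the repeat tract splits it into separate runs.
    """
    if not motif:
        raise ValueError("motif must not be empty")
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    longest = 0
    for match in pattern.finditer(sequence):
        copies = (match.end() - match.start()) // len(motif)
        longest = max(longest, copies)
    return longest


if __name__ == "__main__":
    read = "TTCAGCAGCAGCAGTT"  # made-up read with a CAG tract
    print(longest_repeat_run(read, "CAG"))
```

Because one erroneous base breaks an exact run, a counter like this would tend to underestimate expansion sizes on noisy reads, which is one reason alignment-based or assembly-based strategies are used instead.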
