HySA: A Hybrid Structural variant Assembly approach using next generation and single-molecule sequencing technologies

2016 ◽  
Author(s):  
Xian Fan ◽  
Mark Chaisson ◽  
Luay Nakhleh ◽  
Ken Chen

Achieving complete, accurate and cost-effective assembly of the human genome is of great importance for realizing the promises of precision medicine. The abundance of repeats and genetic variations in the human genome and the limitations of existing sequencing technologies call for the development of novel assembly methods that could leverage the complementary strengths of multiple technologies. We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next generation sequencing (NGS) and single-molecule sequencing (SMS) technologies to accurately assemble and detect structural variations (SVs) in the human genome. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance assembly of structurally altered regions in the human genome. In testing our approach using data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878), we found that, compared with existing approaches, our approach substantially improved the detection of many types of SVs, particularly novel large insertions, small INDELs (10-50 bp) and short tandem repeat expansions and contractions, with a low false discovery rate. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery.
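The abstract does not spell out the clustering algorithm, so the following is only a minimal sketch of one plausible reading: NGS reads and SMS reads form the two sides of a bipartite graph, an edge links two reads judged homologous, and each connected component becomes one independent SV assembly problem. The function name `cluster_reads`, the read identifiers, and the use of connected components (via union-find) are all assumptions made for illustration, not HySA's actual implementation.

```python
from collections import defaultdict


def cluster_reads(edges):
    """Group reads into connected components of a bipartite graph.

    edges: iterable of (ngs_read_id, sms_read_id) pairs, one pair per
    NGS/SMS read pair judged homologous (hypothetical input format).
    Returns a list of clusters, each a dict with "ngs" and "sms" sets.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Tag each read with its side so identical IDs on both sides cannot collide.
    for ngs_id, sms_id in edges:
        union(("ngs", ngs_id), ("sms", sms_id))

    groups = defaultdict(lambda: {"ngs": set(), "sms": set()})
    for node in list(parent):
        side, read_id = node
        groups[find(node)][side].add(read_id)

    # Sort for a deterministic order; every cluster has at least one SMS read.
    return sorted(groups.values(), key=lambda g: sorted(g["sms"]))


# Toy example: n1 and n2 share SMS read s1, n2 also matches s2, n3 matches s3.
toy_edges = [("n1", "s1"), ("n2", "s1"), ("n2", "s2"), ("n3", "s3")]
for cluster in cluster_reads(toy_edges):
    print(sorted(cluster["ngs"]), sorted(cluster["sms"]))
```

In this toy input, reads n1, n2, s1 and s2 end up in one cluster and n3 and s3 in another, so each cluster could be handed to a local assembler on its own.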

2011 ◽  
pp. 51-84 ◽  
Author(s):  
Richard A. Stein

The 1953 discovery of the DNA double-helical structure by James Watson, Francis Crick, Maurice Wilkins, and Rosalind Franklin represented one of the most significant advances in the biomedical world (Watson and Crick, 1953; Maddox, 2003). Almost half a century after this landmark event, in February 2001, the initial draft sequences of the human genome were published (Lander et al., 2001; Venter et al., 2001) and, in April 2003, the International Human Genome Sequencing Consortium reported the completion of the Human Genome Project, a massive international collaborative endeavor that started in 1990 and is thought to represent the most ambitious undertaking in the history of biology (Collins et al., 2003; Thangadurai, 2004; National Human Genome Research Institute). The Human Genome Project provided a plethora of genetic and genomic information that significantly changed our perspectives on biomedical and social sciences. The sequencing of the first human genome was a 13-year, 2.7-billion-dollar effort that relied on the automated Sanger (dideoxy or chain termination) method, which was developed in 1977, around the same time as Maxam-Gilbert (chemical) sequencing, and subsequently became the most frequently used approach for several decades (Sanger et al., 1975; Maxam & Gilbert, 1977; Sanger et al., 1977). The new generations of DNA sequencing technologies, known as next-generation (second generation) and next-next-generation (third generation) sequencing, which started to be commercialized in 2005, enabled the cost-effective sequencing of large chromosomal regions during progressively shorter time frames, and opened the possibility for new applications, such as the sequencing of single-cell genomes (Service, 2006; Blow, 2008; Morozova and Marra, 2008; Metzker, 2010).


2021 ◽  
Author(s):  
Fei Ge ◽  
Jingtao Qu ◽  
Peng Liu ◽  
Lang Pan ◽  
Chaoying Zou ◽  
...  

Until now, little has been known about the mechanism underlying the genotype-dependence of embryonic callus (EC) induction, which has severely inhibited the development of maize genetic engineering. Here, we report the genome sequence and annotation of A188, a maize inbred line with a high EC induction ratio, assembled from single-molecule sequencing and optical genome mapping. We assembled a 2,210 Mb genome with a scaffold N50 size of 11.61 million bases (Mb), compared to 9.73 Mb for B73 and 10.2 Mb for Mo17. Comparative analysis revealed that ~30% of the predicted A188 genes had large structural variations relative to the B73, Mo17 and W22 genomes, which caused considerable protein divergence and might lead to phenotypic variations between the four inbred lines. Combining our new A188 genome, previously reported QTLs and RNA sequencing data, we reveal 8 large structural variation genes and 4 differentially expressed genes playing potential roles in EC induction.
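For readers unfamiliar with the scaffold N50 statistic quoted above: it is the length L such that scaffolds of length L or longer together cover at least half of the total assembly length. The sketch below computes it on made-up toy lengths; the function name `n50` and the input values are illustrative only and are not taken from the A188 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (arbitrary units), total 32: the two longest scaffolds (8 + 8)
# already cover half of the assembly, so the N50 is 8.
toy_scaffolds = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_scaffolds))  # prints 8
```

A higher N50 generally indicates a more contiguous assembly, which is why the abstract compares the value against B73 and Mo17.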


Author(s):  
John Archibald

For all its biological importance, DNA is a fragile molecule, so extracting it is a difficult process. ‘How to read the book of life’ explains the techniques required to sequence DNA. It begins by explaining the techniques developed for protein and RNA sequencing by Frederick Sanger, Robert Holley, and Carl Woese that were then developed further for DNA sequencing. Following the success of the Human Genome Project, the next generation of DNA sequencing was developed in the mid-2000s. Pyrosequencing was capable of generating orders of magnitude more data at a fraction of the cost, but was superseded within a decade by semiconductor sequencing, reversible chain-termination sequencing, and single-molecule sequencing.


2019 ◽  
Vol 47 (1) ◽  
pp. 4-13 ◽  
Author(s):  
Daniel Fürst ◽  
Chrysanthi Tsamadou ◽  
Christine Neuchel ◽  
Hubert Schrezenmeier ◽  
Joannis Mytilineos ◽  
...  

Sequencing of the human genome has led to the definition of the genes for most of the relevant blood group systems, and the polymorphisms responsible for most of the clinically relevant blood group antigens have been characterized. Molecular blood group typing is used in situations where erythrocytes are not available or where serological testing was inconclusive or not possible due to the lack of antisera. Also, molecular testing may be more cost-effective in certain situations. Molecular typing approaches are mostly based on either PCR with specific primers, DNA hybridization, or DNA sequencing. In particular, the transition from Sanger-based sequencing to next-generation sequencing (NGS) technologies has led to exciting new possibilities in blood group genotyping. We briefly describe the currently available NGS platforms and their specifications, depict the genetic background of blood group polymorphisms, and discuss applications for NGS approaches in immunohematology. As an example, we delineate a protocol for large-scale donor blood group screening established and in use at our institution. Furthermore, we discuss technical challenges and limitations as well as the prospect for future developments, including long-read sequencing technologies.


2017 ◽  
Author(s):  
Devang Mehta ◽  
Matthias Hirsch-Hoffmann ◽  
Mariam Were ◽  
Andrea Patrignani ◽  
Hassan Were ◽  
...  

Deep-sequencing of virus isolates using short-read sequencing technologies is problematic since viruses are often present in complexes sharing a high degree of sequence identity. The full-length genomes of such highly similar viruses cannot be assembled accurately from short sequencing reads. We present a new method, CIDER-Seq (Circular DNA Enrichment Sequencing), which successfully generates accurate full-length virus genomes from individual sequencing reads with no sequence assembly required. CIDER-Seq operates by combining a PCR-free, circular DNA enrichment protocol with Single Molecule Real Time sequencing and a new sequence deconcatenation algorithm. We apply our technique to produce more than 1,200 full-length, highly accurate geminivirus genomes from RNAi-transgenic and control plants in a field trial in Kenya. Using CIDER-Seq we can demonstrate for the first time that the expression of antiviral double-stranded RNA (dsRNA) in transgenic plants causes a consistent shift in virus populations towards species sharing low homology to the transgene-derived dsRNA. Our results show that CIDER-Seq is a powerful, cost-effective tool for accurately sequencing circular DNA viruses, with future applications in deep-sequencing other forms of circular DNA such as transposons and plasmids.


2019 ◽  
Author(s):  
Mitchell R. Vollger ◽  
Glennis A. Logsdon ◽  
Peter A. Audano ◽  
Arvis Sulovari ◽  
David Porubsky ◽  
...  

The sequencing and assembly of human genomes using long-read sequencing technologies have revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective stand-alone technology for de novo assembly of human genomes.


2019 ◽  
Vol 21 (6) ◽  
pp. 1971-1986 ◽  
Author(s):  
Matteo Chiara ◽  
Federico Zambelli ◽  
Ernesto Picardi ◽  
David S Horner ◽  
Graziano Pesole

Over the last 5 years, a number of studies have reported the successful application of single-molecule sequencing technologies to the determination of the size and sequence of pathological expanded microsatellite repeats. However, different custom bioinformatics pipelines were employed in each study, preventing meaningful comparisons and somewhat limiting the reproducibility of the results. In this review, we provide a brief summary of state-of-the-art methods for the characterization of expanded repeat alleles, along with a detailed comparison of bioinformatics tools for the determination of repeat length and sequence, using both real and simulated data. Our reanalysis of publicly available human genome sequencing data suggests a modest, but statistically significant, increase of the error rate of single-molecule sequencing technologies at genomic regions containing short tandem repeats. However, we observe that all the methods tested here, irrespective of the strategy used for the analysis of the data (either based on the alignment or assembly of the reads), show high levels of sensitivity in both the detection of expanded tandem repeats and the estimation of the expansion size, suggesting that approaches based on single-molecule sequencing technologies are highly effective for the detection and quantification of tandem repeat expansions and contractions.

