Single haplotype assembly of the human genome from a hydatidiform mole

Mapping Intimacies ◽

10.1101/006841 ◽

2014 ◽

Author(s):

Karyn Meltz Steinberg ◽

Valerie K Schneider ◽

Tina A Graves-Lindsay ◽

Robert S Fulton ◽

Richa Agarwala ◽

...

Keyword(s):

Human Genome ◽

Genome Sequence ◽

Hydatidiform Mole ◽

Repetitive Sequences ◽

Allelic Diversity ◽

Great Majority ◽

Sequence Assembly ◽

Sequence Coverage ◽

Bac Clone ◽

Assembly Error

An accurate and complete reference human genome sequence assembly is essential for accurately interpreting individual genomes and associating sequence variation with disease phenotypes. While the current reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can help overcome these problems, even the longest available reads do not resolve all regions of the human genome. To overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA, including a set of end-sequenced and indexed BAC clones, an optical map, and 100X whole genome shotgun (WGS) sequence coverage using short (Illumina) read pairs. We used the WGS sequence and the GRCh37 reference assembly to create a sequence assembly of the CHM1 genome. We subsequently incorporated 382 finished CHORI-17 BAC clone sequences to generate a second draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene and repeat content shows this assembly to be of excellent quality and contiguity, and comparisons to ClinVar and the NHGRI GWAS catalog show that the CHM1 genome does not harbor an excess of deleterious alleles. However, comparison to assembly-independent resources, such as BAC clone end sequences and long reads generated by a different sequencing technology (PacBio), indicates misassembled regions. The great majority of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future by sequencing BAC clone tiling paths.
This publicly available first generation assembly will be integrated into the Genome Reference Consortium (GRC) curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.

Faculty Opinions recommendation of A hybrid approach for de novo human genome sequence assembly and phasing.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726342609.793528105 ◽

2017 ◽

Author(s):

Stefano Lonardi

Keyword(s):

Human Genome ◽

Genome Sequence ◽

De Novo ◽

Hybrid Approach ◽

Sequence Assembly ◽

Human Genome Sequence ◽

Genome Sequence Assembly

A hybrid approach for de novo human genome sequence assembly and phasing

Nature Methods ◽

10.1038/nmeth.3865 ◽

2016 ◽

Vol 13 (7) ◽

pp. 587-590 ◽

Cited By ~ 159

Author(s):

Yulia Mostovoy ◽

Michal Levy-Sakin ◽

Jessica Lam ◽

Ernest T Lam ◽

Alex R Hastie ◽

...

Keyword(s):

Human Genome ◽

Genome Sequence ◽

De Novo ◽

Hybrid Approach ◽

Sequence Assembly ◽

Human Genome Sequence ◽

Genome Sequence Assembly

Draft human genome sequence published

Nature Precedings ◽

10.1038/news010215-2 ◽

2001 ◽

Author(s):

David Adam

Keyword(s):

Human Genome ◽

Genome Sequence ◽

Human Genome Sequence ◽

Draft Human Genome Sequence ◽

Draft Human Genome

Genomic Tackling of Human Satellite DNA: Breaking Barriers through Time

International Journal of Molecular Sciences ◽

10.3390/ijms22094707 ◽

2021 ◽

Vol 22 (9) ◽

pp. 4707

Author(s):

Mariana Lopes ◽

Sandra Louzada ◽

Margarida Gama-Carvalho ◽

Raquel Chaves

Keyword(s):

Human Genome ◽

Satellite Dna ◽

Repetitive Sequences ◽

Nucleotide Composition ◽

Genomic Component ◽

Genomic Studies ◽

Human Genomic ◽

Definition Of ◽

High Degree

(Peri)centromeric repetitive sequences and, more specifically, satellite DNA (satDNA) sequences constitute a major human genomic component. SatDNA sequences vary in many features, including nucleotide composition, complexity, and abundance. Several satDNA families have been identified and characterized in the human genome over time, albeit at different speeds. Human satDNA families present a high degree of sub-variability, leading to the definition of various subfamilies with different organization and clustered localization. The evolution of satDNA analysis has enabled the progressive characterization of satDNA features. Despite recent advances in the sequencing of centromeric arrays, comprehensive genomic studies to assess their variability are still required to provide accurate and proportional representation of satDNA (peri)centromeric/acrocentric short arm sequences. Approaches combining multiple techniques have been successfully applied and seem to be the path to follow for generating integrated knowledge in the promising field of human satDNA biology.

Genome-Wide Analysis of Terpene Synthase Gene Family in Mentha longifolia and Catalytic Activity Analysis of a Single Terpene Synthase

Genes ◽

10.3390/genes12040518 ◽

2021 ◽

Vol 12 (4) ◽

pp. 518

Author(s):

Zequn Chen ◽

Xiwu Qi ◽

Xu Yu ◽

Ying Zheng ◽

Zhiqi Liu ◽

...

Keyword(s):

Catalytic Activity ◽

Essential Oils ◽

Gene Family ◽

Genome Sequence ◽

Sequence Assembly ◽

Terpene Synthase ◽

Genome Sequence Assembly ◽

Mentha Longifolia ◽

Specific Expansion ◽

Species Specific

Terpenoids are a diverse group of natural products, and terpene synthase (TPS) plays a key role in their biosynthesis. Mentha plants are rich in essential oils, whose main components are terpenoids, and their biosynthetic pathways have been largely elucidated. However, TPSs in Mentha plants have not been systematically identified and studied. In this work, we performed a genome-wide identification and analysis of the TPS gene family in Mentha longifolia, a model plant for functional genomic research in the genus Mentha. A total of 63 TPS genes were identified in the M. longifolia genome sequence assembly, which could be divided into six subfamilies. The TPS-b subfamily had the largest number of genes, which might be related to the abundant monoterpenoids in Mentha plants. The TPS-e subfamily had 18 members and showed a significant species-specific expansion compared with other sequenced Lamiaceae plant species. The 63 TPS genes could be mapped to nine scaffolds of the M. longifolia genome sequence assembly, and their distribution is uneven. Tandem duplicates and fragment duplicates contributed greatly to the increase in the number of TPS genes in M. longifolia. The conserved motifs (RR(X)8W, NSE/DTE, RXR, and DDXXD) were analyzed in M. longifolia TPSs, and significant differentiation was found between different subfamilies. Adaptive evolution analysis showed that M. longifolia TPSs were subjected to purifying selection after the species-specific expansion, and some amino acid residues under positive selection were identified. Furthermore, we cloned a single terpene synthase, MlongTPS29, which belongs to the TPS-b subfamily, and analyzed its catalytic activity. MlongTPS29 could encode a limonene synthase and catalyze the biosynthesis of limonene, an important precursor of essential oils from the genus Mentha. This study provides useful information for the biosynthesis of terpenoids in the genus Mentha.

Discovery of the human genome sequence in the public and private databases

Current Biology ◽

10.1016/s0960-9822(01)00490-0 ◽

2001 ◽

Vol 11 (20) ◽

pp. R808-R811 ◽

Cited By ~ 2

Author(s):

Stephen W Scherer ◽

Joseph Cheung

Keyword(s):

Human Genome ◽

Genome Sequence ◽

Human Genome Sequence ◽

Public And Private ◽

The Public

Beyond the Genome: genomics research ten years after the human genome sequence

Genome Biology ◽

10.1186/gb-2010-11-11-309 ◽

2010 ◽

Vol 11 (11) ◽

pp. 309 ◽

Cited By ~ 3

Author(s):

Amanda M Casto ◽

Clara Amid

Keyword(s):

Human Genome ◽

Genome Sequence ◽

Human Genome Sequence ◽

Genomics Research

Toward a Complete Human Genome Sequence

Genome Research ◽

10.1101/gr.8.11.1097 ◽

1998 ◽

Vol 8 (11) ◽

pp. 1097-1108 ◽

Cited By ~ 30

Author(s):

The Sanger Centre ◽

The Washington University Genome Sequencing Center

Keyword(s):

Human Genome ◽

Genome Sequence ◽

Human Genome Sequence

The tweenage human genome sequence

Nature Medicine ◽

10.1038/nm0211-155 ◽

2011 ◽

Vol 17 (2) ◽

pp. 155-155

Author(s):

Ewen Kirkness

Keyword(s):

Human Genome ◽

Genome Sequence ◽

Human Genome Sequence

Human Genome Sequence Variation and the Inherited Basis of Common Disease

Lecture Notes in Computer Science - Research in Computational Molecular Biology ◽

10.1007/11415770_45 ◽

2005 ◽

pp. 601-602

Author(s):

David Altshuler

Keyword(s):

Human Genome ◽

Genome Sequence ◽

Sequence Variation ◽

Human Genome Sequence ◽

Common Disease
