A chromosome-level reference genome of the hazelnut, Corylus heterophylla Fisch

Tiantian Zhao; Wenxu Ma; Zhen Yang; Lisong Liang; Xin Chen; Guixi Wang; Qinghua Ma; Lujun Wang

doi:10.1093/gigascience/giab027

GigaScience ◽

10.1093/gigascience/giab027 ◽

2021 ◽

Vol 10 (4) ◽

Author(s):

Tiantian Zhao ◽

Wenxu Ma ◽

Zhen Yang ◽

Lisong Liang ◽

Xin Chen ◽

...

Keyword(s):

Molecular Mechanisms ◽

Common Ancestor ◽

Reference Genome ◽

Gene Prediction ◽

Genome Sequences ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Reads ◽

Different Tissues ◽

Chromosome Level

Abstract Background: Corylus heterophylla Fisch. is a species of the Betulaceae family native to China. As an economically and ecologically important nut tree, C. heterophylla can survive in extremely low temperatures (–30 to –40 °C). To deepen our knowledge of the Betulaceae species and facilitate the use of C. heterophylla for breeding and its genetic improvement, we have sequenced the whole genome of C. heterophylla. Findings: Based on >64.99 Gb (∼175.30×) of Nanopore long reads, we assembled a 370.75-Mb C. heterophylla genome with contig N50 and scaffold N50 sizes of 2.07 and 31.33 Mb, respectively, accounting for 99.23% of the estimated genome size (373.61 Mb). Furthermore, 361.90 Mb of contigs were anchored to 11 chromosomes using Hi-C link data, representing 97.61% of the assembled genome sequences. Transcriptomes representing 4 different tissues were sequenced to assist protein-coding gene prediction. A total of 27,591 protein-coding genes were identified, of which 92.02% (25,389) were functionally annotated. The phylogenetic analysis showed that C. heterophylla is close to Ostrya japonica, and they diverged from their common ancestor ∼52.79 million years ago. Conclusions: We generated a high-quality chromosome-level genome of C. heterophylla. This genome resource will promote research on the molecular mechanisms of how the hazelnut responds to environmental stresses and serve as an important resource for genome-assisted improvement in cold and drought resistance of the Corylus genus.
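
The contig N50 and scaffold N50 values quoted in this abstract are length-weighted summary statistics, and the anchoring and completeness percentages follow directly from the reported sizes. The following is a minimal Python sketch of how such figures are usually computed, assuming the standard N50 definition; the function name `n50` and the toy contig lengths are hypothetical illustrations and are not data or code from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, not taken from the paper).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Ratios recomputed from the sizes stated in the abstract (all in Mb).
anchored_fraction = 361.90 / 370.75   # contigs anchored to chromosomes
assembled_fraction = 370.75 / 373.61  # assembly vs. estimated genome size

print(toy_n50)
print(f"{anchored_fraction:.2%}", f"{assembled_fraction:.2%}")
```

Sorting once and accumulating lengths keeps the calculation at O(n log n), which is sufficient even for assemblies with many thousands of contigs.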

Chromosome-level genome assembly of a butterflyfish, Chelmon rostratus

10.1101/719187 ◽

2019 ◽

Author(s):

Xiaoyun Huang ◽

Yue Song ◽

Suyu Zhang ◽

A Yunga ◽

Mengqi Zhang ◽

...

Keyword(s):

Molecular Mechanisms ◽

Repetitive Sequences ◽

Ecological Environment ◽

Protein Coding ◽

Protein Coding Genes ◽

A Genome ◽

Genome Information ◽

Adaptation Evolution ◽

Core Genes ◽

Chromosome Level

Abstract Chelmon rostratus (Teleostei, Perciformes, Chaetodontidae) is a copperband butterflyfish. As it is an ornamental fish, the genome information for this species might help in understanding the genome evolution of Chaetodontidae and the adaptation and evolution of coral reef fish. In this study, using stLFR co-barcoded read data, we assembled a genome of 638.70 Mb in size with contig and scaffold N50 sizes of 294.41 kb and 2.61 Mb, respectively. Using Hi-C data, 94.40% of scaffold sequences were assigned to 24 chromosomes, and BUSCO analysis showed that 97.3% (2,579) of core genes were found in our assembly. Up to 21.47% of the genome was found to be repetitive sequences, and 21,375 protein-coding genes were annotated. Among these annotated protein-coding genes, 20,163 (94.33%) proteins were assigned possible functions. As the first genome for the Chaetodontidae family, these data should help further the understanding and exploration of marine ecological environments, symbiosis with coral, and the genomic innovations and molecular mechanisms contributing to the species' unique morphology and physiological features.

A de novo genome assembly of the dwarfing pear rootstock Zhongai 1

Scientific Data ◽

10.1038/s41597-019-0291-3 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Chunqing Ou ◽

Fei Wang ◽

Jiahong Wang ◽

Song Li ◽

Yanjie Zhang ◽

...

Keyword(s):

De Novo ◽

Repetitive Sequences ◽

Draft Genome ◽

Genome Sequences ◽

Fruit Characteristics ◽

De Novo Genome Assembly ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Reads ◽

Cultivated Species

Abstract ‘Zhongai 1’ [(Pyrus ussuriensis × communis) × spp.] is an excellent pear dwarfing rootstock common in China. It is itself dwarf and has high dwarfing efficiency on most of the main cultivated Pyrus species when used as an interstock. Here we describe the draft genome sequences of ‘Zhongai 1’, which was assembled using PacBio long reads, Illumina short reads and Hi-C technology. We estimated the genome size to be approximately 511.33 Mb by k-mer analysis and obtained a final genome of 510.59 Mb with a contig N50 size of 1.28 Mb. Next, 506.31 Mb (99.16%) of contigs were clustered into 17 chromosomes with a scaffold N50 size of 23.45 Mb. We further predicted 309.86 Mb (60.68%) of repetitive sequences and 43,120 protein-coding genes. The assembled genome will be a valuable resource and reference for future pear breeding, genetic improvement, and comparative genomics among related species. Moreover, it will help identify genes involved in dwarfism, early flowering, stress tolerance, and commercially desirable fruit characteristics.

De novo assembly of Trachidermus fasciatus genome by nanopore sequencing

10.1101/2020.04.18.042093 ◽

2020 ◽

Author(s):

Gangcai Xie ◽

Xu Zhang ◽

Feng Lv ◽

Mengmeng Sang ◽

Hairong Hu ◽

...

Keyword(s):

Reference Genome ◽

De Novo ◽

Gene Prediction ◽

Protein Coding ◽

De Novo Gene ◽

Long Reads ◽

Resource Protection ◽

Trachidermus Fasciatus ◽

Roughskin Sculpin ◽

High Quality Genome

Abstract Trachidermus fasciatus is a roughskin sculpin widely distributed in the coastal areas of East Asia. Owing to environmental destruction and overfishing, populations of this species are under threat. A reference genome is important for studying its population genetics, domestic farming, and genetic resource protection. However, there is currently no reference genome for Trachidermus fasciatus, which has greatly hindered studies on this species. In this study, we integrated nanopore long-read sequencing, Illumina short-read sequencing, and Hi-C methods to assemble the genome of Trachidermus fasciatus de novo. Our results provide a chromosome-level, high-quality genome assembly with a total length of about 543 Mb and an N50 of 23 Mb. Based on de novo gene prediction and RNA sequencing information, a total of 38,728 genes were found, including 23,191 protein-coding genes, 2,149 small RNAs, 5,572 rRNAs, and 7,816 tRNAs. In addition, about 23% of the genome is covered by repetitive elements. Furthermore, the BUSCO completeness of the assembled genome is more than 96%, and the single-base accuracy is 99.997%. Our study provides the first whole-genome reference for Trachidermus fasciatus, which might greatly facilitate future studies on this species.

Chromosome-level assembly of Drosophila bifasciata reveals important karyotypic transition of the X chromosome

10.1101/847558 ◽

2019 ◽

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

Abstract The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.

A Chromosome-Scale Genome Assembly Resource for Myriosclerotinia sulcatula Infecting Sedge Grass (Carex sp.)

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-03-20-0060-a ◽

2020 ◽

Vol 33 (7) ◽

pp. 880-883

Author(s):

Stefan Kusch ◽

Heba M. M. Ibrahim ◽

Catherine Zanchetta ◽

Celine Lopez-Roques ◽

Cecile Donnadieu ◽

...

Keyword(s):

Host Range ◽

Sclerotinia Sclerotiorum ◽

Genome Assembly ◽

Plant Pathogens ◽

Reference Genome ◽

Close Relative ◽

High Quality ◽

Protein Coding ◽

Protein Coding Genes ◽

Reference Genome Assembly

The fungus Myriosclerotinia sulcatula is a close relative of the notorious polyphagous plant pathogens Botrytis cinerea and Sclerotinia sclerotiorum but exhibits a host range restricted to plants from the Carex genus (Cyperaceae family). To date, there are no genomic resources available for fungi in the Myriosclerotinia genus. Here, we present a chromosome-scale reference genome assembly for M. sulcatula. The assembly contains 24 contigs with a total length of 43.53 Mbp, with scaffold N50 of 2,649.7 kbp and N90 of 1,133.1 kbp. BRAKER-predicted gene models were manually curated using WebApollo, resulting in 11,275 protein-coding genes that we functionally annotated. We provide a high-quality reference genome assembly and annotation for M. sulcatula as a resource for studying evolution and pathogenicity in fungi from the Sclerotiniaceae family.

The sequencing and de novo assembly of the Larimichthys crocea genome using PacBio and Hi-C technologies

Scientific Data ◽

10.1038/s41597-019-0194-3 ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 8

Author(s):

Baohua Chen ◽

Zhixiong Zhou ◽

Qiaozhen Ke ◽

Yidi Wu ◽

Huaqiang Bai ◽

...

Keyword(s):

Marine Fish ◽

Single Molecule ◽

Large Scale ◽

Reference Genome ◽

De Novo ◽

Larimichthys Crocea ◽

Chromosome Conformation ◽

Protein Coding ◽

Total Length ◽

Chromosome Level

Abstract Larimichthys crocea is an endemic marine fish in East Asia that belongs to Sciaenidae in Perciformes. L. crocea has now been recognized as an “iconic” marine fish species in China because it is not only a popular food fish but also a representative victim of overfishing that still provides high-value fish products supported by the modern large-scale mariculture industry. Here, we report a chromosome-level reference genome of L. crocea generated by employing the PacBio single molecule sequencing technique (SMRT) and high-throughput chromosome conformation capture (Hi-C) technologies. The genome sequences were assembled into 1,591 contigs with a total length of 723.86 Mb and a contig N50 length of 2.83 Mb. After chromosome-level scaffolding, 24 scaffolds were constructed with a total length of 668.67 Mb (92.48% of the total length). Genome annotation identified 23,657 protein-coding genes and 7,262 ncRNAs. This highly accurate, chromosome-level reference genome of L. crocea provides an essential genome resource to support the development of genome-scale selective breeding and restocking strategies of L. crocea.

Complete Genome Sequence of Acinetobacter indicus Type Strain SGAir0564 Isolated from Tropical Air Collected in Singapore

Genome Announcements ◽

10.1128/genomea.00230-18 ◽

2018 ◽

Vol 6 (18) ◽

pp. e00230-18 ◽

Cited By ~ 2

Author(s):

Vineeth Kodengil Vettath ◽

Ana Carolina M. Junqueira ◽

Akira Uchida ◽

Rikky W. Purbojati ◽

James N. I. Houghton ◽

...

Keyword(s):

Genome Sequence ◽

Complete Genome Sequence ◽

Complete Genome ◽

Type Strain ◽

Protein Coding ◽

Content Type ◽

Air Samples ◽

Protein Coding Genes ◽

Long Reads

ABSTRACT Acinetobacter indicus (Gammaproteobacteria) is a strict aerobic nonmotile bacterium. The strain SGAir0564 was isolated from air samples collected in Singapore. The complete genome is 3.1 Mb and was assembled using a combination of short and long reads. The genome contains 2,808 protein-coding genes, 80 tRNAs, and 21 rRNA subunits.

Overlapping protein-coding genes in human genome and their coincidental expression in tissues

Scientific Reports ◽

10.1038/s41598-019-49802-w ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 2

Author(s):

Chao-Hsin Chen ◽

Chao-Yu Pan ◽

Wen-chang Lin

Keyword(s):

Human Genome ◽

Expression Profiles ◽

Tissue Expression ◽

Human Protein ◽

Clear Understanding ◽

Overlapping Genes ◽

Genome Sequences ◽

Protein Coding ◽

Protein Coding Genes ◽

Overlapping Gene

Abstract The completion of human genome sequences and the advancement of next-generation sequencing technologies have engendered a clear understanding of all human genes. Overlapping genes are usually observed in compact genomes, such as those of bacteria and viruses. Notably, overlapping protein-coding genes do exist in the human genome. Accordingly, we used the current Ensembl gene annotations to identify overlapping human protein-coding genes. We analysed 19,200 well-annotated protein-coding genes and determined that 4,951 protein-coding genes overlapped with their adjacent genes. Approximately a quarter of all human protein-coding genes were overlapping genes. We observed different clusters of overlapping protein-coding genes, ranging from two genes (paired overlapping genes) to 22 genes. We also divided the paired overlapping protein-coding gene groups into four subtypes. We found that the divergent overlapping gene subtype had a stronger expression association than did the subtypes of 5ʹ-tandem overlapping and 3ʹ-tandem overlapping genes. The majority of paired overlapping genes exhibited comparable coincidental tissue expression profiles; however, a few overlapping gene pairs displayed distinctive tissue expression association patterns. In summary, we have carefully examined the genomic features and distributions of human overlapping protein-coding genes and found coincidental expression in tissues for most overlapping protein-coding genes.
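
Identifying overlapping genes from annotation coordinates, as this abstract describes, can be illustrated with a simple interval sweep. The sketch below is only an assumption about how such clusters might be found: it groups genes whose intervals overlap transitively on the same chromosome and ignores strand, so it does not reproduce the paper's four paired subtypes. The function name `overlap_clusters`, the tuple layout, and the toy genes are hypothetical and are not taken from the paper or from Ensembl.

```python
from collections import defaultdict


def overlap_clusters(genes):
    """Group genes whose coordinate intervals overlap on the same chromosome.

    genes: iterable of (gene_id, chrom, start, end) with inclusive coordinates.
    Returns a list of clusters (lists of gene_ids) that contain >= 2 genes.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters = []
    for intervals in by_chrom.values():
        intervals.sort()  # sweep from left to right by start coordinate
        current, current_end = [], None
        for start, end, gene_id in intervals:
            if current and start <= current_end:
                # Gene starts inside the running cluster: extend it.
                current.append(gene_id)
                current_end = max(current_end, end)
            else:
                # Gap found: close the previous cluster if it has >= 2 genes.
                if len(current) > 1:
                    clusters.append(current)
                current, current_end = [gene_id], end
        if len(current) > 1:
            clusters.append(current)
    return clusters


# Toy annotation (hypothetical coordinates, not real genes).
toy_genes = [
    ("A", "chr1", 100, 500),
    ("B", "chr1", 450, 900),
    ("C", "chr1", 1000, 1200),
    ("D", "chr2", 100, 300),
    ("E", "chr2", 200, 250),
    ("F", "chr2", 240, 400),
]
toy_clusters = overlap_clusters(toy_genes)

# Share of overlapping genes, from the counts stated in the abstract.
overlapping_fraction = 4951 / 19200
print(toy_clusters, f"{overlapping_fraction:.1%}")
```

In the toy data, gene C forms no cluster because it starts after gene B ends, while D, E and F form one three-gene cluster. Classifying pairs as divergent or tandem would additionally require strand information for each gene.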

Chromosome-Level Assembly of Drosophila bifasciata Reveals Important Karyotypic Transition of the X Chromosome

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400922 ◽

2020 ◽

Vol 10 (3) ◽

pp. 891-897 ◽

Cited By ~ 3

Author(s):

Ryan Bracewell ◽

Anita Tran ◽

Kamalakar Chatla ◽

Doris Bachtrog

Keyword(s):

X Chromosome ◽

Genome Assembly ◽

De Novo ◽

Pericentromeric Region ◽

Species Group ◽

Chromosome 15 ◽

Protein Coding ◽

Protein Coding Genes ◽

Long Read ◽

Chromosome Level

The Drosophila obscura species group is one of the most studied clades of Drosophila and harbors multiple distinct karyotypes. Here we present a de novo genome assembly and annotation of D. bifasciata, a species which represents an important subgroup for which no high-quality chromosome-level genome assembly currently exists. We combined long-read sequencing (Nanopore) and Hi-C scaffolding to achieve a highly contiguous genome assembly approximately 193 Mb in size, with repetitive elements constituting 30.1% of the total length. Drosophila bifasciata harbors four large metacentric chromosomes and the small dot, and our assembly contains each chromosome in a single scaffold, including the highly repetitive pericentromeres, which were largely composed of Jockey and Gypsy transposable elements. We annotated a total of 12,821 protein-coding genes and comparisons of synteny with D. athabasca orthologs show that the large metacentric pericentromeric regions of multiple chromosomes are conserved between these species. Importantly, Muller A (X chromosome) was found to be metacentric in D. bifasciata and the pericentromeric region appears homologous to the pericentromeric region of the fused Muller A-AD (XL and XR) of pseudoobscura/affinis subgroup species. Our finding suggests a metacentric ancestral X fused to a telocentric Muller D and created the large neo-X (Muller A-AD) chromosome ∼15 MYA. We also confirm the fusion of Muller C and D in D. bifasciata and show that it likely involved a centromere-centromere fusion.

Mitochondrial Genome Sequences of Diorhabda carinata and Diorhabda carinulata, Two Beetle Species Introduced to North America for Biological Control

Microbiology Resource Announcements ◽

10.1128/mra.00690-19 ◽

2019 ◽

Vol 8 (35) ◽

Cited By ~ 1

Author(s):

A. R. Stahlke ◽

A. Z. Ozsoy ◽

D. W. Bean ◽

P. A. Hohenlohe

Keyword(s):

Biological Control ◽

North America ◽

Mitochondrial Genome ◽

Noncoding Region ◽

Beetle Species ◽

Genome Sequences ◽

Protein Coding ◽

Content Type ◽

Protein Coding Genes ◽

Genome Assemblies

We announce the complete circularized mitochondrial genome assemblies of Diorhabda carinata and Diorhabda carinulata, beetle species introduced to North America for the biological control of invasive shrubs of the genus Tamarix L. (Tamaricaceae). The assemblies (16,232 and 16,298 bp, respectively) each comprise 13 protein-coding genes, 22 tRNAs, two rRNAs, and a noncoding region.
