Draft Genome of the Rice Coral Montipora capitata Obtained from Linked-Read Sequencing

Martin Helmkampf; M Renee Bellinger; Scott M Geib; Sheina B Sim; Misaki Takabayashi

doi:10.1093/gbe/evz135

Draft Genome of the Rice Coral Montipora capitata Obtained from Linked-Read Sequencing

Genome Biology and Evolution ◽

10.1093/gbe/evz135 ◽

2019 ◽

Vol 11 (7) ◽

pp. 2045-2054 ◽

Cited By ~ 9

Author(s):

Martin Helmkampf ◽

M Renee Bellinger ◽

Scott M Geib ◽

Sheina B Sim ◽

Misaki Takabayashi

Keyword(s):

Tandem Repeats ◽

De Novo ◽

Draft Genome ◽

Cost Effective ◽

Sequence Length ◽

High Molecular Weight Dna ◽

Protein Coding ◽

Montipora Capitata ◽

Considerable Uncertainty ◽

Repeat Content

Abstract The rice coral, Montipora capitata, is widely distributed throughout the Indo-Pacific and comprises one of the most important reef-building species in the Hawaiian Islands. Here, we describe a de novo assembly of its genome based on a linked-read sequencing approach developed by 10x Genomics. The final draft assembly consisted of 27,870 scaffolds with a N50 size of 186 kb and contained a fairly complete set (81%) of metazoan benchmarking (BUSCO) genes. Based on haploid assembly size (615 Mb) and read k-mer profiles, we estimated the genome size to fall between 600 and 700 Mb, although the high fraction of repetitive sequence introduced considerable uncertainty. Repeat analysis indicated that 42% of the assembly consisted of interspersed, mostly unclassified repeats, and almost 3% tandem repeats. We also identified 36,691 protein-coding genes with a median coding sequence length of 807 bp, together spanning 7% of the assembly. The high repeat content and heterozygosity of the genome proved a challenging scenario for assembly, requiring additional steps to merge haplotypes and resulting in a higher than expected fragmentation at the scaffold level. Despite these challenges, the assembly turned out to be comparable in most quality measures to that of other available coral genomes while being considerably more cost-effective, especially with respect to long-read sequencing methods. Provided high-molecular-weight DNA is available, linked-read technology may thus serve as a valuable alternative capable of providing quality genome assemblies of nonmodel organisms.

Download Full-text

Four high-quality draft genome assemblies of the marine heterotrophic nanoflagellate Cafeteria roenbergensis

10.1101/751586 ◽

2019 ◽

Cited By ~ 1

Author(s):

Thomas Hackl ◽

Roman Martin ◽

Karina Barenhoff ◽

Sarah Duponchel ◽

Dominik Heider ◽

...

Keyword(s):

De Novo ◽

Draft Genome ◽

Gc Content ◽

Illumina Miseq ◽

Evolutionary Analysis ◽

High Quality ◽

Protein Coding ◽

Repeat Content ◽

Genome Assemblies ◽

Cafeteria Roenbergensis

AbstractThe heterotrophic stramenopile Cafeteria roenbergensis is a globally distributed marine bacterivorous protist. This unicellular flagellate is host to the giant DNA virus CroV and the virophage mavirus. We sequenced the genomes of four cultured C. roenbergensis strains and generated 23.53 Gb of Illumina MiSeq data (99-282 × coverage per strain) and 5.09 Gb of PacBio RSII data (13-54 × coverage). Using the Canu assembler and customized curation procedures, we obtained high-quality draft genome assemblies with a total length of 34-36 Mbp per strain and contig N50 lengths of 148 kbp to 464 kbp. The C. roenbergensis genome has a GC content of ~70%, a repeat content of ~28%, and is predicted to contain approximately 7857-8483 protein-coding genes based on a combination of de novo, homology-based and transcriptome-supported annotation. These first high-quality genome assemblies of a Bicosoecid fill an important gap in sequenced Stramenopile representatives and enable a more detailed evolutionary analysis of heterotrophic protists.

Download Full-text

Draft genome assembly data of Anoxybacillus sp. strain MB8 isolated from Tattapani hot springs, India

10.1101/2021.06.09.447659 ◽

2021 ◽

Author(s):

VISHNU PRASOODANAN P K ◽

Shruti S. Menon ◽

Rituja Saxena ◽

Prashant Waiker ◽

Vineet K Sharma

Keyword(s):

Hot Springs ◽

De Novo ◽

Draft Genome ◽

Gc Content ◽

Central India ◽

Glycoside Hydrolases ◽

Rrna Gene ◽

Aerobic Bacterium ◽

Protein Coding ◽

Protein Coding Genes

Discovery of novel thermophiles has shown promising applications in the field of biotechnology. Due to their thermal stability, they can survive the harsh processes in the industries, which make them important to be characterized and studied. Members of Anoxybacillus are alkaline tolerant thermophiles and have been extensively isolated from manure, dairy-processed plants, and geothermal hot springs. This article reports the assembled data of an aerobic bacterium Anoxybacillus sp. strain MB8, isolated from the Tattapani hot springs in Central India, where the 16S rRNA gene shares an identity of 97% (99% coverage) with Anoxybacillus kamchatkensis strain G10. The de novo assembly and annotation performed on the genome of Anoxybacillus sp. strain MB8 comprises of 2,898,780 bp (in 190 contigs) with a GC content of 41.8% and includes 2,976 protein-coding genes,1 rRNA operon, 73 tRNAs, 1 tm-RNA and 10 CRISPR arrays. The predicted protein-coding genes have been classified into 21 eggNOG categories. The KEGG Automated Annotation Server (KAAS) analysis indicated the presence of assimilatory sulfate reduction pathway, nitrate reducing pathway, and genes for glycoside hydrolases (GHs) and glycoside transferase (GTs). GHs and GTs hold widespread applications, in the baking and food industry for bread manufacturing, and in the paper, detergent and cosmetic industry. Hence, Anoxybacillus sp. strain MB8 holds the potential to be screened and characterized for such commercially relevant enzymes.

Download Full-text

Genome assembly of the Australian black tiger shrimp (Penaeus monodon) reveals a fragmented IHHNV EVE sequence

10.1101/2021.11.11.468259 ◽

2021 ◽

Author(s):

Roger Huerlimann ◽

Jeff A Cowley ◽

Nicholas M Wade ◽

Yinan Wang ◽

Naga Kasinadhuni ◽

...

Keyword(s):

Genome Assembly ◽

Draft Genome ◽

Penaeus Monodon ◽

Penaeid Shrimp ◽

Protein Coding ◽

Black Tiger Shrimp ◽

Draft Genome Assembly ◽

Repeat Content ◽

Tiger Shrimp ◽

Endogenous Viral Elements

Shrimp are a valuable aquaculture species globally; however, disease remains a major hindrance to shrimp aquaculture sustainability and growth. Mechanisms mediated by endogenous viral elements (EVEs) have been proposed as a means by which shrimp that encounter a new virus start to accommodate rather than succumb to infection over time. However, evidence on the nature of such EVEs and how they mediate viral accommodation is limited. More extensive genomic data on Penaeid shrimp from different geographical locations should assist in exposing the diversity of EVEs. In this context, reported here is a PacBio Sequel-based draft genome assembly of an Australian black tiger shrimp (Penaeus monodon) inbred for one generation. The 1.89 Gbp draft genome is comprised of 31,922 scaffolds (N50: 496,398 bp) covering 85.9% of the projected genome size. The genome repeat content (61.8% with 30% representing simple sequence repeats) is almost the highest identified for any species. The functional annotation identified 35,517 gene models, of which 25,809 were protein-coding and 17,158 were annotated using interproscan. Scaffold scanning for specific EVEs identified an element comprised of a 9,045 bp stretch of repeated, inverted and jumbled genome fragments of Infectious hypodermal and hematopoietic necrosis virus (IHHNV) bounded by a repeated 591/590 bp host sequence. As only near complete linear ~4 kb IHHNV genomes have been found integrated in the genome of P. monodon previously, its discovery has implications regarding the validity of PCR tests designed to specifically detect such linear EVE types. The existence of conjoined inverted IHHNV genome fragments also provides a means by which hairpin dsRNAs could be expressed and processed by the shrimp RNA interference (RNAi) machinery.

Download Full-text

Recovery of non-reference sequences missing from the human reference genome

BMC Genomics ◽

10.1186/s12864-019-6107-1 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 3

Author(s):

Ran Li ◽

Xiaomeng Tian ◽

Peng Yang ◽

Yingzhi Fan ◽

Ming Li ◽

...

Keyword(s):

Human Genome ◽

Tandem Repeats ◽

Reference Genome ◽

De Novo ◽

Precise Location ◽

Protein Coding ◽

Human Reference Genome ◽

Mhc Haplotype ◽

Reference Sequences ◽

Flanking Regions

Abstract Background The non-reference sequences (NRS) represent structure variations in human genome with potential functional significance. However, besides the known insertions, it is currently unknown whether other types of structure variations with NRS exist. Results Here, we compared 31 human de novo assemblies with the current reference genome to identify the NRS and their location. We resolved the precise location of 6113 NRS adding up to 12.8 Mb. Besides 1571 insertions, we detected 3041 alternate alleles, which were defined as having less than 90% (or none) identity with the reference alleles. These alternate alleles overlapped with 1143 protein-coding genes including a putative novel MHC haplotype. Further, we demonstrated that the alternate alleles and their flanking regions had high content of tandem repeats, indicating that their origin was associated with tandem repeats. Conclusions Our study detected a large number of NRS including many alternate alleles which are previously uncharacterized. We suggested that the origin of alternate alleles was associated with tandem repeats. Our results enriched the spectrum of genetic variations in human genome.

Download Full-text

Draft Genome of the Macadamia Husk Spot Pathogen, Pseudocercospora macadamiae

Phytopathology ◽

10.1094/phyto-12-19-0460-a ◽

2020 ◽

Vol 110 (9) ◽

pp. 1503-1506

Author(s):

Olufemi A. Akinsanmi ◽

Lilia C. Carvalhais

Keyword(s):

Plant Disease Resistance ◽

Plant Disease ◽

De Novo ◽

Draft Genome ◽

Gc Content ◽

Disease Development ◽

Closely Related Species ◽

Protein Coding ◽

Protein Coding Genes ◽

The Family

Pseudocercospora macadamiae causes husk spot in macadamia in Australia. Lack of genomic resources for this pathogen has restricted acquiring knowledge on the mechanism of disease development, spread, and its role in fruit abscission. To address this gap, we sequenced the genome of P. macadamiae. The sequence was de novo assembled into a draft genome of 40 Mb, which is comparable to closely related species in the family Mycosphaerellaceae. The draft genome comprises 212 scaffolds, of which 99 scaffolds are over 50 kb. The genome has a 49% GC content and is predicted to contain 15,430 protein-coding genes. This draft genome sequence is the first for P. macadamiae and represents a valuable resource for understanding genome evolution and plant disease resistance.

Download Full-text

Draft Genome Sequence of Rheinheimera sp. F8, a Biofilm-Forming Strain Which Produces Large Amounts of Extracellular DNA

Genome Announcements ◽

10.1128/genomea.00082-16 ◽

2016 ◽

Vol 4 (2) ◽

Cited By ~ 3

Author(s):

Anna-Kathrin Schuster ◽

Ulrich Szewzyk

Keyword(s):

Genome Sequence ◽

De Novo ◽

Draft Genome ◽

Extracellular Dna ◽

Draft Genome Sequence ◽

Protein Coding ◽

Coding Sequences

Rheinheimera sp. strain F8 is a biofilm-forming gammaproteobacterium that has been found to produce large amounts of filamentous extracellular DNA. Here, we announce the de novo assembly of its genome. It is estimated to be 4,464,511 bp in length, with 3,970 protein-coding sequences and 92 RNA-coding sequences.

Download Full-text

De Novo Whole-Genome Sequencing of the Wood Rot Fungus Polyporus brumalis, Which Exhibits Potential Terpenoid Metabolism

Genome Announcements ◽

10.1128/genomea.00586-17 ◽

2017 ◽

Vol 5 (28) ◽

Author(s):

Su-Yeon Lee ◽

Ji-eun An ◽

Sun-Hwa Ryu ◽

Myungkil Kim

Keyword(s):

Single Molecule ◽

De Novo ◽

Gene Annotation ◽

Draft Genome ◽

Fungal Growth ◽

Protein Coding ◽

Sequencing Platform ◽

Protein Coding Genes ◽

Polyporus Brumalis ◽

Terpenoid Metabolism

ABSTRACT Polyporus brumalis is able to synthesize several sesquiterpenes during fungal growth. Using a single-molecule real-time sequencing platform, we present the 53-Mb draft genome of P. brumalis, which contains 6,231 protein-coding genes. Gene annotation and isolation support genetic information, which can increase the understanding of sesquiterpene metabolism in P. brumalis.

Download Full-text

The human gene damage index as a gene-level approach to prioritizing exome variants

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1518646112 ◽

2015 ◽

Vol 112 (44) ◽

pp. 13615-13620 ◽

Cited By ~ 131

Author(s):

Yuval Itan ◽

Lei Shang ◽

Bertrand Boisson ◽

Etienne Patin ◽

Alexandre Bolze ◽

...

Keyword(s):

General Population ◽

De Novo ◽

Damage Index ◽

Monogenic Disease ◽

Sequence Length ◽

De Novo Mutations ◽

Protein Coding ◽

A Genome ◽

Gene Level ◽

Gene Damage

The protein-coding exome of a patient with a monogenic disease contains about 20,000 variants, only one or two of which are disease causing. We found that 58% of rare variants in the protein-coding exome of the general population are located in only 2% of the genes. Prompted by this observation, we aimed to develop a gene-level approach for predicting whether a given human protein-coding gene is likely to harbor disease-causing mutations. To this end, we derived the gene damage index (GDI): a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population. We found that the GDI was correlated with selective evolutionary pressure, protein complexity, coding sequence length, and the number of paralogs. We compared GDI with the leading gene-level approaches, genic intolerance, and de novo excess, and demonstrated that GDI performed best for the detection of false positives (i.e., removing exome variants in genes irrelevant to disease), whereas genic intolerance and de novo excess performed better for the detection of true positives (i.e., assessing de novo mutations in genes likely to be disease causing). The GDI server, data, and software are freely available to noncommercial users from lab.rockefeller.edu/casanova/GDI.

Download Full-text

Improved chromosome-level genome assembly and annotation of the seagrass, Zostera marina (eelgrass)

F1000Research ◽

10.12688/f1000research.38156.1 ◽

2021 ◽

Vol 10 ◽

pp. 289

Author(s):

Xiao Ma ◽

Jeanine L. Olsen ◽

Thorsten B.H. Reusch ◽

Gabriele Procaccini ◽

Dave Kudrna ◽

...

Keyword(s):

Genome Assembly ◽

Zostera Marina ◽

Draft Genome ◽

High Molecular Weight Dna ◽

Protein Coding ◽

New Findings ◽

Long Read ◽

Sanger Sequence ◽

Assembly Pipeline ◽

Chromosome Level

Background: Seagrasses (Alismatales) are the only fully marine angiosperms. Zostera marina (eelgrass) plays a crucial role in the functioning of coastal marine ecosystems and global carbon sequestration. It is the most widely studied seagrass and has become a marine model system for exploring adaptation under rapid climate change. The original draft genome (v.1.0) of the seagrass Z. marina (L.) was based on a combination of Illumina mate-pair libraries and fosmid-ends. A total of 25.55 Gb of Illumina and 0.14 Gb of Sanger sequence was obtained representing 47.7× genomic coverage. The assembly resulted in ~2000 unordered scaffolds (L50 of 486 Kb), a final genome assembly size of 203MB, 20,450 protein coding genes and 63% TE content. Here, we present an upgraded chromosome-scale genome assembly and compare v.1.0 and the new v.3.1, reconfirming previous results from Olsen et al. (2016), as well as pointing out new findings. Methods: The same high molecular weight DNA used in the original sequencing of the Finnish clone was used. A high-quality reference genome was assembled with the MECAT assembly pipeline combining PacBio long-read sequencing and Hi-C scaffolding. Results: In total, 75.97 Gb PacBio data was produced. The final assembly comprises six pseudo-chromosomes and 304 unanchored scaffolds with a total length of 260.5Mb and an N50 of 34.6 MB, showing high contiguity and few gaps (~0.5%). 21,483 protein-encoding genes are annotated in this assembly, of which 20,665 (96.2%) obtained at least one functional assignment based on similarity to known proteins. Conclusions: As an important marine angiosperm, the improved Z. marina genome assembly will further assist evolutionary, ecological, and comparative genomics at the chromosome level. The new genome assembly will further our understanding into the structural and physiological adaptations from land to marine life.

Download Full-text

Draft Genome Assembly and Annotation of Red Raspberry Rubus Idaeus

10.1101/546135 ◽

2019 ◽

Cited By ~ 4

Author(s):

Haley Wight ◽

Junhui Zhou ◽

Muzi Li ◽

Sridhar Hannenhalli ◽

Stephen M. Mount ◽

...

Keyword(s):

De Novo ◽

Draft Genome ◽

Rubus Idaeus ◽

Slow Process ◽

Red Raspberry ◽

Protein Coding ◽

Draft Genome Assembly ◽

Protein Coding Genes ◽

A Genome ◽

Exceptional Value

AbstractThe red raspberry, Rubus idaeus, is widely distributed in all temperate regions of Europe, Asia, and North America and is a major commercial fruit valued for its taste, high antioxidant and vitamin content. However, Rubus breeding is a long and slow process hampered by limited genomic and molecular resources. Genomic resources such as a complete genome sequencing and transcriptome will be of exceptional value to improve research and breeding of this high value crop. Using a hybrid sequence assembly approach including data from both long and short sequence reads, we present the first assembly of the Rubus idaeus genome (Joan J. variety). The de novo assembled genome consists of 2,145 scaffolds with a genome completeness of 95.3% and an N50 score of 638 KB. Leveraging a linkage map, we anchored 80.1% of the genome onto seven chromosomes. Using over 1 billion paired-end RNAseq reads, we annotated 35,566 protein coding genes with a transcriptome completeness score of 97.2%. The Rubus idaeus genome provides an important new resource for researchers and breeders.

Download Full-text