De novo phased assembly of the Vitis riparia grape genome

10.1101/640565 ◽

2019 ◽

Author(s):

Nabil Girollet ◽

Bernadette Rubio ◽

Pierre-François Bert

Keyword(s):

Genome Assembly ◽

De Novo ◽

Genomic Analysis ◽

Comparative Genomic ◽

Protein Coding ◽

Important Species ◽

Vitis Riparia ◽

Fruit Species ◽

Long Reads ◽

A Genome

Abstract: Grapevine is one of the most important fruit species in the world. To better understand the genetic basis of trait variation and to facilitate the breeding of new genotypes, we sequenced, assembled, and annotated the genome of the American native Vitis riparia, one of the main species used worldwide for rootstock and scion breeding. A total of 164 Gb of raw DNA reads was obtained from Vitis riparia, resulting in a 225X depth of coverage. We generated a de novo genome assembly of the V. riparia grape using PacBio long reads, which was phased with 10x Genomics Chromium linked reads. At the chromosome level, a 500 Mb genome was generated with a scaffold N50 size of 1 Mb. More than 34% of the whole genome was identified as repeat sequences, and 37,207 protein-coding genes were predicted. This genome assembly sets the stage for comparative genomic analysis of the diversification and adaptation of grapevine and will provide a solid resource for further genetic analysis and breeding of this economically important species.
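
The scaffold N50 reported in this abstract is a length-weighted median: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. The sketch below illustrates that standard definition only; the function name `n50` and the toy scaffold lengths are assumptions made for illustration and are not data from the V. riparia assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together account for at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings us to half the total.
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not real data.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Because the statistic depends only on the sorted lengths, the same function applies to contig N50 values when given contig lengths instead of scaffold lengths.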

Genome assembly of the JD17 soybean provides a new reference genome for Comparative genomics

10.1101/2021.11.23.469778 ◽

2021 ◽

Author(s):

Xinxin Yi ◽

Jing Liu ◽

Shengcai Chen ◽

Hao Wu ◽

Min Liu ◽

...

Keyword(s):

Nitrogen Fixation ◽

Genome Assembly ◽

Reference Genome ◽

De Novo ◽

Genomic Analysis ◽

Comparative Genomic ◽

High Quality ◽

Genome Wide ◽

A Genome ◽

Cultivated Soybean

Cultivated soybean (Glycine max) is an important source of protein and oil. Many elite cultivars with different traits have been developed for different conditions. Each soybean strain has its own genetic diversity, and the availability of more high-quality soybean genomes can enhance comparative genomic analysis for identifying the genetic underpinnings of its unique traits. In this study, we constructed a high-quality de novo assembly of the elite soybean cultivar Jidou 17 (JD17) with chromosome contiguity and high accuracy. We annotated 52,840 gene models and reconstructed 74,054 high-quality full-length transcripts. We performed a genome-wide comparative analysis of the JD17 reference genome against three published soybean genomes (WM82, ZH13 and W05), which identified five large inversions and two large translocations specific to JD17, 20,984 - 46,912 PAVs spanning 13.1 - 46.9 Mb in size, and 5 - 53 large PAV clusters larger than 500 kb. Between JD17 and these three genomes, 1,695,741 - 3,664,629 SNPs and 446,689 - 800,489 indels were identified and annotated. Symbiotic nitrogen fixation (SNF) genes were identified and the effects of these variants were further evaluated. The coding sequences of 9 nitrogen fixation-related genes were found to be greatly affected. The high-quality genome assembly of JD17 can serve as a valuable reference for soybean functional genomics research.

Nanopore long reads enable the first complete genome assembly of a Malaysian Vibrio parahaemolyticus isolate bearing the pVa plasmid associated with acute hepatopancreatic necrosis disease

10.1101/861476 ◽

2019 ◽

Author(s):

Han Ming Gan ◽

Christopher M. Austin

Keyword(s):

Genome Assembly ◽

Vibrio Parahaemolyticus ◽

De Novo ◽

Genomic Analysis ◽

Binary Toxin ◽

Comparative Genomic ◽

Chromosome 2 ◽

Toxin Genes ◽

Long Reads ◽

Long Read

Background: Vibrio parahaemolyticus MVP1 was isolated from a Malaysian aquaculture farm affected by shrimp acute hepatopancreatic necrosis disease (AHPND). Its genome was previously sequenced on the Illumina MiSeq platform and assembled de novo, producing a relatively fragmented assembly. Although the binary toxin genes linked to AHPND were identified in the MVP1 draft genome, they were localized on a very small contig, precluding proper analysis of the gene neighbourhood. Methods: The genome of Vibrio parahaemolyticus MVP1 was sequenced on the Nanopore MinION device to obtain long reads that can span longer repeats and improve genome contiguity. De novo genome assembly was subsequently performed using a long-read only assembler (Flye) followed by genome polishing, as well as a hybrid assembler (Unicycler). Results: Long-read only assembly produced three complete circular MVP1 contigs consisting of chromosome 1, chromosome 2 and the pVa plasmid that encodes the pirABvp binary toxin genes. Polishing of the long-read assembly with Illumina short reads was necessary to remove indel errors. The complete assembly of the pVa plasmid could not be achieved using Illumina reads because identical repetitive elements flanking the binary toxin genes led to multiple contigs, whereas these regions were fully spanned by the Nanopore long reads, resulting in a single contig. In addition, alignment of Illumina reads to the complete genome assembly indicated a sequencing bias, as read depth was lowest in low-GC genomic regions. Comparative genomic analysis revealed the presence of a gene cluster coding for additional insecticidal toxins in chromosome 2 of MVP1 that may further contribute to host pathogenesis pending functional validation. Scanning of all publicly available V. parahaemolyticus genomes revealed the presence of a single AinS-family quorum-sensing system in this species that can be targeted for future microbial management. Conclusions: We generated the first chromosome-scale genome assembly of a Malaysian pirABVp-bearing V. parahaemolyticus isolate. Structural variations identified from comparative genomic analysis provide new insights into the genomic features of V. parahaemolyticus MVP1 that may be associated with host colonization and pathogenicity.

Nanopore long reads enable the first complete genome assembly of a Malaysian Vibrio parahaemolyticus isolate bearing the pVa plasmid associated with acute hepatopancreatic necrosis disease

F1000Research ◽

10.12688/f1000research.21570.1 ◽

2019 ◽

Vol 8 ◽

pp. 2108 ◽

Cited By ~ 1

Author(s):

Han Ming Gan ◽

Christopher M Austin

Keyword(s):

Genome Assembly ◽

De Novo ◽

Genomic Analysis ◽

Binary Toxin ◽

Comparative Genomic ◽

Chromosome 2 ◽

Toxin Genes ◽

Long Reads ◽

Long Read ◽

Acute Hepatopancreatic Necrosis Disease

Background: The genome of Vibrio parahaemolyticus MVP1, isolated from a Malaysian aquaculture farm with shrimp acute hepatopancreatic necrosis disease (AHPND), was previously sequenced using Illumina MiSeq and assembled de novo, producing a relatively fragmented assembly. Despite identifying the binary toxin genes in the MVP1 draft genome that were linked to AHPND, the toxin genes were localized on a very small contig precluding proper analysis of gene neighbourhood. Methods: The genome of MVP1 was sequenced on Nanopore MinION to obtain long reads to improve genome contiguity. De novo genome assembly was performed using long-read only assembler followed by genome polishing and hybrid assembler. Results: Long-read assembly produced three complete circular MVP1 contigs: chromosome 1, chromosome 2 and the pVa plasmid encoding pirABvp binary toxin genes. Polishing of the long-read assembly with Illumina short reads was necessary to remove indel errors. Complete assembly of the pVa plasmid could not be achieved using Illumina reads due to identical repetitive elements flanking the binary toxin genes leading to multiple contigs. These regions were fully spanned by the Nanopore long-reads resulting in a single contig. Alignment of Illumina reads to the complete genome assembly indicated there is sequencing bias as read depth was lowest in low-GC genomic regions. Comparative genomic analysis revealed a gene cluster coding for additional insecticidal toxins in chromosome 2 of MVP1 that may further contribute to host pathogenesis pending functional validation. Scanning of publicly available V. parahaemolyticus genomes revealed the presence of a single AinS-family quorum-sensing system that can be targeted for future microbial management. Conclusions: We generated the first chromosome-scale genome assembly of a Malaysian pirABVp-bearing V. parahaemolyticus isolate. Structural variations identified from comparative genomic analysis provide new insights into the genomic features of V. parahaemolyticus MVP1 that may be associated with host colonization and pathogenicity.

De novo Genome Assembly of the indica Rice Variety IR64 Using Linked-Read Sequencing and Nanopore Sequencing

G3 Genes|Genome|Genetics ◽

10.1534/g3.119.400871 ◽

2020 ◽

Vol 10 (5) ◽

pp. 1495-1501 ◽

Cited By ~ 1

Author(s):

Tsuyoshi Tanaka ◽

Ryo Nishijima ◽

Shota Teramoto ◽

Yuka Kitomi ◽

Takeshi Hayashi ◽

...

Keyword(s):

Functional Genomics ◽

Genome Assembly ◽

De Novo ◽

Rice Variety ◽

Rice Genome ◽

High Yield ◽

Nanopore Sequencing ◽

Long Reads ◽

A Genome ◽

Modern Varieties

IR64 is a high-yielding rice variety that has been widely cultivated around the world. IR64 has been replaced by modern varieties in most growing areas. Given that modern varieties are mostly progenies or relatives of IR64, genetic analysis of IR64 is valuable for rice functional genomics. However, chromosome-level genome sequences of IR64 have not previously been available. Here, we sequenced the IR64 genome using synthetic long reads obtained by linked-read sequencing and ultra-long reads obtained by nanopore sequencing. We integrated these data and generated a de novo assembly of the IR64 genome of 367 Mb, equivalent to 99% of the estimated size. Continuity of the IR64 genome assembly was improved compared with that of a publicly available IR64 genome assembly generated from short reads only. We annotated 41,458 protein-coding genes, including 657 IR64-specific genes that are missing in other high-quality rice genome assemblies, IRGSP-1.0 of japonica cultivar Nipponbare or R498 of indica cultivar Shuhui498. The IR64 genome assembly will serve as a genome resource for rice functional genomics as well as genomics-driven and/or molecular breeding.

ELFN1-AS1: A Novel Primate Gene with Possible MicroRNA Function Expressed Predominantly in Human Tumors

BioMed Research International ◽

10.1155/2014/398097 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 6

Author(s):

Dmitrii E. Polev ◽

Iuliia K. Karnaukhova ◽

Larisa L. Krukovskaya ◽

Andrei P. Kozlov

Keyword(s):

De Novo ◽

Human Gene ◽

Homo Sapiens ◽

Genomic Analysis ◽

Structure Characteristic ◽

Comparative Genomic ◽

Protein Coding ◽

Regulatory Motifs ◽

Normal Tissues ◽

Microrna Function

Human gene LOC100505644 uncharacterized LOC100505644 [Homo sapiens] (Entrez Gene ID 100505644) is abundantly expressed in tumors but weakly expressed in few normal tissues. Until now, the function of this gene has remained unknown. Here we identified the chromosomal borders of the transcribed region and the major splice form of the LOC100505644-specific transcript. We characterised the major regulatory motifs of the gene and its splice sites. Analysis of the secondary structure of the major transcript variant revealed a hairpin-like structure characteristic of precursor microRNAs. Comparative genomic analysis of the locus showed that it originated in primates de novo. Taken together, our data indicate that human gene LOC100505644 encodes a non-protein-coding RNA, likely a microRNA. It was assigned the gene symbol ELFN1-AS1 (ELFN1 antisense RNA 1 (non-protein coding)). This gene combines features of evolutionary novelty and predominant expression in tumors.

A chromosome-level genome assembly of the Chinese tupelo Nyssa sinensis

Scientific Data ◽

10.1038/s41597-019-0296-y ◽

2019 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Xuchen Yang ◽

Minghui Kang ◽

Yanting Yang ◽

Haifeng Xiong ◽

Mingcheng Wang ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Chromosome Conformation ◽

Protein Coding ◽

Single Molecule Sequencing ◽

Data Matching ◽

Long Reads ◽

Autumn Leaf ◽

Chromosome Level

Abstract: The deciduous Chinese tupelo (Nyssa sinensis Oliv.) is a popular ornamental tree known for its spectacular autumn leaf color. Here, using single-molecule sequencing and chromosome conformation capture data, we report a high-quality, chromosome-level genome assembly of N. sinensis. PacBio long reads were de novo assembled into 647 polished contigs with a total length of 1,001.42 megabases (Mb) and an N50 size of 3.62 Mb, which is in line with genome sizes estimated using flow cytometry and k-mer analysis. These contigs were further clustered and ordered into 22 pseudo-chromosomes based on Hi-C data, matching the chromosome counts in Nyssa obtained from previous cytological studies. In addition, a total of 664.91 Mb of repetitive elements were identified and a total of 37,884 protein-coding genes were predicted in the genome of N. sinensis. All data were deposited in publicly available repositories and should be a valuable resource for genomics, evolution, and conservation biology.
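
The abstract gives the repeat content as an absolute length (664.91 Mb) rather than as a share of the 1,001.42 Mb assembly. As a rough illustration of the arithmetic, the sketch below converts the two figures quoted above into a percentage; the helper name `fraction_percent` is an assumption introduced here, and the result is a back-of-the-envelope figure rather than a value stated by the authors.

```python
def fraction_percent(part_mb, whole_mb):
    """Return part/whole as a percentage; both arguments in megabases."""
    if whole_mb <= 0:
        raise ValueError("whole_mb must be positive")
    return 100.0 * part_mb / whole_mb


# Figures quoted in the abstract above.
repeat_mb = 664.91      # total length of repetitive elements
assembly_mb = 1001.42   # total length of the polished contigs

repeat_share = fraction_percent(repeat_mb, assembly_mb)
print(f"Repeats cover about {repeat_share:.1f}% of the assembly")
```

On these inputs the repeats come to roughly two thirds of the assembled sequence.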

The complete genome sequence of the Staphylococcus bacteriophage Metroid

10.1101/2020.05.01.072256 ◽

2020 ◽

Author(s):

Adele Crane ◽

Joy Abaidoo ◽

Gabriella Beltran ◽

Danielle Fry ◽

Colleen Furey ◽

...

Keyword(s):

Complete Genome ◽

Bacterial Infections ◽

Genomic Analysis ◽

Host Immunity ◽

Comparative Genomic ◽

Rapid Adaptation ◽

Protein Coding ◽

Ongoing Effort ◽

A Genome ◽

Lysis Cassette

Abstract: Phages infecting bacteria of the genus Staphylococcus play an important role in their host’s ecology and evolution. On one hand, horizontal gene transfer from phage can encourage the rapid adaptation of pathogenic Staphylococcus, enabling them to escape host immunity or access novel environments. On the other hand, lytic phages are promising agents for the treatment of bacterial infections, especially those resistant to antibiotics. As part of an ongoing effort to gain novel insights into bacteriophage diversity, we characterized the complete genome of the Staphylococcus bacteriophage Metroid, a cluster C phage with a genome size of 151 kb, encompassing 254 predicted protein-coding genes as well as 4 tRNAs. A comparative genomic analysis highlights strong similarities – including a conservation of the lysis cassette – with other Staphylococcus cluster C1 bacteriophages, several of which were previously characterized for therapeutic applications.

A genome assembly-integrated dog 1 Mb BAC microarray: a cytogenetic resource for canine cancer studies and comparative genomic analysis

Cytogenetic and Genome Research ◽

10.1159/000163088 ◽

2008 ◽

Vol 122 (2) ◽

pp. 110-121 ◽

Cited By ~ 24

Author(s):

R. Thomas ◽

S.E. Duke ◽

E.K. Karlsson ◽

A. Evans ◽

P. Ellis ◽

...

Keyword(s):

Genome Assembly ◽

Genomic Analysis ◽

Comparative Genomic Analysis ◽

Comparative Genomic ◽

A Genome ◽

Canine Cancer ◽

Cancer Studies

The Complete Genome Sequence of the Staphylococcus Bacteriophage Metroid

G3 Genes|Genome|Genetics ◽

10.1534/g3.120.401365 ◽

2020 ◽

Vol 10 (9) ◽

pp. 2975-2979

Author(s):

Adele Crane ◽

Joy Abaidoo ◽

Gabriella Beltran ◽

Danielle Fry ◽

Colleen Furey ◽

...

Keyword(s):

Complete Genome ◽

Bacterial Infections ◽

Genomic Analysis ◽

Host Immunity ◽

Comparative Genomic ◽

Rapid Adaptation ◽

Protein Coding ◽

Ongoing Effort ◽

A Genome ◽

Lysis Cassette

Abstract: Phages infecting bacteria of the genus Staphylococcus play an important role in their host’s ecology and evolution. On one hand, horizontal gene transfer from phage can encourage the rapid adaptation of pathogenic Staphylococcus, enabling them to escape host immunity or access novel environments. On the other hand, lytic phages are promising agents for the treatment of bacterial infections, especially those resistant to antibiotics. As part of an ongoing effort to gain novel insights into bacteriophage diversity, we characterized the complete genome of the Staphylococcus bacteriophage Metroid, a cluster C phage with a genome size of 151 kb, encompassing 254 predicted protein-coding genes as well as 4 tRNAs. A comparative genomic analysis highlights strong similarities – including a conservation of the lysis cassette – with other Staphylococcus cluster C bacteriophages, several of which were previously characterized for therapeutic applications.

High-quality de novo genome assembly of Kappaphycus alvarezii based on both PacBio and HiSeq sequencing

10.1101/2020.02.15.950402 ◽

2020 ◽

Author(s):

Shangang Jia ◽

Guoliang Wang ◽

Guiming Liu ◽

Jiangyong Qu ◽

Beilun Zhao ◽

...

Keyword(s):

Single Molecule ◽

Genome Assembly ◽

De Novo ◽

Kappaphycus Alvarezii ◽

Draft Genome ◽

Production Traits ◽

Illumina Hiseq ◽

De Novo Genome Assembly ◽

Protein Coding ◽

Long Reads

Abstract: The red alga Kappaphycus alvarezii is the most important aquaculture species in Kappaphycus. It is widely distributed in tropical waters and has become the main crop for carrageenan production. Its mechanisms of adaptation to high-temperature, high-salinity environments and its carbohydrate metabolism may provide important inspiration for the study of marine algae. Scientific background knowledge such as genomic data will also be essential for improving the disease resistance and production traits of K. alvarezii. In total, 43.28 Gb of short paired-end reads and 18.52 Gb of single-molecule long reads of K. alvarezii were generated on the Illumina HiSeq and PacBio RSII platforms, respectively. The de novo genome assembly was performed using Falcon_unzip and Canu software, and then improved with Pilon. The final assembled genome (336 Mb) consists of 888 scaffolds with a contig N50 of 849 Kb. Further annotation analyses predicted 21,422 protein-coding genes, with 61.28% functionally annotated. Here we report the draft genome and annotations of K. alvarezii, which are valuable resources for future genomic and genetic studies in Kappaphycus and other algae.
