Draft sequencing and assembly of the genome of the world’s largest fish, the whale shark: Rhincodon typus Smith 1828

10.7287/peerj.preprints.837v1 ◽

2015 ◽

Cited By ~ 1

Author(s):

Timothy D Read ◽

Robert A Petit III ◽

Sandeep J Joseph ◽

Md T Alam ◽

Ryan Weil ◽

...

Keyword(s):

De Novo ◽

Nuclear Genome ◽

Receptor Protein ◽

Shark Species ◽

Whale Shark ◽

Conservation Units ◽

Data Set ◽

Rhincodon Typus ◽

A Genome ◽

Extant Species

The whale shark (Rhincodon typus) has by far the largest body size of any elasmobranch (shark or ray) species and is therefore also the largest extant species of the paraphyletic assemblage commonly referred to as “fishes”. As both a phenotypic extreme and a member of the group basal to the remaining gnathostomes, which includes all tetrapods and therefore also humans, its genome is of substantial comparative interest. Whale sharks are also listed as a “vulnerable” species on the International Union for Conservation of Nature (IUCN) Red List of threatened species and are of growing popularity both as a target of ecotourism and as a charismatic conservation ambassador for the pelagic ecosystem. A genome map for this species would aid in defining effective conservation units and understanding global population structure. We characterised the nuclear genome of the whale shark using next-generation sequencing (454, Illumina) and de novo assembly and annotation methods, based on material collected from the Georgia Aquarium. The data set consisted of 878,654,233 reads, which assembled into 11,347,816 contigs and 3,606,038 scaffolds. The estimated genome size was 3.44 Gb. As expected, the proteome of the whale shark was most closely related to that of the only other cartilaginous fish with a complete genome, the holocephalan elephant shark. The whale shark genome contained a novel Toll-like receptor protein with sequence conservation to both the TLR4 and TLR13 proteins of mammals. The data are publicly available on a Galaxy bioinformatic server (http://whaleshark.georgiaaquarium.org). This represents the first shotgun elasmobranch genome and will aid studies of molecular systematics, biogeography, genetic differentiation, and conservation genetics in this and other shark species, as well as providing comparative data for studies of evolutionary biology and immunology across the jawed vertebrate lineages.


IterCluster: a barcode clustering algorithm for long fragment read analysis

PeerJ ◽

10.7717/peerj.8431 ◽

2020 ◽

Vol 8 ◽

pp. e8431

Author(s):

Jiancong Weng ◽

Tian Chen ◽

Yinlong Xie ◽

Xun Xu ◽

Gengyun Zhang ◽

...

Keyword(s):

De Novo Assembly ◽

Clustering Algorithm ◽

De Novo ◽

Recall Rate ◽

Divide And Conquer ◽

Data Set ◽

Target Region ◽

Additional Information ◽

A Genome ◽

Long Fragment Read

Recent advances in long fragment read (LFR) technologies, also known as linked-read or read-cloud technologies, such as single tube long fragment reads (stLFR), 10X Genomics Chromium reads, and TruSeq synthetic long-reads, have enabled efficient haplotyping and genome assembly. However, in the case of stLFR and 10X Genomics Chromium reads, the long fragments of a genome are covered sparsely by reads in each barcode and most barcodes are contained in multiple long fragments from different regions, which results in inefficient assembly when using long-range information. Thus, methods to address these shortcomings are vital for capitalizing on the additional information obtained using these technologies. We therefore designed IterCluster, a novel, alignment-free clustering algorithm that can cluster barcodes from the same target region of a genome, using k-mer frequency-based features and a Markov Cluster (MCL) approach to identify enough reads in a target region of a genome to ensure sufficient target genome sequence depth. The IterCluster method was validated using BGI stLFR and 10X Genomics Chromium read datasets. IterCluster had a higher precision and recall rate on BGI stLFR data than on 10X Genomics Chromium read data. In addition, we demonstrated how IterCluster improves the de novo assembly results when using a divide-and-conquer strategy on a human genome data set (scaffold/contig N50 = 13.2 kbp/7.1 kbp vs. 17.1 kbp/11.9 kbp before and after IterCluster, respectively). IterCluster provides a new way of determining LFR barcode enrichment and a novel approach for de novo assembly using LFR data. IterCluster is open source and available at https://github.com/JianCong-WENG/IterCluster.
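
To illustrate what an alignment-free, k-mer frequency-based comparison of barcodes can look like, the following is a minimal sketch and not IterCluster's actual implementation. The function names, the choice of k, and the use of cosine similarity are assumptions made for illustration; the Markov Cluster step described in the abstract is not implemented here.

```python
from collections import Counter
from itertools import product


def kmer_frequency_vector(reads, k=2):
    """Return normalized k-mer frequencies pooled over all reads of one barcode.

    Hypothetical helper: the feature definition used by IterCluster may differ.
    """
    alphabet = "ACGT"
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing N or any other non-ACGT symbol.
            if set(kmer) <= set(alphabet):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In a sketch like this, pairwise similarities between barcodes would form the weighted graph that a clustering method such as MCL takes as input.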


The complete mitochondrial genome sequence of the world's largest fish, the whale shark (Rhincodon typus), and its comparison with those of related shark species

Gene ◽

10.1016/j.gene.2014.01.064 ◽

2014 ◽

Vol 539 (1) ◽

pp. 44-49 ◽

Cited By ~ 16

Author(s):

Md Tauqeer Alam ◽

Robert A. Petit ◽

Timothy D. Read ◽

Alistair D.M. Dove

Keyword(s):

Mitochondrial Genome ◽

Genome Sequence ◽

Complete Mitochondrial Genome ◽

Shark Species ◽

Whale Shark ◽

Mitochondrial Genome Sequence ◽

Rhincodon Typus ◽

Complete Mitochondrial Genome Sequence


An Unbiased Predictive Model to Detect DNA Methylation Propensity of CpG Islands in the Human Genome

Current Bioinformatics ◽

10.2174/1574893615999200724145835 ◽

2020 ◽

Vol 15 ◽

Author(s):

Dicle Yalcin ◽

Hasan H. Otu

Keyword(s):

Model Building ◽

De Novo ◽

Cpg Islands ◽

Treatment Strategies ◽

Area Under The Curve ◽

Global Methylation ◽

Sequence Features ◽

A Genome ◽

Combined Features ◽

Epigenetic Repression

Background: Epigenetic repression mechanisms play an important role in gene regulation, specifically in cancer development. In many cases, a CpG island’s (CGI) susceptibility or resistance to methylation has been shown to be influenced by local DNA sequence features. Objective: To develop unbiased machine learning models, individually and combined for different biological features, that predict the methylation propensity of a CGI. Methods: We built our model from CGI sequence features using a dataset of 75 sequences (28 prone, 47 resistant) representing a genome-wide methylation structure. We tested our model on two independent datasets, one chromosome specific (132 sequences) and one disease specific (70 sequences). Results: We improved prediction accuracy over previous models. Our results indicate that combined features better predict the methylation propensity of a CGI (area under the curve (AUC) ~0.81). Our global methylation classifier performs well on independent datasets, reaching an AUC of ~0.82 for the complete model and an AUC of ~0.88 for the model using select sequences that better represent their classes in the training set. We report certain de novo motifs and transcription factor binding site (TFBS) motifs that are consistently better at separating prone and resistant CGIs. Conclusion: Predictive models for the methylation propensity of CGIs lead to a better understanding of disease mechanisms and can be used to classify genes based on their tendency to contain methylation-prone CGIs, which may lead to preventative treatment strategies. MATLAB and Python™ scripts used for model building, prediction, and downstream analyses are available at https://github.com/dicleyalcin/methylProp_predictor.
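
The AUC values quoted above can be read as the probability that a randomly chosen methylation-prone CGI receives a higher score than a randomly chosen resistant one. The sketch below computes AUC in that pairwise form; it is a generic illustration, not the authors' code, and the function name and the label convention (1 = prone, 0 = resistant) are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    labels: iterable of 0/1 class labels (assumed here: 1 = prone, 0 = resistant)
    scores: iterable of classifier scores, higher meaning "more likely prone"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The quadratic pairwise loop is fine for data sets of the size mentioned in the abstract (tens to hundreds of sequences); a rank-based formulation would be preferable for much larger inputs.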


Ever-increasing viral diversity associated with the red imported fire ant Solenopsis invicta (Formicidae: Hymenoptera)

Virology Journal ◽

10.1186/s12985-020-01469-w ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

César Augusto Diniz Xavier ◽

Margaret Louise Allen ◽

Anna Elizabeth Whitfield

Keyword(s):

Solenopsis Invicta ◽

De Novo ◽

Rna Virus ◽

Housekeeping Genes ◽

Virus Genome ◽

Single Strand ◽

Viral Diversity ◽

Red Imported Fire Ant ◽

Sequence Comparisons ◽

Data Set

Background: Advances in sequencing and analysis tools have facilitated the discovery of many new viruses from invertebrates, including ants. Solenopsis invicta is an invasive ant that has quickly spread worldwide, causing significant ecological and economic impacts. Its virome has begun to be characterized with a view to the potential use of viruses as natural enemies. Although the S. invicta virome is the best characterized among ants, most studies have been performed in its native range, with less information from invaded areas. Methods: Using a metatranscriptome approach, we further identified and molecularly characterized virus sequences associated with S. invicta in two introduced areas, the U.S. and Taiwan. The data set used here was obtained from different stages (larvae, pupae, and adults) of the S. invicta life cycle. Publicly available RNA sequences from GenBank’s Sequence Read Archive were downloaded and de novo assembled using CLC Genomics Workbench 20.0.1. Contigs were compared against the non-redundant protein sequences, and those showing similarity to viral sequences were further analyzed. Results: We characterized five putative new viruses associated with S. invicta transcriptomes. Sequence comparisons revealed extensive divergence across ORFs and genomic regions, with most of them sharing less than 40% amino acid identity with the closest homologous sequences previously characterized. The first negative-sense single-stranded RNA virus genomic sequences included in the orders Bunyavirales and Mononegavirales are reported. In addition, two positive-sense single-strand virus genome sequences and one single-strand DNA virus genome sequence were also identified. While the presence of a putative tenuivirus associated with S. invicta was previously suggested to be a contamination, here we characterize Solenopsis invicta virus 14 (SINV-14) and present strong evidence that it is a tenui-like virus that has a long-term association with the ant. Furthermore, based on virus sequence abundance compared to housekeeping genes, phylogenetic relationships, and completeness of viral coding sequences, our results suggest that four of the five virus sequences reported (SINV-14, SINV-15, SINV-16 and SINV-17) may be associated with viruses actively replicating in the ant S. invicta. Conclusions: The present study expands our knowledge of the viral diversity associated with S. invicta in introduced areas. These viruses have potential to be used as biological control agents, which will require further biological characterization.


Genomic Characterization Provides an Insight into the Pathogenicity of the Poplar Canker Bacterium Lonsdalea populi

Genes ◽

10.3390/genes12020246 ◽

2021 ◽

Vol 12 (2) ◽

pp. 246

Author(s):

Xiaomeng Chen ◽

Rui Li ◽

Yonglin Wang ◽

Aining Li

Keyword(s):

Genome Sequence ◽

Extracellular Enzymes ◽

De Novo ◽

Whole Genome Sequence ◽

Hybrid Poplars ◽

A Genome ◽

Conserved Genes ◽

Genomic Characterization ◽

Molecular Bases ◽

Insight Into

An emerging poplar canker caused by the gram-negative bacterium Lonsdalea populi has led to high mortality of hybrid poplars Populus × euramericana in China and Europe. The molecular bases of pathogenicity and bark adaptation of L. populi have become a focus of recent research. This study determined the whole genome sequence and identified putative virulence factors of L. populi. A high-quality L. populi genome sequence was assembled de novo, with a genome size of 3,859,707 bp, containing approximately 3434 genes and 107 RNAs (75 tRNA, 22 rRNA, and 10 ncRNA). The L. populi genome contained 380 virulence-associated genes, mainly encoding adhesion, extracellular enzymes, secretory systems, and two-component transduction systems. The genome had 110 carbohydrate-active enzyme (CAZy)-coding genes and putative secreted proteins. The antibiotic-resistance database annotation indicated that L. populi was resistant to penicillin, fluoroquinolone, and kasugamycin. Comparative genomic analysis found that L. populi exhibited the highest homology with the L. britannica genome and that L. populi encompassed 1905 specific genes, 1769 dispensable genes, and 1381 conserved genes, suggesting high evolutionary diversity and genomic plasticity. Moreover, the pan genome analysis revealed that the N-5-1 genome is an open genome. These findings provide important resources for understanding the molecular basis of the pathogenicity and biology of L. populi and the poplar-bacterium interaction.
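
The split into specific, dispensable, and conserved genes follows the usual pan-genome logic of counting in how many genomes each gene family occurs. The sketch below shows that partition on toy input; it assumes gene families have already been assigned shared identifiers across genomes, and the function name and data layout are hypothetical rather than taken from the study's pipeline.

```python
from collections import Counter


def partition_pan_genome(gene_sets):
    """Split gene families into conserved (core), dispensable, and specific sets.

    gene_sets: dict mapping a genome name to the set of gene-family identifiers
    found in that genome (orthology assignment is assumed to be done already).
    """
    n_genomes = len(gene_sets)
    if n_genomes < 2:
        raise ValueError("need at least two genomes to partition a pan-genome")
    presence = Counter()
    for genes in gene_sets.values():
        presence.update(set(genes))  # count each family once per genome
    conserved = {g for g, c in presence.items() if c == n_genomes}
    specific = {g for g, c in presence.items() if c == 1}
    dispensable = set(presence) - conserved - specific
    return conserved, dispensable, specific
```

With three toy genomes `{"s1": {"a", "b", "c"}, "s2": {"a", "b", "d"}, "s3": {"a", "e"}}`, family `a` is conserved, `b` is dispensable, and `c`, `d`, `e` are specific.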


riboCIRC: a comprehensive database of translatable circRNAs

Genome Biology ◽

10.1186/s13059-021-02300-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Huihui Li ◽

Mingzhe Xie ◽

Yan Wang ◽

Ludong Yang ◽

Zhi Xie ◽

...

Keyword(s):

De Novo ◽

Species Conservation ◽

Structure And Function ◽

Research Community ◽

Genome Browser ◽

Valuable Resource ◽

Sequence Structure ◽

A Genome ◽

Context Specific ◽

And Function

riboCIRC is a translatome data-oriented circRNA database specifically designed for hosting, exploring, analyzing, and visualizing translatable circRNAs from multiple species. The database provides a comprehensive repository of computationally predicted ribosome-associated circRNAs; a manually curated collection of experimentally verified translated circRNAs; an evaluation of cross-species conservation of translatable circRNAs; a systematic de novo annotation of putative circRNA-encoded peptides, including sequence, structure, and function; and a genome browser to visualize the context-specific occupant footprints of circRNAs. It represents a valuable resource for the circRNA research community and is publicly available at http://www.ribocirc.com.


Eyes of Africa: The Genetics of Blindness: Study Design and Methodology

BMC Ophthalmology ◽

10.1186/s12886-021-02029-8 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Olusola Olawoye ◽

Chimdi Chuka-Okosa ◽

Onoja Akpa ◽

Tony Realini ◽

Michael Hauser ◽

...

Keyword(s):

Genome Wide Association Study ◽

Case Control Study ◽

African Ancestry ◽

Collaborative Study ◽

Data Set ◽

Human Heredity ◽

Genome Wide ◽

A Genome ◽

Multiplex Families ◽

Control Study

Background: This report describes the design and methodology of “Eyes of Africa: The Genetics of Blindness,” a collaborative study funded through the Human Heredity and Health in Africa (H3Africa) program of the National Institutes of Health. Methods: This is a case-control study that is collecting a large, well-phenotyped data set from glaucoma patients and controls for a genome-wide association study (GWAS). Multiplex families segregating Mendelian forms of early-onset glaucoma will also be collected for exome sequencing. Discussion: A total of 4500 cases/controls had been recruited into the study by the end of its 3rd funded year. All these participants have been appropriately phenotyped, and blood samples have been received from them. A recent GWAS of POAG in African individuals demonstrated a genome-wide significant association with the APBB2 locus, an association that is unique to individuals of African ancestry. This study will add to the existing knowledge and understanding of POAG in the African population.


Global production networks and the evolution of industrial capabilities: does production sharing warp the product space?

Oxford Economic Papers ◽

10.1093/oep/gpaa007 ◽

2020 ◽

Vol 72 (3) ◽

pp. 731-747

Author(s):

Russell Thomson ◽

Prema-Chandra Athukorala

Keyword(s):

Product Space ◽

Industrial Structure ◽

De Novo ◽

Trade Openness ◽

Production Networks ◽

Global Production Networks ◽

Industrial Upgrading ◽

Data Set ◽

Production Sharing ◽

Space Approach

Do production capabilities of countries evolve from existing capabilities or emerge de novo? The Product Space approach developed by Hidalgo, Klinger, Barabási and Hausmann postulates that a country’s existing industrial structure largely determines its opportunities for industrial upgrading. However, this is difficult to reconcile with the export dynamism of many developing countries such as Thailand, Malaysia, Costa Rica and Vietnam, which have moved from primary commodity dependence to exporting dynamic manufactured products. In each of these cases, global production sharing facilitated industrial transition. In this article, we advance the Product Space approach to accommodate the role of global production sharing. Using a newly constructed multi-country data set of manufacturing exports that distinguishes between trade within global production networks and traditional horizontal trade, we find that existing industrial structure has a smaller impact, but trade openness has a greater impact, on industrial upgrading within vertically integrated global industries.


De Novo Genome Assembly of Limpet Bathyacmaea lactea (Gastropoda: Pectinodontidae): The First Reference Genome of a Deep-Sea Gastropod Endemic to Cold Seeps

Genome Biology and Evolution ◽

10.1093/gbe/evaa100 ◽

2020 ◽

Vol 12 (6) ◽

pp. 905-910 ◽

Cited By ~ 2

Author(s):

Ruoyu Liu ◽

Kun Wang ◽

Jun Liu ◽

Wenjie Xu ◽

Yang Zhou ◽

...

Keyword(s):

Deep Sea ◽

Metal Ion ◽

De Novo ◽

Demographic History ◽

Gene Families ◽

Phylogenetic Position ◽

Cold Seeps ◽

Nitrogen And Phosphorus ◽

De Novo Genome Assembly ◽

A Genome

Cold seeps, characterized by methane, hydrogen sulfide, and other hydrocarbon chemicals, foster one of the most widespread chemosynthetic ecosystems in the deep sea, densely populated by specialized benthos. However, scarce genomic resources severely limit our knowledge about the origin and adaptation of life in this unique ecosystem. Here, we present a genome of the deep-sea limpet Bathyacmaea lactea, a common species associated with the dominant mussel beds in cold seeps. We generated 54.6 gigabases (Gb) of Nanopore reads and 77.9 Gb of BGI-seq raw reads. Assembly yielded a 754.3-Mb genome for B. lactea, with 3,720 contigs and a contig N50 of 1.57 Mb, covering 94.3% of metazoan Benchmarking Universal Single-Copy Orthologs. In total, 23,574 protein-coding genes and 463.4 Mb of repetitive elements were identified. We analyzed the phylogenetic position, substitution rate, demographic history, and TE activity of B. lactea. We also identified 80 expanded gene families and 87 rapidly evolving Gene Ontology categories in the B. lactea genome. Many of these genes were associated with heterocyclic compound metabolism, membrane-bounded organelle, metal ion binding, and nitrogen and phosphorus metabolism. The high-quality assembly and in-depth characterization suggest the B. lactea genome will serve as an essential resource for understanding the origin and adaptation of life in the cold seeps.
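
The contig N50 reported above is the length L such that contigs of length L or longer together cover at least half of the total assembly. A minimal sketch of that calculation follows; the function name is an assumption, and the toy lengths in the usage note are illustrative rather than taken from the B. lactea assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For toy contig lengths `[2, 3, 4, 5, 6]` (total 20), the two longest contigs sum to 11, which passes the halfway mark of 10, so the N50 is 5.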
