Is Phylotranscriptomics as Reliable as Phylogenomics?

Molecular Biology and Evolution ◽

10.1093/molbev/msaa181 ◽

2020 ◽

Vol 37 (12) ◽

pp. 3672-3683 ◽

Cited By ~ 3

Author(s):

Seongmin Cheon ◽

Jianzhi Zhang ◽

Chungoo Park

Keyword(s):

Genome Sequencing ◽

DNA Sequences ◽

Orthologous Gene ◽

Gene Identification ◽

Sequencing Data ◽

Genome Sequences ◽

Phylogenetic Information ◽

Tissue Of Origin ◽

Phylogenetic Method ◽

Rigorous Method

Phylogenomics, the study of phylogenetic relationships among taxa based on their genome sequences, has emerged as the preferred phylogenetic method because of the wealth of phylogenetic information contained in genome sequences. Genome sequencing, however, can be prohibitively expensive, especially for taxa with huge genomes and when many taxa need sequencing. Consequently, the less costly phylotranscriptomics has seen an increased use in recent years. Phylotranscriptomics reconstructs phylogenies using DNA sequences derived from transcriptomes, which are often orders of magnitude smaller than genomes. However, in the absence of corresponding genome sequences, comparative analyses of transcriptomes can be challenging and it is unclear whether phylotranscriptomics is as reliable as phylogenomics. Here, we respectively compare the phylogenomic and phylotranscriptomic trees of 22 mammals and 15 plants that have both sequenced nuclear genomes and publicly available RNA sequencing data from multiple tissues. We found that phylotranscriptomic analysis can be sensitive to orthologous gene identification. When a rigorous method for identifying orthologs is employed, phylogenomic and phylotranscriptomic trees are virtually identical to each other, regardless of the tissue of origin of the transcriptomes and whether the same tissue is used across species. These findings validate phylotranscriptomics, brighten its prospect, and illustrate the criticality of reliable ortholog detection in such practices.
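As an illustration of what comparing a phylogenomic tree with a phylotranscriptomic tree can mean in practice, the sketch below counts the clades that two rooted trees do not share. This is a simplified symmetric-difference measure in the spirit of the Robinson-Foulds distance, not necessarily the comparison used by the authors. The nested-tuple tree encoding, the function names, and the four example taxa are assumptions made for the example.

```python
def clades(tree):
    """Return every clade in a nested-tuple tree as a frozenset of leaf names."""
    found = set()

    def leaves(node):
        # A string is a leaf; a tuple is an internal node with children.
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    leaves(tree)
    return found


def clade_distance(tree_a, tree_b):
    """Count clades found in exactly one of the two trees (0 = same topology)."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical toy trees, not data from the paper.
genome_tree = ((("human", "chimp"), "mouse"), "cow")
same_topology = ("cow", ("mouse", ("chimp", "human")))
transcriptome_tree = (("human", "chimp"), ("mouse", "cow"))

print(clade_distance(genome_tree, same_topology))
print(clade_distance(genome_tree, transcriptome_tree))
```

A distance of zero means the two trees group the taxa identically, which corresponds to the "virtually identical" outcome the abstract reports when orthologs are identified rigorously.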

Draft Genome Sequence of a Hypermucoviscous Extended-Spectrum-β-Lactamase-Producing Klebsiella quasipneumoniae subsp. similipneumoniae Clinical Isolate

Genome Announcements ◽

10.1128/genomea.00475-16 ◽

2016 ◽

Vol 4 (4) ◽

Cited By ~ 12

Author(s):

U. Garza-Ramos ◽

J. Silva-Sánchez ◽

J. Catalán-Nájera ◽

H. Barrios ◽

N. Rodríguez-Medina ◽

...

Keyword(s):

Adult Patient ◽

Clinical Isolate ◽

Genome Sequencing ◽

DNA Sequences ◽

Urine Culture ◽

Draft Genome ◽

tRNA Genes ◽

Whole Genome ◽

Genome Sequences ◽

Extended Spectrum

A clinical isolate of extended-spectrum-β-lactamase-producing Klebsiella quasipneumoniae subsp. similipneumoniae 06-219 with hypermucoviscosity phenotypes obtained from a urine culture of an adult patient was used for whole-genome sequencing. Here, we report the draft genome sequences of this strain, consisting of 53 contigs with an ~5.6-Mb genome size and an average G+C content of 57.36%. The annotation revealed 6,622 coding DNA sequences and 77 tRNA genes.

First phylogenetic analysis of Malian SARS-CoV-2 sequences provide molecular insights into the genomic diversity of the Sahel region

10.1101/2020.09.23.20165639 ◽

2020 ◽

Author(s):

Bourema Kouriba ◽

Angela Duerr ◽

Alexandra Rehn ◽

Abdoul Karim Sangare ◽

Brehima Youssouf Traoure ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Genome Sequencing ◽

Sequence Data ◽

Genomic Diversity ◽

Whole Genome ◽

Sequencing Data ◽

Genome Sequences ◽

Spreading Dynamics ◽

Sahel Region ◽

Limited Sequence

We are currently facing a pandemic of COVID-19, caused by a spillover from an animal-originating coronavirus to humans occurring in the Wuhan region, China, in December 2019. From China, the virus has spread to 188 countries and regions worldwide, reaching the Sahel region on the 2nd of March 2020. Since whole genome sequencing (WGS) data is very crucial to understand the spreading dynamics of the ongoing pandemic, but only limited sequence data is available from the Sahel region to date, we have focused our efforts on generating the first Malian sequencing data available. Screening of 217 Malian patient samples for the presence of SARS-CoV-2 resulted in 38 positive isolates, from which 21 whole genome sequences were generated. Our analysis shows that both the early A (19B) and the fast evolving B (20A/C) clade are present in Mali, indicating multiple and independent introductions of SARS-CoV-2 to the Sahel region.

Whole Genome Sequence Analysis of SARS-CoV-2 Strains Circulating in Malaysia During First Wave and Early Second Wave of Infections.

10.21203/rs.3.rs-81152/v1 ◽

2020 ◽

Author(s):

Zarina Mohd Zawawi ◽

Jeyanthi Suppiah ◽

Jeevanathan Kalyanasundram ◽

Muhammad Afif Azizan ◽

Shuhaila Mat-Sharani ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Whole Genome Sequence ◽

Health Concern ◽

Whole Genome ◽

Sequencing Analysis ◽

Full Genome ◽

Sequencing Data ◽

Genome Sequences ◽

Synonymous Mutations

Background: Since December 2019, the outbreak of COVID-19 has raised a great public health concern globally. Here, we report the whole genome sequencing analysis of SARS-CoV-2 strains in Malaysia isolated from six patients diagnosed with COVID-19. Methods: The SARS-CoV-2 viral RNA extracted from clinical specimens and isolates was subjected to whole genome sequencing using the NextSeq 500 platform. The sequencing data were assembled into full genome sequences using Megahit, and a phylogenetic tree was constructed using Mega X software. Results: Six full genome sequences of SARS-CoV-2, comprising strains from the 1st wave (25th January 2020) and 2nd wave (27th February 2020) of infection, were obtained. Downstream analysis demonstrated diversity among the Malaysian strains, with several synonymous and non-synonymous mutations in four of the six cases affecting the genes M, orf1ab, and S of the SARS-CoV-2 virus. The phylogenetic analysis revealed that the viral genome sequences of Malaysian SARS-CoV-2 strains clustered under the ancestral Type B. Conclusion: This study examined the evolution of the SARS-CoV-2 virus during its circulation in Malaysia. Continuous monitoring and analysis of the whole genome sequences of confirmed cases would be crucial to further understand the genetic evolution of the virus.

read_haps: using read haplotypes to detect same species contamination in DNA sequences

10.1101/2020.02.11.941773 ◽

2020 ◽

Author(s):

Hannes P. Eggertsson ◽

Bjarni V. Halldorsson

Keyword(s):

Data Analysis ◽

Genome Sequencing ◽

DNA Sequences ◽

Diploid Species ◽

Reliable Data ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Polymorphic SNPs

Motivation: Data analysis is requisite on reliable data. In genetics this includes verifying that the sample is not contaminated with another, a problem ubiquitous in biology. Results: In human, and other diploid species, DNA contamination from the same species can be found by the presence of three haplotypes between polymorphic SNPs. read_haps is a tool that detects sample contamination from short read whole genome sequencing data. Availability: github.com/DecodeGenetics/read_haps. Contact: [email protected]

First Phylogenetic Analysis of Malian SARS-CoV-2 Sequences Provides Molecular Insights into the Genomic Diversity of the Sahel Region

Viruses ◽

10.3390/v12111251 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1251

Author(s):

Bourema Kouriba ◽

Angela Dürr ◽

Alexandra Rehn ◽

Abdoul Karim Sangaré ◽

Brehima Y. Traoré ◽

...

Keyword(s):

Phylogenetic Analysis ◽

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genomic Diversity ◽

Whole Genome ◽

Sequencing Data ◽

Genome Sequences ◽

Spreading Dynamics ◽

Sahel Region

We are currently facing a pandemic of COVID-19, caused by a spillover from an animal-originating coronavirus to humans occurring in the Wuhan region of China in December 2019. From China, the virus has spread to 188 countries and regions worldwide, reaching the Sahel region on 2 March 2020. Since whole genome sequencing (WGS) data is very crucial to understand the spreading dynamics of the ongoing pandemic, but only limited sequencing data is available from the Sahel region to date, we have focused our efforts on generating the first Malian sequencing data available. Screening 217 Malian patient samples for the presence of SARS-CoV-2 resulted in 38 positive isolates, from which 21 whole genome sequences were generated. Our analysis shows that both the early A (19B) and the later observed B (20A/C) clade are present in Mali, indicating multiple and independent introductions of SARS-CoV-2 to the Sahel region.

The Identification of the SARS-CoV-2 Whole Genome: Nine Cases Among Patients in Banten Province, Indonesia

Journal of Pure and Applied Microbiology ◽

10.22207/jpam.15.2.52 ◽

2021 ◽

Author(s):

Chris Adhiyanto ◽

Laifa A. Hendarmin ◽

Erike A. Suwarsono ◽

Zeti Harriyati ◽

Suryani ◽

...

Keyword(s):

Genome Sequencing ◽

Genetic Variants ◽

Viral Genome ◽

Genetic Background ◽

Respiratory Illness ◽

Whole Genome ◽

Sequencing Data ◽

RT-PCR ◽

Genome Sequences ◽

Severe Patient

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the strain of virus that causes coronavirus disease 2019 (COVID-19), the respiratory illness responsible for the current pandemic. Viral genome sequencing has been widely applied during outbreaks to study the relatedness of this virus to other viruses, its transmission mode, pace, evolution and geographical spread, and also its adaptation to human hosts. To date, more than 90,000 SARS-CoV-2 genome sequences have been uploaded to the GISAID database. The availability of sequencing data along with clinical and geographical data may be useful for epidemiological investigations. In this study, we aimed to analyse the genetic background of SARS-CoV-2 from patients in Indonesia by whole genome sequencing. We examined nine samples from COVID-19 patients with an RT-PCR cycle threshold (Ct) of less than 25 using ARTIC Network protocols for Oxford Nanopore's GridION sequencer. The analytical methods were based on the ARTIC multiplex PCR sequencing protocol for COVID-19. In this study, we found several genetic variants within the nine COVID-19 patient samples. We identified a mutation at position 614 P323L mutation in the ORF1ab gene often found in our severe patient samples. The number of SNPs and their location within the SARS-CoV-2 genome seem to vary. This diversity might be responsible for the virulence of the virus and its clinical manifestation.

read_haps: using read haplotypes to detect same species contamination in DNA sequences

Bioinformatics ◽

10.1093/bioinformatics/btaa936 ◽

2020 ◽

Author(s):

Hannes P Eggertsson ◽

Bjarni V Halldorsson

Keyword(s):

Data Analysis ◽

Genome Sequencing ◽

DNA Sequences ◽

Diploid Species ◽

Reliable Data ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Short Read ◽

Polymorphic SNPs

Motivation: Data analysis is requisite on reliable data. In genetics this includes verifying that the sample is not contaminated with another, a problem ubiquitous in biology. Results: In human, and other diploid species, DNA contamination from the same species can be found by the presence of three haplotypes between polymorphic SNPs. read_haps is a tool that detects sample contamination from short read whole genome sequencing data. Availability and implementation: github.com/DecodeGenetics/read_haps. Contact: [email protected]
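The three-haplotype idea in this abstract can be illustrated with a minimal sketch. A diploid sample carries at most two haplotypes across a pair of SNPs, so reads that span both SNPs and support three or more distinct allele pairs point to a second individual's DNA. The code below is not the read_haps implementation. The function names, the input format (one allele pair per spanning read), and the `min_reads` threshold used to ignore isolated sequencing errors are all assumptions made for the example.

```python
from collections import Counter


def count_haplotypes(read_alleles, min_reads=2):
    """Count allele pairs supported by at least `min_reads` spanning reads."""
    counts = Counter(read_alleles)
    return sum(1 for n in counts.values() if n >= min_reads)


def looks_contaminated(read_alleles, min_reads=2):
    """Flag a SNP pair when three or more haplotypes are well supported."""
    return count_haplotypes(read_alleles, min_reads) >= 3


# Hypothetical reads: each tuple holds the alleles one read shows at two SNPs.
clean = [("A", "C")] * 5 + [("G", "T")] * 4 + [("A", "T")]      # one error read
mixed = [("A", "C")] * 5 + [("G", "T")] * 4 + [("A", "T")] * 3  # third haplotype

print(looks_contaminated(clean))
print(looks_contaminated(mixed))
```

A real tool would aggregate this evidence over many SNP pairs across the genome rather than decide from a single pair.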

From whole genome sequencing data toward a simple genotyping tool: application to the animal pathogen Mycobacterium bovis

10.26226/morressier.56d5ba2ad462b80296c965c0 ◽

2016 ◽

Author(s):

Lorraine Michelet

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Mycobacterium Bovis ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data

Plasmids or no plasmids? A comparison between the Agilent TapeStation and whole-genome sequencing data in a large-scale bacterial sequencing project

10.26226/morressier.56d5ba27d462b80296c95fe7 ◽

2016 ◽

Author(s):

Sarah Alexander

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Project

High-precision and cost-efficient sequencing for real-time COVID-19 surveillance

Scientific Reports ◽

10.1038/s41598-021-93145-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sung Yong Park ◽

Gina Faraci ◽

Pamela M. Ward ◽

Jane F. Emerson ◽

Ha Youn Lee

Keyword(s):

Los Angeles ◽

Whole Genome Sequencing ◽

Real Time ◽

Genome Sequencing ◽

High Precision ◽

High Throughput Sequencing ◽

Whole Genome ◽

Sequencing Data ◽

Public Health Response ◽

Cost Efficient

COVID-19 global cases have climbed to more than 33 million, with over a million total deaths, as of September 2020. Real-time massive SARS-CoV-2 whole genome sequencing is key to tracking chains of transmission and estimating the origin of disease outbreaks. Yet no methods have simultaneously achieved high precision, simple workflow, and low cost. We developed a high-precision, cost-efficient SARS-CoV-2 whole genome sequencing platform for COVID-19 genomic surveillance, CorvGenSurv (Coronavirus Genomic Surveillance). CorvGenSurv directly amplified viral RNA from COVID-19 patients’ Nasopharyngeal/Oropharyngeal (NP/OP) swab specimens and sequenced the SARS-CoV-2 whole genome in three segments by long-read, high-throughput sequencing. Sequencing of the whole genome in three segments significantly reduced sequencing data waste, thereby preventing dropouts in genome coverage. We validated the precision of our pipeline by both control genomic RNA sequencing and Sanger sequencing. We produced near full-length whole genome sequences from individuals who were COVID-19 test positive during April to June 2020 in Los Angeles County, California, USA. These sequences were highly diverse in the G clade with nine novel amino acid mutations including NSP12-M755I and ORF8-V117F. With its readily adaptable design, CorvGenSurv grants wide access to genomic surveillance, permitting immediate public health response to sudden threats.
