A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis (Preprint)

Mapping Intimacies ◽

10.2196/preprints.19170 ◽

2020 ◽

Author(s):

Carla Mavian ◽

Simone Marini ◽

Mattia Prosperi ◽

Marco Salemi

Keyword(s):

Sequence Data ◽

Sampling Bias ◽

Systematic Investigation ◽

Sufficient Information ◽

Data Sets ◽

Sequence Alignments ◽

Phylogenetic Information ◽

Genome Data ◽

Country Specific

BACKGROUND The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non–peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution. OBJECTIVE The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community. METHODS We used maximum likelihood–based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020. RESULTS Although the number of high-quality full genomes is growing daily, and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations. CONCLUSIONS At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best. Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic.
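Temporal signal, one of the properties the study set out to assess, is commonly checked with a root-to-tip regression. The genetic distance from the root of a maximum likelihood tree to each tip is regressed on that tip's sampling date. The slope estimates the substitution rate, the x-intercept estimates the date of the root, and R² indicates how clock-like the data are. The Python sketch below is a generic illustration of that technique, not the authors' pipeline. It assumes you already have root-to-tip distances and decimal sampling dates, and the function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RootToTipFit:
    slope: float      # estimated rate (substitutions/site per unit of date)
    intercept: float  # fitted root-to-tip distance at date 0
    r_squared: float  # share of distance variance explained by sampling date
    root_date: float  # x-intercept: date at which the fitted distance is 0


def root_to_tip_regression(dates: list[float], distances: list[float]) -> RootToTipFit:
    """Ordinary least-squares fit of root-to-tip distance on sampling date."""
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need at least three paired dates and distances")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("sampling dates must vary to estimate a rate")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 of a simple linear regression equals the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    root_date = -intercept / slope if slope != 0 else float("nan")
    return RootToTipFit(slope, intercept, r_squared, root_date)
```

For example, four tips sampled at 2020.0, 2020.1, 2020.2 and 2020.3 with distances of 0.0001 to 0.0004 substitutions/site give a slope of about 0.001 and a root date of about 2019.9, with an R² of essentially 1. On real data, a flat or negative slope or a low R² suggests that a data set lacks enough temporal signal for molecular clock inference.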

Correction: A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis (Preprint)

10.2196/preprints.22853 ◽

2020 ◽

Author(s):

Carla Mavian ◽

Simone Marini ◽

Mattia Prosperi ◽

Marco Salemi

Keyword(s):

Sequence Data ◽

Sampling Bias ◽

Systematic Investigation ◽

Sufficient Information ◽

Data Sets ◽

Sequence Alignments ◽

Phylogenetic Information ◽

Genome Data ◽

Country Specific

UNSTRUCTURED The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non–peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution. The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community. We used maximum likelihood–based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020. Although the number of high-quality full genomes is growing daily, and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations. At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best. Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic.

A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis

JMIR Public Health and Surveillance ◽

10.2196/19170 ◽

2020 ◽

Vol 6 (2) ◽

pp. e19170 ◽

Cited By ~ 14

Author(s):

Carla Mavian ◽

Simone Marini ◽

Mattia Prosperi ◽

Marco Salemi

Keyword(s):

Sequence Data ◽

Sampling Bias ◽

Systematic Investigation ◽

Sufficient Information ◽

Data Sets ◽

Sequence Alignments ◽

Phylogenetic Information ◽

Genome Data ◽

Country Specific

Background The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non–peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution. Objective The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community. Methods We used maximum likelihood–based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020. Results Although the number of high-quality full genomes is growing daily, and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations. Conclusions At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best. Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic.

A snapshot of SARS-CoV-2 genome availability up to 30th March, 2020 and its implications

10.1101/2020.04.01.020594 ◽

2020 ◽

Cited By ~ 6

Author(s):

Carla Mavian ◽

Simone Marini ◽

Mattia Prosperi ◽

Marco Salemi

Keyword(s):

Sufficient Information ◽

Scientific Journals ◽

Full Genome ◽

Genome Sequences ◽

Phylogenetic Information ◽

Specific Data ◽

Full Dataset ◽

Viral Sequences ◽

Country Specific

Abstract The SARS-CoV-2 pandemic has been growing exponentially, affecting nearly 900 thousand people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published, in scientific journals as well as through non-peer-reviewed channels, to investigate SARS-CoV-2 genetic heterogeneity and spatiotemporal dissemination. We examined the full genome sequences currently available to assess whether they contain sufficient information for reliable phylogenetic and phylogeographic studies in the countries with the highest numbers of confirmed cases. Although the number of available full genomes is growing daily, and the full dataset contains sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 datasets still present severe limitations. Studies assessing within-country spread or transmission clusters should be considered preliminary, or hypothesis-generating, at best. Hence, concerted efforts must continue to increase the number and quality of the sequences required for robust tracing of the epidemic. Significance Statement Although the number of SARS-CoV-2 genome sequences is growing daily and the sequences contain sufficient phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution.
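One simple, model-free proxy for how much phylogenetic information an alignment (or a country-specific subset of it) carries is its number of parsimony-informative sites: columns with at least two character states that each occur in at least two sequences. The study itself used maximum likelihood–based methods, so the Python sketch below is only a complementary quick check, not a reproduction of its analysis. The function name and the choice to ignore gaps and IUPAC ambiguity codes are assumptions.

```python
from collections import Counter

# Gap and ambiguity symbols are not counted as character states (an assumption).
GAP_OR_AMBIGUOUS = frozenset("-NRYKMSWBDHV?")


def site_summary(alignment: list[str]) -> dict[str, int]:
    """Count variable and parsimony-informative columns in aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    variable = informative = 0
    for i in range(length):
        counts = Counter(
            seq[i].upper() for seq in alignment if seq[i].upper() not in GAP_OR_AMBIGUOUS
        )
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each present in two or more sequences.
            if sum(1 for c in counts.values() if c >= 2) >= 2:
                informative += 1
    return {"sites": length, "variable": variable, "parsimony_informative": informative}
```

With the four toy sequences `ACGTA`, `ACGTT`, `AGGTA` and `AGCTT`, three columns are variable but only two are parsimony-informative, because the third variable column differs in a single sequence only.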

Molecular characterisation of the Chlamydia pecorum plasmid from porcine, ovine, bovine, and koala strains indicates plasmid-strain co-evolution

PeerJ ◽

10.7717/peerj.1661 ◽

2016 ◽

Vol 4 ◽

pp. e1661 ◽

Cited By ~ 5

Author(s):

Martina Jelocnik ◽

Nathan L. Bachmann ◽

Helena Seth-Smith ◽

Nicholas R. Thomson ◽

Peter Timms ◽

...

Keyword(s):

Amplicon Sequencing ◽

Comparative Genomic ◽

Data Sets ◽

Nucleotide Polymorphisms ◽

Sequence Alignments ◽

Genome Data ◽

Chlamydia Pecorum ◽

Genomic Study ◽

Intergenic Regions ◽

Comparative Genomic Study

Background. Highly stable, evolutionarily conserved, small, non-integrative plasmids are commonly found in members of the Chlamydiaceae and, in some species, these plasmids have been strongly linked to virulence. To date, evidence for such a plasmid in Chlamydia pecorum has been ambiguous. In a recent comparative genomic study of porcine, ovine, bovine, and koala C. pecorum isolates, we identified plasmids (pCpec) in one pig strain and three koala strains. Screening of further porcine, ovine, bovine, and koala C. pecorum isolates for pCpec showed that pCpec is common, but not ubiquitous, in C. pecorum from all of the infected hosts. Methods. We used a combination of (i) bioinformatic mining of previously sequenced C. pecorum genome data sets and (ii) pCpec PCR-amplicon sequencing to characterise a further 17 novel pCpecs in C. pecorum isolates obtained from livestock, including pigs, sheep, and cattle, as well as from koalas. Results and Discussion. This analysis revealed that pCpec is conserved, with all eight coding domain sequences (CDSs) present in isolates from each of the hosts studied. Sequence alignments revealed that the 21 pCpecs share 99% nucleotide sequence identity, with 83 single nucleotide polymorphisms (SNPs) differentiating all of the plasmids analysed in this study. SNPs were mostly synonymous and were distributed evenly across all eight pCpec CDSs as well as in the intergenic regions. Although conserved, the 21 pCpec sequences resolved into 12 distinct genotypes, five of which were shared between pCpecs from different isolates, while the remaining seven were each unique to a single pCpec. Phylogenetic analysis revealed congruency and co-evolution of pCpecs with their cognate chromosomes, further supporting a polyphyletic origin of koala C. pecorum.
This study provides further understanding of the complex epidemiology of this pathogen in livestock and koala hosts and paves the way for studies to evaluate the function of this putative C. pecorum virulence factor.
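The plasmid comparison above rests on two simple operations on an alignment: counting the positions at which sequences differ (SNPs) and collapsing identical sequences into genotypes, which can then be classed as shared between isolates or unique to one. The Python sketch below illustrates both on toy data. It assumes the plasmid sequences are already aligned to equal length; the function names and isolate labels are hypothetical rather than taken from the study.

```python
from collections import defaultdict


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def group_genotypes(plasmids: dict[str, str]) -> dict[str, list[str]]:
    """Label each distinct sequence as a genotype and list the isolates carrying it."""
    by_sequence: dict[str, list[str]] = defaultdict(list)
    for isolate, seq in sorted(plasmids.items()):
        by_sequence[seq].append(isolate)
    return {f"G{i + 1}": members for i, members in enumerate(by_sequence.values())}


def summarize(plasmids: dict[str, str]) -> dict[str, int]:
    """Variable sites plus counts of shared and single-isolate genotypes."""
    if len({len(seq) for seq in plasmids.values()}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    variable_sites = sum(1 for column in zip(*plasmids.values()) if len(set(column)) > 1)
    genotypes = group_genotypes(plasmids)
    shared = sum(1 for members in genotypes.values() if len(members) > 1)
    return {
        "variable_sites": variable_sites,
        "genotypes": len(genotypes),
        "shared": shared,
        "unique": len(genotypes) - shared,
    }
```

On a toy set where `pig1` and `koala1` carry identical sequences and two other isolates each differ at one position, this reports two variable sites and three genotypes, one shared and two unique.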

Capturing heterotachy through multi-gamma site models

10.1101/018101 ◽

2015 ◽

Cited By ~ 3

Author(s):

Remco Bouckaert ◽

Peter Lockhart

Keyword(s):

Sequence Data ◽

Computational Cost ◽

Realistic Model ◽

Rate Variation ◽

Specific Rate ◽

Data Sets ◽

Sequence Evolution ◽

Rate Heterogeneity ◽

Real World Data ◽

Sequence Alignments

Most methods for performing a phylogenetic analysis based on sequence alignments of gene data assume that the mechanism of evolution is constant through time. It is recognised that some sites evolve faster than others, and this can be captured using a (gamma) rate heterogeneity model. Further, some species have shorter replication times than others, which results in faster rates of substitution in some lineages. This lineage-specific rate variation can be captured, to some extent, by relaxed clock models. However, there are clearly additional, poorly characterised features of sequence data that can sometimes lead to extreme differences in lineage-specific rates. This variation is poorly captured by constant, time-reversible substitution models. Extreme lineage-specific rate differences matter because they lead both to errors in reconstructing evolutionary relationships and to biased estimates of the ages of ancestral nodes. We propose a new model that allows gamma rate heterogeneity to change on branches, thus offering a more realistic model of sequence evolution. It adds negligible computational cost to likelihood calculations. We illustrate its effectiveness with an example of green algae and land plants. For many real-world data sets, we find a much better fit with multi-gamma site models, as well as substantial differences in ancestral node date estimates.
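Gamma rate heterogeneity is usually implemented with the discrete-gamma approximation (Yang, 1994): a gamma distribution with shape α and mean 1 is cut into k equal-probability categories, each represented by a single rate. The multi-gamma model described above lets α differ between branches. The Python sketch below only computes the per-category rates for a given α, using the median of each category rescaled so that the rates average 1. That is the building block such a model would evaluate per branch. The function names, and the α values mentioned after the code, are illustrative and are not taken from the paper's implementation.

```python
import math


def regularized_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x) for a > 0, x >= 0."""
    if x <= 0.0:
        return 0.0
    log_prefix = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Power series: P = x^a e^-x / Gamma(a) * sum_n x^n / (a (a+1) ... (a+n)).
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-16 and n < 10_000:
            n += 1
            term *= x / (a + n)
            total += term
        return total * math.exp(log_prefix)
    # Continued fraction for Q = 1 - P, evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefix) * h


def gamma_quantile(p: float, alpha: float) -> float:
    """Quantile of a gamma distribution with shape alpha and rate alpha (mean 1)."""
    def cdf(r: float) -> float:
        return regularized_lower_gamma(alpha, alpha * r)

    lo, hi = 0.0, 1.0
    while cdf(hi) < p:  # expand the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):  # bisection
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int = 4) -> list[float]:
    """Median rate of each of k equal-probability categories, rescaled to mean 1."""
    medians = [gamma_quantile((2 * i + 1) / (2 * k), alpha) for i in range(k)]
    scale = k / sum(medians)
    return [m * scale for m in medians]
```

Comparing `discrete_gamma_rates(0.3)` with `discrete_gamma_rates(2.0)` shows how much more widely the category rates spread when α is small. In a branch-specific setting, each branch would in effect use its own set of category rates; how the published model parameterizes and estimates those per-branch shapes is not shown here.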

An Overview of Multiple Sequence Alignments and Cloud Computing in Bioinformatics

ISRN Biomathematics ◽

10.1155/2013/615630 ◽

2013 ◽

Vol 2013 ◽

pp. 1-14 ◽

Cited By ~ 28

Author(s):

Jurate Daugelaite ◽

Aisling O'Driscoll ◽

Roy D. Sleator

Keyword(s):

Cloud Computing ◽

Large Scale ◽

Sequence Data ◽

Cloud Base ◽

Data Sets ◽

Next Generation ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Computing Technologies

Multiple sequence alignment (MSA) of DNA, RNA, and protein sequences is one of the most essential techniques in molecular biology, computational biology, and bioinformatics. Next-generation sequencing technologies are changing the biology landscape, flooding databases with massive amounts of raw sequence data, and MSA of ever-increasing sequence data sets is becoming a significant bottleneck. To realise the promise of MSA for large-scale sequence data sets, existing MSA algorithms must be run in parallel, with the sequence data distributed over a computing cluster or server farm. Combining MSA algorithms with cloud computing technologies is therefore likely to improve the speed, quality, and capacity of MSA for handling large numbers of sequences. In this review, multiple sequence alignment is discussed, with a specific focus on the ClustalW and Clustal Omega algorithms. Cloud computing technologies and concepts are outlined, and the next generation of cloud-based MSA algorithms is introduced.
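Progressive aligners such as ClustalW begin by scoring every pair of input sequences before building a guide tree. Those pairwise comparisons are independent of one another, which makes them a natural unit of work to distribute across machines. The Python sketch below shows a Needleman–Wunsch global alignment score with a linear gap penalty and an all-pairs loop dispatched through `concurrent.futures`. The scoring values are arbitrary assumptions, and ClustalW's actual pairwise stage (with its own scoring, gap opening and extension penalties, and a fast k-tuple option) differs in detail. Threads are used only to show the task decomposition; CPU-bound pure-Python work would need processes or a cluster for a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (Needleman–Wunsch, linear gap penalty)."""
    # Row i of the DP table scores a[:i] against every prefix b[:j]; only the
    # previous row is kept in memory.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diagonal, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def pairwise_scores(seqs: dict[str, str], workers: int = 4) -> dict[tuple[str, str], int]:
    """Score every unordered pair of named sequences, one independent task per pair."""
    pairs = list(combinations(sorted(seqs), 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda pair: nw_score(seqs[pair[0]], seqs[pair[1]]), pairs)
        return dict(zip(pairs, results))
```

Only the score is returned; recovering the alignment itself would require a traceback through the full table, which is omitted here.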

Phylogenetic analysis of Tylenchida Thorne, 1949 as inferred from D2 and D3 expansion fragments of the 28S rRNA gene sequences

Nematology ◽

10.1163/156854106778493420 ◽

2006 ◽

Vol 8 (3) ◽

pp. 455-474 ◽

Cited By ~ 143

Author(s):

Sergei A. Subbotin ◽

Dieter Sturhan ◽

Vladimir N. Chizhov ◽

Nicola Vovlas ◽

James G. Baldwin

Keyword(s):

Ribosomal Rna ◽

Sequence Data ◽

Molecular Data ◽

Rrna Gene ◽

Data Sets ◽

28S Rrna ◽

Sequence Alignments ◽

Entomoparasitic Nematodes ◽

Rna Genes ◽

Basal Position

Abstract The evolutionary relationships of 82 species of tylenchid and aphelenchid nematodes were evaluated using sequence data from the D2 and D3 expansion fragments of the 28S ribosomal RNA genes. Nine automatically generated sequence alignments and one culled alignment were analysed using maximum parsimony and Bayesian inference. The molecular data sets showed that the order Tylenchida comprises lineages that largely correspond to two suborders, Hoplolaimina and Criconematina, and to other taxonomic divisions as proposed by Siddiqi (2000). Other significant results of our study include: i) the basal position of groups that include entomoparasitic nematodes within tylenchid trees; ii) paraphyly of the superfamily Dolichodoroidea sensu Siddiqi (2000); iii) evidence for a Pratylenchus, Hirschmanniella and Meloidogyne clade; and iv) lack of support for the widely held traditional placement of Radopholus within Pratylenchidae and placement of this genus within Hoplolaimidae or Heteroderidae. Congruence and incongruence between the molecular phylogeny and both traditional classifications and morphology-based hypotheses of tylenchid phylogeny are discussed.
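Maximum parsimony, one of the two inference approaches named above, scores a candidate tree by the minimum number of character changes it requires. For a single site on a rooted binary tree, that minimum can be computed with Fitch's algorithm. Each internal node takes the intersection of its children's state sets if it is non-empty, and otherwise their union at the cost of one change. The Python sketch below sums that count over all sites. Trees are written as nested tuples of taxon names, an input format chosen here purely for illustration, and a full parsimony analysis would also search over tree space, which this sketch does not do.

```python
from typing import Union

# A tree is either a taxon name (leaf) or a pair of subtrees (internal node).
Tree = Union[str, tuple]


def fitch_site(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Return the Fitch state set at the root of `tree` and the changes needed below it."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_length(tree: Tree, alignment: dict[str, str]) -> int:
    """Sum Fitch change counts over every column of an aligned set of taxa."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total
```

On a toy two-site alignment where taxa t1 and t2 carry A and t3 and t4 carry C, grouping t1 with t2 costs 2 changes while grouping t1 with t3 costs 4, so parsimony prefers the first tree.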

Trust in the Police: Cross-country Comparisons

Voprosy Ekonomiki ◽

10.32609/0042-8736-2012-11-24-47 ◽

2012 ◽

pp. 24-47

Author(s):

V. Gimpelson ◽

G. Monusova

Keyword(s):

Public Opinion ◽

Public Attitudes ◽

Crime Rates ◽

Authoritarian Regimes ◽

Data Sets ◽

The Public ◽

Positive Attitudes ◽

Cross Country ◽

Police Activity

Using different cross-country data sets and simple econometric techniques, we study public attitudes towards the police. More positive attitudes are more likely to emerge in countries that have better-functioning democratic institutions, are less prone to corruption, and enjoy more transparent and accountable police activity. These factors have a stronger impact on public opinion (trust and attitudes) than objective crime rates or the density of police officers. Citizens tend to place more trust in police officers with whom they share common values and over whom they have some control. The latter is a function of democracy. In authoritarian countries ("police states"), this tendency may not work directly: when we move from semi-authoritarian to openly authoritarian countries, trust in the police as measured by surveys can also rise. As a result, trust appears to be U-shaped along the quality-of-government axis. This phenomenon can be explained by two simple facts. First, publicly available information concerning police activity in authoritarian countries is strictly controlled; second, the police itself is more tightly controlled by authoritarian regimes, which fear a dangerous (for them) erosion of this institution.

mtDNAcombine: tools to combine sequences from multiple studies

BMC Bioinformatics ◽

10.1186/s12859-021-04048-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Eleanor F. Miller ◽

Andrea Manica

Keyword(s):

Sequence Data ◽

Data Extraction ◽

Bayesian Skyline Plot ◽

Model Organisms ◽

Data Sets ◽

Data Handling ◽

Online Database ◽

Genetic Studies ◽

Wide Range ◽

Existing Data

Abstract Background Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result, there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species’ demographic past. However, compiling data in this manner is not trivial: there are many complexities associated with data extraction, data quality and data handling. Results Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions There is now more genetic information available than ever before, and large meta-data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets remains technically challenging. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data, guiding the process of compiling data sets from the online database GenBank.
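Two of the data-quality decisions mentioned above, discarding sequences that are too short and discarding those with too many ambiguous bases, can be sketched as a simple filter that also records why each sequence was dropped. This is a generic Python illustration under assumed thresholds; it does not reproduce mtDNAcombine's own functions or interface, and the accession labels in the test are hypothetical. Identical sequences are deliberately kept, since repeated haplotypes still carry information for coalescent methods such as Bayesian skyline plots.

```python
# IUPAC nucleotide ambiguity codes (N plus the two- and three-base codes).
IUPAC_AMBIGUOUS = frozenset("NRYKMSWBDHV")


def curate(
    records: dict[str, str], min_length: int, max_ambiguous: float = 0.01
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (kept, dropped): kept sequences and a reason for each dropped one."""
    kept: dict[str, str] = {}
    dropped: dict[str, str] = {}
    for accession, raw in records.items():
        # Normalise case and strip alignment gaps before measuring length.
        seq = raw.upper().replace("-", "")
        if not seq or len(seq) < min_length:
            dropped[accession] = f"too short ({len(seq)} < {min_length} bp)"
            continue
        ambiguous = sum(base in IUPAC_AMBIGUOUS for base in seq) / len(seq)
        if ambiguous > max_ambiguous:
            dropped[accession] = f"ambiguous fraction {ambiguous:.2f} > {max_ambiguous}"
            continue
        # Identical sequences are kept on purpose; haplotype counts matter downstream.
        kept[accession] = seq
    return kept, dropped
```

Returning the reasons alongside the kept records makes it easy to report how many sequences each threshold removed when a multi-study data set is assembled.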

MoRoNet a network to strengthen the quality of measles and rubella surveillance in Italy

European Journal of Public Health ◽

10.1093/eurpub/ckaa166.1336 ◽

2020 ◽

Vol 30 (Supplement_5) ◽

Author(s):

F Magurano ◽

M Baggieri ◽

P Bucci ◽

E D'Ugo ◽

M Sabbatucci ◽

...

Keyword(s):

Disease Surveillance ◽

Specific Work ◽

Vaccine Preventable Disease ◽

Work Plan ◽

Preventable Disease ◽

National Reference ◽

Disease Elimination ◽

National Networks ◽

Country Specific

Abstract Background Measles is a vaccine-preventable infectious disease, and it remains one of the leading causes of infant mortality globally. The World Health Organization (WHO) has adopted the goal of eliminating measles and rubella. Detection and control of communicable diseases would not be possible without accurate laboratory results on when and where a particular disease circulates. Methods WHO/Europe therefore works with all Member States to steadily improve the quality of laboratory data in order to determine the Region's progress towards measles and rubella elimination. For this purpose, it coordinates the European Measles and Rubella Laboratory Network (MR LabNet). National laboratories in this network undergo regular external quality assessment through an annual accreditation programme. Results In Italy, a Sub-national Reference Laboratories Network for measles and rubella (MoRoNet) has been under development since March 2017 and currently includes 15 laboratories. MoRoNet was developed following the indications of MR LabNet. It is accredited, coordinated and supervised by the National Reference Laboratory. Conclusions Strengthening the role of national laboratories in overseeing the performance of subnational laboratories has become a critical need in order to properly monitor the Region's measles and rubella elimination efforts. MoRoNet enables Italy to develop a country-specific work plan for establishing national networks and oversight mechanisms, including preliminary monitoring and evaluation indicators compliant with MR LabNet standards. This is significant not only for optimizing participation in national and regional processes to verify disease elimination, but also for strengthening the quality of vaccine-preventable disease surveillance. MoRoNet Group: A Amendola; F Baldanti; MR Capobianchi; M Chironna; MG Cusi; P D'Agaro; P Lanzafame; T Lazzarotto; K Marinelli; A Orsi; E Pagani; G Palù; F Pittaluga; A Sacchi; F Tramuto.
Key messages MoRoNet has enabled Italy to develop a country-specific work plan for establishing national networks and oversight mechanisms compliant with WHO MR LabNet standards. The MoRoNet network has made it possible to optimize participation in processes to verify disease elimination and to strengthen the quality of vaccine-preventable disease surveillance.
