Phylogenetic tree shapes resolve disease transmission patterns

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented here progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
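As a rough illustration of what "tree length" means for a minimal tree, the sketch below scores a fixed rooted binary tree on nucleotide sequences with Fitch's small-parsimony count. This is a standard textbook procedure, not the taxon-by-taxon construction described in the abstract; the taxon names, sequences and function names are invented for the example.

```python
def fitch_site(node, seqs, site):
    """Return (candidate states, substitutions) for one site below `node`.

    A leaf is a taxon name (str); an internal node is a pair of subtrees.
    """
    if isinstance(node, str):
        return {seqs[node][site]}, 0
    left_states, left_cost = fitch_site(node[0], seqs, site)
    right_states, right_cost = fitch_site(node[1], seqs, site)
    common = left_states & right_states
    if common:
        # The children can agree on a state: no extra substitution here.
        return common, left_cost + right_cost
    # The children disagree: one substitution is needed at this node.
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(tree, seqs):
    """Total substitutions over all sites for a fixed tree."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, s)[1] for s in range(n_sites))


# Hypothetical aligned nucleotide sequences for four taxa.
seqs = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "GCT"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(tree_length(tree_1, seqs))  # 3
print(tree_length(tree_2, seqs))  # 5
```

Of these two candidate trees, the first is shorter, so it would be preferred under the minimal-length criterion. Searching over all possible trees is the hard part that the cited methods address.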

Bayesian inference of infectious disease transmission from whole genome sequence data

10.1101/001388 ◽

2013 ◽

Cited By ~ 1

Author(s):

Xavier Didelot ◽

Jennifer Gardy ◽

Caroline Colijn

Keyword(s):

Disease Transmission ◽

Sequence Data ◽

Disease Outbreaks ◽

Genomic Data ◽

Realistic Model ◽

Host Population ◽

Whole Genome Sequence ◽

Genomic Epidemiology ◽

Starting Point ◽

Source Case

Genomics is increasingly being used to investigate disease outbreaks, but an important question remains unanswered -- how well do genomic data capture known transmission events, particularly for pathogens with long carriage periods or large within-host population sizes? Here we present a novel Bayesian approach to reconstruct densely-sampled outbreaks from genomic data whilst considering within-host diversity. We infer a time-labelled phylogeny using BEAST, then infer a transmission network via a Monte-Carlo Markov Chain. We find that under a realistic model of within-host evolution, reconstructions of simulated outbreaks contain substantial uncertainty even when genomic data reflect a high substitution rate. Reconstruction of a real-world tuberculosis outbreak displayed similar uncertainty, although the correct source case and several clusters of epidemiologically linked cases were identified. We conclude that genomics cannot wholly replace traditional epidemiology, but that Bayesian reconstructions derived from sequence data may form a useful starting point for a genomic epidemiology investigation.

Ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

10.7287/peerj.preprints.1106v1 ◽

2015 ◽

Author(s):

Jennifer Fouquier ◽

Jai R Rideout ◽

Evan Bolyen ◽

John H Chase ◽

Arron Shiffer ◽

...

Keyword(s):

Phylogenetic Tree ◽

Genetic Marker ◽

Phylogenetic Trees ◽

Phylogenetic Diversity ◽

Sequence Data ◽

Fungal Species ◽

Bioinformatics Tool ◽

Hybrid Gene ◽

Fungal Database ◽

Taxonomic Groups

Ghost-tree is a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach uses one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families) as a “foundation” phylogeny. A second, more rapidly evolving genetic marker is then used to build “extension” phylogenies for more closely related organisms (e.g., fungal species or strains) that are then grafted on to the foundation tree by mapping taxonomic names. We apply ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. The result is a phylogenetic tree, compatible with the commonly used UNITE fungal database, that supports phylogenetic diversity analysis (e.g., UniFrac) of fungal communities profiled using ITS markers. Availability: ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree.

Analysis of The Nucleotide Sequence Diversity of the Lassa Virus and Augmenting its Phylogenetic Tree

STEM Fellowship Journal ◽

10.17975/sfj-2018-005 ◽

2018 ◽

Vol 4 (1) ◽

pp. 21-26

Author(s):

Sean Oddoye

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Nucleotide Diversity ◽

Sequence Data ◽

Sequence Diversity ◽

Future Research ◽

P Value ◽

Hemorrhagic Disease ◽

Lassa Virus ◽

Glycoprotein Precursor

Lassa virus (LASV) is the etiological agent of Lassa fever, an acute hemorrhagic disease with a mortality rate of 15%. Many aspects of the Lassa virus are not understood, such as the cause of deafness in one third of surviving patients or why symptoms are benign for 80% of those infected with the virus. Ambiguities like these suggest that there may be genomic heterogeneity among infecting viruses and demonstrate a need to quantify and analyze polymorphisms within LASV. Patterns that emerge from phylogenetic trees can be used to assess the structure of a population while also providing insights into its genetic makeup. The purpose of this investigation was to develop a more streamlined means of calculating nucleotide diversity within a subpopulation of Lassa virus strains and to augment a phylogenetic tree of the Lassa virus glycoprotein precursor (GPC) segment. A total of 25 partial and complete sequences of LASV strains were obtained from GenBank. During phase one of this investigation, the sequence data were entered into the MEGA analysis software and the sequence diversity was derived at the nucleotide level. Data from the individual strand sequences were used to augment a phylogenetic tree using TreeView X software. In phase two of this investigation, an algorithm was created in RStudio using the BSgenome and Biostrings packages. The sequence diversity derived from the statistical analyses in MEGA was compared with that calculated by the algorithm. A p-value of 0.08 was found, which falls outside the accepted non-medical p-value range of 0.00 to 0.05. It is suggested that future research focus on creating a revised version of the algorithm that calculates nucleotide diversity within a percent error of 5%.
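For readers unfamiliar with the quantity being compared, nucleotide diversity is commonly defined as the average proportion of differing sites over all pairs of aligned sequences. The sketch below computes it in Python under that definition; it is neither the MEGA implementation nor the R algorithm mentioned in the abstract, and the sequences and function name are made up for illustration.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Mean pairwise proportion of differing sites in an alignment.

    `aligned` is a list of equal-length sequences. Gaps and ambiguity
    codes are treated as ordinary characters (a simplifying assumption).
    """
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must be non-empty and of equal length")

    proportions = []
    for a, b in combinations(aligned, 2):
        differences = sum(1 for x, y in zip(a, b) if x != y)
        proportions.append(differences / length)
    return sum(proportions) / len(proportions)


# Hypothetical toy alignment of three sequences.
toy_alignment = ["ACGT", "ACGA", "ATGA"]
pi = nucleotide_diversity(toy_alignment)
print(round(pi, 4))  # 0.3333
```

The three pairs differ at 1, 2 and 1 of the 4 sites, giving (0.25 + 0.5 + 0.25) / 3.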

Compressing Streams of Phylogenetic Trees

10.1101/440644 ◽

2018 ◽

Author(s):

Axel Trefzer ◽

Alexandros Stamatakis

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Sequence Data ◽

Branch Length ◽

Distinct Species ◽

MCMC Methods ◽

Molecular Sequence Data ◽

Molecular Sequence ◽

Posterior Probability Distribution ◽

Tree Compression

Bayesian Markov-Chain Monte Carlo (MCMC) methods for phylogenetic tree inference, that is, inference of the evolutionary history of distinct species using their molecular sequence data, typically generate large sets of phylogenetic trees. The trees generated by the MCMC procedure are samples of the posterior probability distribution that MCMC methods approximate. Thus, they generate a stream of correlated binary trees that need to be stored. Here, we adapt state-of-the-art algorithms for binary tree compression to phylogenetic tree data streams and extend them to also store the required meta-data. On a phylogenetic tree stream containing 1,000 trees with 500 leaves including branch length values, we achieve a compression rate of 5.4 compared to the uncompressed tree files and of 1.8 compared to bzip2-compressed tree files. For compressing the same trees, but without branch length values, our compression method is approximately an order of magnitude better than bzip2. A prototype implementation is available at https://github.com/axeltref/tree-compression.git.

A Mathematical Model For Lassa Fever Transmission Dynamics With Impacts of Control Measures: Analysis And Simulation

European Journal of Mathematics and Statistics ◽

10.24018/ejmath.2021.2.2.17 ◽

2021 ◽

Vol 2 (2) ◽

pp. 19-28

Author(s):

Oke Isaiah Idisi ◽

Tunde Tajudeen Yusuf

Keyword(s):

Disease Transmission ◽

Control Strategies ◽

Disease Outbreaks ◽

Lassa Fever ◽

Control Measures ◽

Transmission Dynamics ◽

Effective Control ◽

Lassa Virus ◽

Long Run ◽

Globally Stable

Lassa fever, caused by Lassa virus, is a vector-host transmitted infectious disease whose prevalence has been rising over the past few decades. Considering the grave implications of the continued spread of the disease, an epidemic model was developed to describe the disease transmission dynamics with the impacts of proposed control measures. This is to help inform effective control strategies that would successfully curtail and contain the disease in its endemic areas. The model is qualitatively analyzed in order to characterize its long-run behavior, and the model-associated basic reproduction number $\mathcal{R}_0$ is derived. The model analysis reveals that the disease-free equilibrium is locally and globally stable whenever $\mathcal{R}_0 < 1$ and that the disease prevalence would be high as long as $\mathcal{R}_0 > 1$. Finally, the model is numerically solved and simulated for different scenarios of disease outbreaks, and the findings from the simulations are discussed.
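The threshold behaviour around a basic reproduction number of 1 can be illustrated with a much simpler model than the vector-host system in the abstract. The sketch below integrates a basic SIR model by forward Euler, taking the reproduction number as beta / gamma; the parameter values, step size and function name are assumptions chosen for illustration and are not taken from the paper.

```python
def simulate_sir(beta, gamma, s0=0.99, i0=0.01, days=200, dt=0.1):
    """Forward-Euler integration of a basic SIR model on population fractions.

    Returns (peak infected fraction, final recovered fraction).
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    peak = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak, r


gamma = 0.2  # assumed recovery rate per day
for r0 in (0.8, 2.5):
    peak, final_size = simulate_sir(beta=r0 * gamma, gamma=gamma)
    print(f"R0={r0}: peak infected={peak:.3f}, final recovered={final_size:.3f}")
```

With the reproduction number below 1, the infected fraction never rises above its initial value and the outbreak fades; above 1, the same initial seeding produces a large epidemic.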

Influenza outbreak in a Canadian correctional facility

Journal of Infection Prevention ◽

10.1177/1757177416689725 ◽

2017 ◽

Vol 18 (4) ◽

pp. 193-198 ◽

Cited By ~ 6

Author(s):

Jonathan Besney ◽

Danusia Moreau ◽

Angela Jacobs ◽

Dan Woods ◽

Diane Pyne ◽

...

Keyword(s):

Disease Transmission ◽

Communicable Disease ◽

Correctional Facility ◽

Disease Outbreaks ◽

Control Measures ◽

Infection Prevention And Control ◽

Correctional Facilities ◽

Influenza Outbreak ◽

Increased Risk ◽

And Control

Correctional facilities face increased risk of communicable disease transmission and outbreaks. We describe the progression of an influenza outbreak in a Canadian remand facility and suggest strategies for preventing, identifying and responding to outbreaks in this setting. In total, six inmates had laboratory-confirmed influenza resulting in 144 exposed contacts. Control measures included enhanced isolation precautions, restricting admissions to affected living units, targeted vaccination and antiviral prophylaxis. This report highlights the importance of setting specific outbreak guidelines in addressing population and environmental challenges, as well as implementation of effective infection prevention and control (IPAC) and public health measures when managing influenza and other communicable disease outbreaks.

SeqDistK: a Novel Tool for Alignment-free Phylogenetic Analysis

10.1101/2021.08.16.456500 ◽

2021 ◽

Author(s):

Xuemei Liu ◽

Wen Li ◽

Guanda Huang ◽

Tianlai Huang ◽

Qingang Xiong ◽

...

Keyword(s):

Phylogenetic Analysis ◽

16S rRNA ◽

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Large Scale ◽

Sequence Data ◽

Ground Truth ◽

Group Method ◽

Metagenomic Sequence ◽

Alignment Free

Algorithms for constructing phylogenetic trees are fundamental to studying the evolution of viruses, bacteria, and other microbes. Established multiple alignment-based algorithms are inefficient for large-scale metagenomic sequence data because of their high requirement of inter-sequence correlation and high computational complexity. In this paper, we present SeqDistK, a novel tool for alignment-free phylogenetic analysis. SeqDistK computes the dissimilarity matrix for phylogenetic analysis, incorporating seven k-mer based dissimilarity measures, namely d2, d2S, d2star, Euclidean, Manhattan, CVTree, and Chebyshev. Based on these dissimilarities, SeqDistK constructs a phylogenetic tree using the Unweighted Pair Group Method with Arithmetic Mean algorithm. Using a gold-standard dataset of 16S rRNA and its associated phylogenetic tree, we compared SeqDistK to Muscle, a multiple sequence aligner. We found SeqDistK was not only 38 times faster than Muscle in computational efficiency but also more accurate. SeqDistK achieved the smallest symmetric difference between the inferred and ground truth trees, ranging from 13 to 18, while that of Muscle was 62. When measures d2, d2star, d2S, Euclidean, and k-mer size k=5 were used, SeqDistK consistently inferred a phylogenetic tree almost identical to the ground truth tree. We also performed clustering of 16S rRNA sequences using SeqDistK and found the clustering was highly consistent with known biological taxonomy. Among all the measures, d2S (k=5, M=2) showed the best accuracy as it correctly clustered and classified all sample sequences. In summary, SeqDistK is a novel, fast and accurate alignment-free tool for large-scale phylogenetic analysis. SeqDistK software is freely available at https://github.com/htczero/SeqDistK.
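To make the idea of a k-mer based dissimilarity concrete, the sketch below builds relative k-mer frequency vectors for two DNA sequences and computes three of the simpler measures named in the abstract (Euclidean, Manhattan, Chebyshev). It is an independent illustration, not SeqDistK's code: normalising counts to relative frequencies, skipping k-mers with non-ACGT characters, and all function names are assumptions made here.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Relative frequencies of all 4**k DNA k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # ignores k-mers with other letters
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[m] / total for m in kmers]


def kmer_distances(seq_a, seq_b, k):
    """Euclidean, Manhattan and Chebyshev distances between k-mer profiles."""
    p = kmer_frequencies(seq_a, k)
    q = kmer_frequencies(seq_b, k)
    diffs = [abs(a - b) for a, b in zip(p, q)]
    return {
        "euclidean": sum(d * d for d in diffs) ** 0.5,
        "manhattan": sum(diffs),
        "chebyshev": max(diffs),
    }


# Hypothetical short sequences; real analyses would use full-length genes.
print(kmer_distances("ACGTACGTAC", "ACGTTTGTAC", k=2))
```

Computing such a distance for every pair of sequences yields the dissimilarity matrix that a clustering method such as UPGMA can then turn into a tree.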

Public Health Surveillance in a Large Evacuation Shelter Post Hurricane Harvey

Online Journal of Public Health Informatics ◽

10.5210/ojphi.v10i1.8955 ◽

2018 ◽

Vol 10 (1) ◽

Author(s):

Aisha Haynie ◽

Sherry Jin ◽

Leann Liu ◽

Sherrill Pirsamadi ◽

Benjamin Hornstein ◽

...

Keyword(s):

Public Health ◽

Mental Health ◽

Disease Control ◽

Disease Transmission ◽

Online Survey ◽

Communicable Disease ◽

Disease Outbreaks ◽

Control Measures ◽

Harris County ◽

Survey Tool

Objective: 1) Describe HCPH’s disease surveillance and prevention activities within the NRG Center mega-shelter; 2) Present surveillance findings with an emphasis on sharing tools that were developed and may be utilized for future disaster response efforts; 3) Discuss successes achieved, challenges encountered, and lessons learned from this emergency response.

Introduction: Hurricane Harvey made landfall along the Texas coast on August 25th, 2017 as a Category 4 storm. It is estimated that the ensuing rainfall caused record flooding of at least 18 inches in 70% of Harris County. Over 30,000 residents were displaced and 50 deaths occurred due to the devastation. At least 53 temporary refuge shelters opened in various parts of Harris County to accommodate displaced residents. On the evening of August 29th, Harris County and community partners set up a 10,000-bed mega-shelter at NRG Center, in efforts to centralize refuge efforts. Harris County Public Health (HCPH) was responsible for round-the-clock surveillance to monitor resident health status and prevent communicable disease outbreaks within the mega-shelter. This was accomplished through direct and indirect resident health assessments, along with coordinated prevention and disease control efforts. Despite HCPH’s 20-day active response, and identification of two relatively small but potentially worrisome communicable disease outbreaks, no large-scale disease outbreaks occurred within the NRG Center mega-shelter.

Methods: Active surveillance was conducted in the NRG shelter to rapidly detect communicable and high-consequence illness and to prevent disease transmission. An online survey tool and novel epidemiology consulting method were developed to aid in this surveillance. Surveillance included daily review of onsite medical, mental health, pharmacy, and vaccination activities, as well as nightly cot-to-cot resident health surveys. Symptoms of infectious disease, exacerbation of chronic disease, and mental health issues among evacuees were closely monitored. Rapid epidemiology consultations were performed for shelter residents displaying symptoms consistent with communicable illness or other signs of distress during nightly cot surveys. Onsite rapid assay tests and public health laboratory testing were used to confirm disease diagnoses. When indicated, disease control measures were implemented and residents referred for further evaluation. Frequencies and percentages were used in the descriptive analysis.

Results: Harris County’s NRG Center mega-shelter housed 3,365 evacuees at its peak. 3,606 household health surveys were completed during 20 days of active surveillance, representing 7,152 individual resident evaluations, and 395 epidemiology consultations. Multifaceted surveillance uncovered influenza-like illness and gastrointestinal (GI) complaints, revealing an Influenza A outbreak of 20 cases, 3 isolated cases of strep throat, and a Norovirus cluster of 5 cases. Disease control activities included creation of respiratory and GI isolation rooms, provision of over 771 influenza vaccinations, generous distribution of hand sanitizer throughout the shelter, placement of hygiene signage, and frequent bilingual public health public service announcements in the dormitory areas. No widespread outbreaks of communicable disease occurred. Additionally, a number of shelter residents were referred to the clinic after reporting exacerbation of chronic conditions or mental health concerns, including one individual with suicidal ideations.

Conclusions: Effective public health surveillance and implementation of disease control measures in disaster shelters are critical to detecting and preventing communicable illness. HCPH’s rigorous surveillance and response system in the NRG Center mega-shelter, including online survey tool and novel consultation method, resulted in timely identification and isolation of patients with gastrointestinal and influenza-like illness. These were likely key factors in the successful prevention of widespread disease transmission. Additional success factors included successful partnerships with onsite clinical and pharmacy teams, cooperative and engaged shelter leadership, synergistic internal surveillance team dynamics, availability of student volunteers, sufficient quantities of influenza vaccine, and access to mobile survey technology. Challenges, mostly related to scope and magnitude of response, included lack of pre-designed survey tools, relatively new staff without significant disaster experience, and simultaneous management of multiple surveillance activities within the community. Personal hurricane-related losses experienced by HCPH staff also impacted response efforts. HCPH’s rich disaster response experiences at the NRG mega-shelter and developed surveillance tools can serve as a planning guide for future public health emergencies in Harris County and other jurisdictions.
