Where did you come from, where did you go: Refining Metagenomic Analysis Tools for HGT characterisation

2018 ◽  
Author(s):  
Enrico Seiler ◽  
Kathrin Trappe ◽  
Bernhard Y. Renard

Abstract: Horizontal gene transfer (HGT) has changed the way we regard evolution. Instead of waiting for the next generation to establish new traits, bacteria in particular are able to take a shortcut via HGT that enables them to pass on genes from one individual to another, even across species boundaries. Existing HGT detection approaches usually first identify genes of foreign nature, e.g., using composition-based methods, and then exploit phylogenetic discrepancies of the corresponding gene tree compared to a species tree. These approaches depend on fully sequenced HGT organisms and computable phylogenetic species trees. The tool Daisy offers a different approach based on read mapping that provides complementary evidence compared to existing methods, at the cost of relying on the acceptor and donor references of the HGT organism being known. Acceptor and donor identification is akin to species identification in metagenomic samples based on sequencing reads, a problem addressed by metagenomic profiling tools. However, acceptor and donor references have certain properties that prevent these methods from being applied directly. We propose DaisyGPS, a mapping-based pipeline that is able to identify acceptor and donor candidates of an HGT organism based on sequencing reads. To do so, DaisyGPS leverages metagenomic profiling strategies and refines them for HGT candidate identification. These candidates can then be further evaluated by tools like Daisy to establish HGT regions. We successfully validated our approach on both simulated and real data, and show its benefits in an investigation of MRSA outbreak data. DaisyGPS is freely available from https://gitlab.com/rki_bioinformatics/.
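At its core, acceptor/donor candidate identification reduces to ranking reference genomes by read-mapping evidence. A minimal sketch of that idea in Python, assuming reads have already been mapped and each read's hit set is known (illustrative only; DaisyGPS's actual scoring refines metagenomic profiling measures and is considerably more involved):

```python
from collections import Counter

def rank_candidates(read_hits):
    """Rank reference genomes by how many reads map to each of them.

    read_hits: dict mapping a read id to the set of reference names it maps to
    (a read mapping ambiguously contributes to several references).
    Returns references sorted by descending read support.
    """
    counts = Counter()
    for refs in read_hits.values():
        for ref in refs:
            counts[ref] += 1
    return counts.most_common()
```

Candidates at the top of this ranking would then be handed to a tool like Daisy for detailed HGT-region evaluation.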

2020 ◽  
Vol 12 (4) ◽  
pp. 381-395
Author(s):  
Nilson Da Rocha Coimbra ◽  
Aristoteles Goes-Neto ◽  
Vasco Azevedo ◽  
Aïda Ouangraoua

Abstract Horizontal gene transfer is a common mechanism in Bacteria that has contributed to the genomic content of existing organisms. Traditional methods for estimating bacterial phylogeny, however, assume only vertical inheritance in the evolution of homologous genes, which may result in errors in the estimated phylogenies. We present a new method for estimating bacterial phylogeny that accounts for the presence of genes acquired by horizontal gene transfer between genomes. The method identifies and corrects putative transferred genes in gene families, before applying a gene tree-based summary method to estimate bacterial species trees. The method was applied to estimate the phylogeny of the order Corynebacteriales, which is the largest clade in the phylum Actinobacteria. We report a collection of 14 phylogenetic trees on 360 Corynebacteriales genomes. All estimated trees display each genus as a monophyletic clade. The trees also display several relationships proposed by past studies, as well as new relevant relationships between and within the main genera of Corynebacteriales: Corynebacterium, Mycobacterium, Nocardia, Rhodococcus, and Gordonia. An implementation of the method in Python is available on GitHub at https://github.com/UdeS-CoBIUS/EXECT (last accessed April 2, 2020).


2021 ◽  
Vol 82 (3) ◽  
Author(s):  
David Schaller ◽  
Manuela Geiß ◽  
Peter F. Stadler ◽  
Marc Hellmuth

Abstract: Genome-scale orthology assignments are usually based on reciprocal best matches. In the absence of horizontal gene transfer (HGT), every pair of orthologs forms a reciprocal best match. Incorrect orthology assignments therefore are always false positives in the reciprocal best match graph. We consider duplication/loss scenarios and characterize unambiguous false-positive (u-fp) orthology assignments, that is, edges in the best match graphs (BMGs) that cannot correspond to orthologs for any gene tree that explains the BMG. Moreover, we provide a polynomial-time algorithm to identify all u-fp orthology assignments in a BMG. Simulations show that at least 75% of all incorrect orthology assignments can be detected in this manner. All results rely only on the structure of the BMGs and not on any a priori knowledge about underlying gene or species trees.
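The reciprocal-best-match criterion itself is straightforward to compute from pairwise similarity scores. A minimal sketch, assuming genes from two species and a dict of cross-species similarity scores (the paper's contribution, characterizing u-fp edges in the resulting BMG, goes well beyond this construction):

```python
def reciprocal_best_matches(scores):
    """Compute reciprocal best matches between genes of two species.

    scores: dict mapping (gene_a, gene_b) -> similarity, where gene_a and
    gene_b come from different species. A pair is a reciprocal best match
    if each gene is the other's highest-scoring partner.
    """
    best_left, best_right = {}, {}
    for (a, b), s in scores.items():
        if s > best_left.get(a, (None, float("-inf")))[1]:
            best_left[a] = (b, s)
        if s > best_right.get(b, (None, float("-inf")))[1]:
            best_right[b] = (a, s)
    return {(a, b) for a, (b, _) in best_left.items()
            if best_right.get(b, (None,))[0] == a}
```

Here gene a2's best match b1 prefers a1, so (a2, b1) is not reciprocal; such asymmetries are exactly where incorrect orthology assignments can hide.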


2015 ◽  
Author(s):  
Ruth Davidson ◽  
Pranjal Vachaspati ◽  
Siavash Mirarab ◽  
Tandy Warnow

Background: Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. Results: We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. Conclusion: Our study shows that quartet-based species tree estimation methods can be highly accurate in the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS. Keywords: phylogenomics; HGT; ILS; summary methods; concatenation
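Quartet-based methods rest on the fact that, under the multispecies coalescent, the species-tree topology is the most frequent gene-tree quartet for any four taxa. A toy illustration of that voting signal, with topologies encoded as strings such as "AB|CD" (real tools like ASTRAL-2 aggregate such quartet frequencies over all subsets of taxa):

```python
from collections import Counter

def dominant_quartet(votes):
    """Return the most frequent unrooted quartet topology and its support.

    votes: quartet topologies induced by the gene trees for one set of four
    taxa, e.g. "AB|CD". The majority topology is the quartet-based estimate
    of the species-tree topology for those taxa.
    """
    counts = Counter(votes)
    topo, n = counts.most_common(1)[0]
    return topo, n / len(votes)
```

Under ILS alone the two minority topologies are expected in equal, smaller proportions, which is why the majority vote is statistically consistent.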


2018 ◽  
Author(s):  
Ricardo Guedes ◽  
Vasco Furtado ◽  
Tarcísio Pequeno ◽  
Joel Rodrigues

The article investigates policies to help emergency-centre authorities dispatch resources, with goals such as reducing response time, the number of unattended calls, and the cost of vehicle displacement, while improving the attendance of priority calls. The Pareto set is shown to be the appropriate representation for dispatch policies, since it naturally fits the challenges of multi-objective optimization. By means of the concept of Pareto dominance, a set of objectives can be ordered in a way that guides the dispatch of resources. Instead of manually trying to identify the best dispatching strategy, a multi-objective evolutionary algorithm coupled with an emergency call simulator automatically uncovers the best approximation of the optimal Pareto set, which indicates the importance of each objective and consequently the order in which calls are attended. The validation scenario is a large metropolis in Brazil, using one year of real data from 911 calls. Comparisons are made with traditional policies proposed in the literature, as well as with innovative policies inspired by domains such as computer science and operational research. The results show that ranking calls from a Pareto set discovered by the evolutionary method is a good option: it has the second-lowest waiting time, serves almost 100% of priority calls, is the second most economical, and is second in call attendance. That is to say, it is a strategy in which all four dimensions are considered without major impairment to any of them.
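The Pareto-dominance relation underlying this policy representation can be sketched directly. A minimal version, assuming all objectives are to be minimized (e.g., response time and displacement cost):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_set(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A multi-objective evolutionary algorithm maintains and refines such a non-dominated set across generations; the dispatch ordering is then derived from it rather than from a single hand-tuned objective.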


2021 ◽  
Vol 11 (11) ◽  
pp. 5043
Author(s):  
Xi Chen ◽  
Bo Kang ◽  
Jefrey Lijffijt ◽  
Tijl De Bie

Many real-world problems can be formalized as predicting links in a partially observed network. Examples include Facebook friendship suggestions, the prediction of protein–protein interactions, and the identification of hidden relationships in a crime network. Several link prediction algorithms, notably those recently introduced using network embedding, are capable of doing this by just relying on the observed part of the network. Often, whether two nodes are linked can be queried, albeit at a substantial cost (e.g., by questionnaires, wet lab experiments, or undercover work). Such additional information can improve the link prediction accuracy, but owing to the cost, the queries must be made with due consideration. Thus, we argue that an active learning approach is of great potential interest and developed ALPINE (Active Link Prediction usIng Network Embedding), a framework that identifies the most useful link status by estimating the improvement in link prediction accuracy to be gained by querying it. We proposed several query strategies for use in combination with ALPINE, inspired by the optimal experimental design and active learning literature. Experimental results on real data not only showed that ALPINE was scalable and boosted link prediction accuracy with far fewer queries, but also shed light on the relative merits of the strategies, providing actionable guidance for practitioners.
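A simple stand-in for ALPINE's query selection is uncertainty sampling: query the unobserved node pair whose predicted link probability is closest to 0.5. This is only a sketch of the active-learning loop's core step; ALPINE's actual utility estimates come from the embedding model and optimal experimental design criteria:

```python
def pick_query(candidates, predict_proba):
    """Pick the unobserved node pair whose predicted link probability is most
    uncertain (closest to 0.5).

    candidates: iterable of node pairs whose link status is unknown.
    predict_proba: callable mapping a pair to a link probability in [0, 1].
    """
    return min(candidates, key=lambda pair: abs(predict_proba(pair) - 0.5))
```

After each (costly) query, the embedding would be retrained on the enlarged observed network and the next pair selected, until the query budget is spent.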


2019 ◽  
Vol 44 (4) ◽  
pp. 407-426
Author(s):  
Jedrzej Musial ◽  
Emmanuel Kieffer ◽  
Mateusz Guzek ◽  
Gregoire Danoy ◽  
Shyam S. Wagle ◽  
...  

Abstract: Cloud computing has become one of the major computing paradigms. Not only has the number of offered cloud services grown exponentially, but many different providers also compete with very similar services. This situation should eventually be beneficial for customers, but since these services differ slightly in functional and non-functional properties (e.g., performance, reliability, security), consumers may be confused and unable to make an optimal choice. The emergence of cloud service brokers addresses these issues. A broker gathers information about services from providers and about the needs and requirements of the customers, with the final goal of finding the best match. In this paper, we formalize and study a novel problem that arises in the area of cloud brokering. In its simplest form, brokering is a trivial assignment problem, but in more complex and realistic cases this no longer holds. The novelty of the presented problem lies in considering services that can be sold in bundles. Bundling is a common business practice in which a set of services is sold together for a lower price than the sum of the prices of its constituent services. This work introduces a multi-criteria optimization problem that could help customers determine the best IT solutions according to several criteria. The Cloud Brokering with Bundles (CBB) problem models the different IT packages (or bundles) found on the market while minimizing (maximizing) different criteria. A proof of complexity is given for the single-objective case, and experiments have been conducted with a special case of two criteria: the first being the cost and the second artificially generated. We also designed and developed a benchmark generator based on real data gathered from 19 cloud providers. The problem is solved using an exact optimizer relying on a dichotomic search method.
The results show that the dichotomic search can be successfully applied to small instances corresponding to typical cloud-brokering use cases, returning results within seconds. For larger problem instances, solving times are not prohibitive, and solutions for large corporate clients can be obtained within minutes.
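At its core, picking bundles for a customer is a covering problem: choose a set of bundles whose combined services include everything required, at minimum total cost. A brute-force sketch for the single-objective (cost) case (exponential, but adequate for the small instances mentioned above; the bundle encoding here is an assumption for illustration, not the paper's model):

```python
from itertools import combinations

def cheapest_bundle_cover(required, bundles):
    """Brute-force the minimum cost of a set of bundles covering all
    required services.

    required: set of service names the customer needs.
    bundles: list of (set_of_services, price) pairs offered on the market.
    Returns the cheapest total price, or None if no combination covers
    the requirements.
    """
    best = None
    for r in range(1, len(bundles) + 1):
        for combo in combinations(bundles, r):
            covered = set().union(*(services for services, _ in combo))
            if required <= covered:
                cost = sum(price for _, price in combo)
                if best is None or cost < best:
                    best = cost
    return best
```

An exact optimizer such as the paper's dichotomic search replaces this enumeration for realistic instance sizes, and extends it to trade cost off against a second criterion.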


2022 ◽  
Vol 12 ◽  
Author(s):  
Martha Kandziora ◽  
Petr Sklenář ◽  
Filip Kolář ◽  
Roswitha Schmickl

A major challenge in phylogenetics and phylogenomics is to resolve young, rapidly radiating groups. The fast succession of speciation events increases the probability of incomplete lineage sorting (ILS), and different topologies of the gene trees are expected, leading to gene tree discordance, i.e., not all gene trees represent the species tree. Phylogenetic discordance is common in phylogenomic datasets, and apart from ILS, additional sources include hybridization, whole-genome duplication, and methodological artifacts. Despite a high degree of gene tree discordance, species trees are often well supported, and the sources of discordance are frequently not further addressed in phylogenomic studies, which can eventually lead to incorrect phylogenetic hypotheses, especially in rapidly radiating groups. We chose the high-Andean Asteraceae genus Loricaria to shed light on the potential sources of phylogenetic discordance and generated a phylogenetic hypothesis. By accounting for paralogy during gene tree inference, we generated a species tree based on hundreds of nuclear loci, using Hyb-Seq, and a plastome phylogeny obtained from off-target reads during target enrichment. We observed a high degree of gene tree discordance, which seemed implausible at first sight because the genus showed no evidence of hybridization in previous studies. We used various phylogenomic analyses (trees and networks) as well as the D-statistic to test for ILS and hybridization, and we developed this into a workflow for tackling phylogenetic discordance in recent radiations. We found strong evidence for ILS and hybridization within the genus Loricaria. Low genetic differentiation was evident between species located in different Andean cordilleras, which could be indicative of substantial introgression between populations, promoted during Pleistocene glaciations, when shifting alpine habitats created opportunities for secondary contact and hybridization.
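The D-statistic mentioned above (the ABBA-BABA test) distinguishes introgression from ILS by comparing the counts of two discordant site patterns, which ILS alone should produce in roughly equal numbers. A minimal version from raw pattern counts:

```python
def d_statistic(abba, baba):
    """Patterson's D statistic from ABBA and BABA site-pattern counts.

    D near 0 is consistent with ILS alone; |D| substantially above 0
    suggests introgression between the taxa sharing the excess pattern.
    """
    return (abba - baba) / (abba + baba)
```

In practice the statistic is computed across many loci and its significance is assessed by block resampling, e.g. a block jackknife over the genome.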


2021 ◽  
Vol 12 ◽  
Author(s):  
Yang Yang ◽  
Hongli Tian ◽  
Rui Wang ◽  
Lu Wang ◽  
Hongmei Yi ◽  
...  

Molecular marker technology is widely used in plant variety discrimination, molecular breeding, and other fields. To lower the cost of testing and improve the efficiency of data analysis, molecular marker screening is very important. Screening usually involves two phases: the first to control locus quality and the second to reduce locus quantity. To reduce locus quantity, an appraisal index that is very sensitive to a specific scenario is necessary to select loci combinations. In this study, we focused on loci combination screening for plant variety discrimination. A loci combination appraisal index, variety discrimination power (VDP), is proposed, and three statistical methods, probability-based VDP (P-VDP), comparison-based VDP (C-VDP), and ratio-based VDP (R-VDP), are described and compared. The results on simulated data showed that VDP was sensitive to statistical populations converging toward the same variety, whereas the total probability of discrimination power (TDP) method was effective only for partial populations. R-VDP was more sensitive than P-VDP and C-VDP (which had the same sensitivity) to statistical populations converging toward various varieties; TDP was not sensitive at all. With the real data, R-VDP values for sorghum, wheat, maize, and rice began to show a downward tendency at 20, 7, 100, and 100 loci, respectively; for P-VDP and C-VDP (which gave the same results) the numbers were 6, 4, 9, and 19, and for TDP they were 6, 4, 4, and 11. For the variety threshold setting, R-VDP values of loci combinations with different numbers of loci responded evenly to different thresholds; C-VDP values responded unevenly, and the extent of the response increased as the number of loci decreased. All the methods gave underestimations when data were missing, with systematic errors increasing from TDP to C-VDP to R-VDP.
We conclude that VDP is a better loci combination appraisal index than TDP for plant variety discrimination and that the three VDP methods have different applications. We developed software called VDPtools, which can calculate the values of TDP, P-VDP, C-VDP, and R-VDP. VDPtools is publicly available at https://github.com/caurwx1/VDPtools.git.
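The pairwise intuition behind a discrimination-power index can be sketched as the fraction of variety pairs separated by at least one locus. Note this is an illustrative stand-in, not the paper's P-VDP, C-VDP, or R-VDP definitions:

```python
from itertools import combinations

def discrimination_power(profiles):
    """Fraction of variety pairs distinguished by at least one locus.

    profiles: dict mapping a variety name to a tuple of allele calls,
    one entry per locus in the candidate loci combination.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    separated = sum(1 for a, b in pairs if profiles[a] != profiles[b])
    return separated / len(pairs)
```

Screening then amounts to finding the smallest loci combination whose index stays at (or near) its maximum, which is where the loci-count thresholds reported above come from.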


2019 ◽  
Vol 2 (1) ◽  
pp. 62-64
Author(s):  
Afroza Chowdhury ◽  
Abdullah Al Mamun ◽  
Niaz Md. Farhat Rahman

Good crop yield depends heavily on good management practice, and quality crop management allied with reliable weather forecasting can reduce risk, crop damage, and the cost of production, as well as increase yield. This Bangladesh Rice Research Institute (BRRI) study aimed to quantify the financial benefit of forecasting, to validate microclimatological factors and their impacts on paddy production through experimentation, and to arrange weather-based agrometeorological advisory service delivery to farmers using ICT tools. A next-generation mesoscale numerical weather prediction system, the WRF (Weather Research and Forecasting) model, was used to generate atmospheric simulations based on real data (observations, analyses, or idealized conditions). Field experiments were conducted in areas with five different agro-microclimatological conditions for Boro rice production: Gazipur, Habiganj, Rajshahi, Barishal, and Satkhira. The experimental fields were then managed according to weekly advisories based on the weather forecasts. The results suggest that applying weather forecasts yielded a comparative rice yield benefit of 9-12% and a 3-5% reduction in the cost of cultivation. Countrywide application of the agro-meteorological advisory service may pave the way for averting adverse climatic effects on agriculture.


Geophysics ◽  
2014 ◽  
Vol 79 (1) ◽  
pp. V1-V11 ◽  
Author(s):  
Amr Ibrahim ◽  
Mauricio D. Sacchi

We adopted the robust Radon transform to eliminate erratic incoherent noise that arises in common receiver gathers when simultaneous source data are acquired. The proposed robust Radon transform was posed as an inverse problem using an ℓ1 misfit that is not sensitive to erratic noise. The latter permitted us to design Radon algorithms that are capable of eliminating incoherent noise in common receiver gathers. We also compared nonrobust and robust Radon transforms that are implemented via a quadratic (ℓ2) or a sparse (ℓ1) penalty term in the cost function. The results demonstrated the importance of incorporating a robust misfit functional in the Radon transform to cope with simultaneous source interferences. Synthetic and real data examples proved that the robust Radon transform produces more accurate data estimates than least-squares and sparse Radon transforms.
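A robust (ℓ1-type) misfit for a linear inverse problem is commonly approximated by iteratively reweighted least squares (IRLS), which down-weights large residuals so erratic noise has little pull on the solution. A generic sketch for a linear system A x ≈ b, assuming a dense matrix (not the authors' implementation, which operates on Radon operators):

```python
import numpy as np

def irls_l1(A, b, iters=50, eps=1e-6):
    """Approximate the minimizer of ||A x - b||_1 via IRLS.

    Each iteration solves a weighted least-squares problem in which
    samples with large residuals (e.g. erratic interference from
    simultaneous sources) receive small weights. eps guards against
    division by zero as residuals shrink.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        w = 1.0 / np.sqrt(np.abs(b - A @ x) + eps)
        x = np.linalg.lstsq(A * w[:, None], w * b, rcond=None)[0]
    return x
```

Fitting a constant to data with one large outlier illustrates the robustness: plain least squares is dragged toward the outlier, while the IRLS solution stays near the median of the inliers.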

