scholarly journals BCD Beam Search: considering suboptimal partial solutions in Bad Clade Deletion supertrees

PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4987
Author(s):  
Markus Fleischauer ◽  
Sebastian Böcker

Supertree methods enable the reconstruction of large phylogenies. The supertree problem can be formalized in different ways in order to cope with contradictory information in the input. Some supertree methods are based on encoding the input trees in a matrix; other methods try to find minimum cuts in some graph. Recently, we introduced Bad Clade Deletion (BCD) supertrees which combines the graph-based computation of minimum cuts with optimizing a global objective function on the matrix representation of the input trees. The BCD supertree method has guaranteed polynomial running time and is very swift in practice. The quality of reconstructed supertrees was superior to matrix representation with parsimony (MRP) and usually on par with SuperFine for simulated data; but particularly for biological data, quality of BCD supertrees could not keep up with SuperFine supertrees. Here, we present a beam search extension for the BCD algorithm that keeps alive a constant number of partial solutions in each top-down iteration phase. The guaranteed worst-case running time of the new algorithm is still polynomial in the size of the input. We present an exact and a randomized subroutine to generate suboptimal partial solutions. Both beam search approaches consistently improve supertree quality on all evaluated datasets when keeping 25 suboptimal solutions alive. Supertree quality of the BCD Beam Search algorithm is on par with MRP and SuperFine even for biological data. This is the best performance of a polynomial-time supertree algorithm reported so far.

Sensors ◽  
2020 ◽  
Vol 20 (15) ◽  
pp. 4115
Author(s):  
Jolanta Koszelew ◽  
Joanna Karbowska-Chilinska ◽  
Krzysztof Ostrowski ◽  
Piotr Kuczyński ◽  
Eric Kulbiej ◽  
...  

A single anti-collision trajectory generation problem for an “own” vessel only is significantly different from the challenge of generating a whole set of safe trajectories for multi-surface vehicle encounter situations in the open sea. Effective solutions for such problems are needed these days, as we are entering the era of autonomous ships. The article specifies the problem of anti-collision trajectory planning in many-to-many encounter situations. The proposed original multi-surface vehicle beam search algorithm (MBSA), based on the beam search strategy, solves the problem. The general idea of the MBSA involves the application of a solution for one-to-many encounter situations (using the beam search algorithm, BSA), which was tested on real automated radar plotting aid (ARPA) and automatic identification system (AIS) data. The test results for the MBSA were from simulated data, which are discussed in the final part. The article specifies the problem of anti-collision trajectory planning in many-to-many encounter situations involving moving autonomous surface vehicles, excluding Collision Regulations (COLREGs) and vehicle dynamics.


2013 ◽  
Vol 11 (05) ◽  
pp. 1342005 ◽  
Author(s):  
WEN-CHIEH CHANG ◽  
PAWEŁ GÓRECKI ◽  
OLIVER EULENSTEIN

Phylogenetic analysis has to overcome the grant challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well studied approaches to cope with this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss, and deep coalescence, or the classic RF supertree problem that does not rely on any biological model. Despite the potential of these problems to infer credible species trees, they are NP-hard. Therefore, these problems are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better asses the quality of the resulting species trees that best fit the given gene trees, we also compute the worst case species trees, their numbers, and optimization score for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions.


2015 ◽  
Author(s):  
Markus Fleischauer ◽  
Sebastian Böcker

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well known Matrix Representation with Parsimony, others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the final supertree. We find this modifications to increase the number of true positive clades by 16% while decreasing the number of false positive clades by 3% compared to the currently used Overlap scoring.


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e2172 ◽  
Author(s):  
Markus Fleischauer ◽  
Sebastian Böcker

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well-known Matrix Representation with Parsimony, while others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the GSCM supertree. We find this modifications to increase the number of true positive clades by 18% compared to the currently used Overlap scoring.


2015 ◽  
Author(s):  
Markus Fleischauer ◽  
Sebastian Böcker

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well known Matrix Representation with Parsimony, others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the final supertree. We find this modifications to increase the number of true positive clades by 16% while decreasing the number of false positive clades by 3% compared to the currently used Overlap scoring.


2015 ◽  
Author(s):  
Markus Fleischauer ◽  
Sebastian Böcker

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well known Matrix Representation with Parsimony, others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the final supertree. We find this modifications to increase the number of true positive clades by 16% while decreasing the number of false positive clades by 3% compared to the currently used Overlap scoring.


2015 ◽  
Author(s):  
Markus Fleischauer ◽  
Sebastian Böcker

Supertree methods combine a set of phylogenetic trees into a single supertree. Similar to supermatrix methods, these methods provide a way to reconstruct larger parts of the Tree of Life, potentially evading the computational complexity of phylogenetic inference methods such as maximum likelihood. The supertree problem can be formalized in different ways, to cope with contradictory information in the input. Many supertree methods have been developed. Some of them solve NP-hard optimization problems like the well known Matrix Representation with Parsimony, others have polynomial worst-case running time but work in a greedy fashion (FlipCut). Both can profit from a set of clades that are already known to be part of the supertree. The Superfine approach shows how the Greedy Strict Consensus Merger (GSCM) can be used as preprocessing to find these clades. We introduce different scoring functions for the GSCM, a randomization, as well as a combination thereof to improve the GSCM to find more clades. This helps, in turn, to improve the resolution of the final supertree. We find this modifications to increase the number of true positive clades by 16% while decreasing the number of false positive clades by 3% compared to the currently used Overlap scoring.


2021 ◽  
Vol 29 ◽  
pp. 287-295
Author(s):  
Zhiming Zhou ◽  
Haihui Huang ◽  
Yong Liang

BACKGROUND: In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. Logistic regression model is a powerful discrimination method that can offer a clear statistical explanation and obtain the classification probability of classification label information. However, it is unable to fulfill biomarker selection. OBJECTIVE: The aim of this paper is to give the model efficient gene selection capability. METHODS: In this paper, we propose a new penalized logsum network-based regularization logistic regression model for gene selection and cancer classification. RESULTS: Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. For a large data set, the proposed method has achieved 89.66% (training) and 90.02% (testing) AUC performances, which are, on average, 5.17% (training) and 4.49% (testing) better than mainstream methods. CONCLUSIONS: The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.


Sign in / Sign up

Export Citation Format

Share Document