Learning the Structure of Bayesian Networks: A Quantitative Assessment of the Effect of Different Algorithmic Schemes

Complexity ◽

10.1155/2018/1591878 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Stefano Beretta ◽

Mauro Castelli ◽

Ivo Gonçalves ◽

Roberto Henriques ◽

Daniele Ramazzotti

Keyword(s):

Bayesian Networks ◽

Quantitative Assessment ◽

State Of The Art ◽

Simulated Data ◽

Detailed Comparison ◽

Search Space ◽

Continuous Variables ◽

Challenging Tasks ◽

The One ◽

Algorithmic Approaches

One of the most challenging tasks when adopting Bayesian networks (BNs) is learning their structure from data. This task is complicated by the huge search space of possible solutions and by the fact that the problem is NP-hard. Hence, a full enumeration of all the possible solutions is not always feasible and approximations are often required. However, to the best of our knowledge, a quantitative analysis of the performance and characteristics of the different heuristics to solve this problem has never been done before. For this reason, in this work, we provide a detailed comparison of many different state-of-the-art methods for structural learning on simulated data, considering BNs with both discrete and continuous variables and with different rates of noise in the data. In particular, we investigate the performance of different widespread scores and algorithmic approaches proposed for the inference and the statistical pitfalls within them.
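To give a sense of how large that search space is, the sketch below counts the labeled directed acyclic graphs on n nodes using Robinson's recurrence. This is an illustration added here, not something taken from the paper; the function name `count_dags` is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edge:
    # each of them may point to any of the remaining n - k nodes.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, count_dags(n))
```

The count grows super-exponentially in n, which is why exhaustive enumeration of candidate structures quickly becomes impractical and heuristic search is used instead.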


Efficient exploratory clustering analyses in large-scale exploration processes

The VLDB Journal ◽

10.1007/s00778-021-00716-y ◽

2021 ◽

Author(s):

Manuel Fritz ◽

Michael Behringer ◽

Dennis Tschechlov ◽

Holger Schwarz

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Comprehensive Evaluation ◽

State Of The Art ◽

Clustering Algorithms ◽

Search Space ◽

Large Datasets ◽

Search Spaces ◽

Multiple Challenges ◽

The One

Abstract Clustering is a fundamental primitive in manifold applications. In order to achieve valuable results in exploratory clustering analyses, parameters of the clustering algorithm have to be set appropriately, which is a tremendous pitfall. We observe multiple challenges for large-scale exploration processes. On the one hand, they require specific methods to efficiently explore large parameter search spaces. On the other hand, they often exhibit large runtimes, in particular when large datasets are analyzed using clustering algorithms with super-polynomial runtimes, which repeatedly need to be executed within exploratory clustering analyses. We address these challenges as follows: First, we present LOG-Means and show that it provides estimates for the number of clusters in sublinear time regarding the defined search space, i.e., provably requiring fewer executions of a clustering algorithm than existing methods. Second, we demonstrate how to exploit fundamental characteristics of exploratory clustering analyses in order to significantly accelerate the (repetitive) execution of clustering algorithms on large datasets. Third, we show how these challenges can be tackled at the same time. To the best of our knowledge, this is the first work which simultaneously addresses the above-mentioned challenges. In our comprehensive evaluation, we show that our proposed methods significantly outperform state-of-the-art methods, thus especially supporting novice analysts for exploratory clustering analyses in large-scale exploration processes.


One-Shot Neural Architecture Search by Dynamically Pruning Supernet in Hierarchical Order

International Journal of Neural Systems ◽

10.1142/s0129065721500295 ◽

2021 ◽

pp. 2150029

Author(s):

Jianwei Zhang ◽

Dong Li ◽

Lituan Wang ◽

Lei Zhang

Keyword(s):

State Of The Art ◽

Search Space ◽

Experimental Results ◽

Research Interest ◽

Search Process ◽

Search Efficiency ◽

Neural Architecture ◽

Speed Up ◽

The One

Neural Architecture Search (NAS), which aims at automatically designing neural architectures, has recently drawn growing research interest. Unlike conventional NAS methods, in which a large number of neural architectures need to be trained for evaluation, one-shot NAS methods only have to train one supernet which synthesizes all the possible candidate architectures. As a result, the search efficiency can be significantly improved by sharing the supernet’s weights during the candidate architectures’ evaluation. This strategy can greatly speed up the search process but suffers from the problem that evaluation based on shared weights is not predictive enough. Recently, pruning the supernet during the search has been shown to be an efficient way to alleviate this problem. However, the pruning direction in complex-structured search spaces remains unexplored. In this paper, we revisit the role of the path dropout strategy, which drops neural operations instead of neurons, in supernet training, and find several interesting characteristics of the supernet trained with dropout. Based on these observations, a Hierarchically-Ordered Pruning Neural Architecture Search (HOPNAS) algorithm is proposed, which dynamically prunes the supernet with a proper pruning direction. Experimental results indicate that our method is competitive with state-of-the-art approaches on CIFAR10 and ImageNet.


Estimating Simultaneous Equation Models through an Entropy-Based Incremental Variational Bayes Learning Algorithm

Entropy ◽

10.3390/e23040384 ◽

2021 ◽

Vol 23 (4) ◽

pp. 384

Author(s):

Rocío Hernández-Sanjaime ◽

Martín González ◽

Antonio Peñalver ◽

Jose J. López-Espín

Keyword(s):

Statistical Theory ◽

Learning Algorithm ◽

Real Life ◽

Simulated Data ◽

Simultaneous Equation ◽

Variational Bayes ◽

Parameter Estimates ◽

Step Method ◽

The One ◽

Simultaneous Equation Models

The presence of unaccounted heterogeneity in simultaneous equation models (SEMs) is frequently problematic in many real-life applications. Under the usual assumption of homogeneity, the model can be seriously misspecified, and it can potentially induce an important bias in the parameter estimates. This paper focuses on SEMs in which data are heterogeneous and tend to form clustering structures in the endogenous-variable dataset. Because the identification of different clusters is not straightforward, a two-step strategy that first forms groups among the endogenous observations and then uses the standard simultaneous equation scheme is provided. Methodologically, the proposed approach is based on a variational Bayes learning algorithm and does not need to be executed for varying numbers of groups in order to identify the one that adequately fits the data. We describe the statistical theory, evaluate the performance of the suggested algorithm by using simulated data, and apply the two-step method to a macroeconomic problem.


Constructing Large-Scale Genetic Maps Using an Evolutionary Strategy Algorithm

Genetics ◽

10.1093/genetics/165.4.2269 ◽

2003 ◽

Vol 165 (4) ◽

pp. 2269-2282

Author(s):

D Mester ◽

Y Ronin ◽

D Minkov ◽

E Nevo ◽

A Korol

Keyword(s):

Discrete Optimization ◽

High Performance ◽

Large Scale ◽

Simulated Data ◽

Real Data ◽

Genetic Maps ◽

Chromosome 1 ◽

Evolutionary Strategy ◽

Group A ◽

The One

Abstract This article is devoted to the problem of ordering in linkage groups with many dozens or even hundreds of markers. The ordering problem belongs to the field of discrete optimization on a set of all possible orders, amounting to n!/2 for n loci; hence it is considered an NP-hard problem. Several authors attempted to employ the methods developed in the well-known traveling salesman problem (TSP) for multilocus ordering, using the assumption that for a set of linked loci the true order will be the one that minimizes the total length of the linkage group. A novel, fast, and reliable algorithm developed for the TSP and based on evolution-strategy discrete optimization was applied in this study for multilocus ordering on the basis of pairwise recombination frequencies. The quality of derived maps under various complications (dominant vs. codominant markers, marker misclassification, negative and positive interference, and missing data) was analyzed using simulated data with ∼50-400 markers. High performance of the employed algorithm allows systematic treatment of the problem of verification of the obtained multilocus orders on the basis of computing-intensive bootstrap and/or jackknife approaches for detecting and removing questionable marker scores, thereby stabilizing the resulting maps. Parallel calculation technology can easily be adopted for further acceleration of the proposed algorithm. Real data analysis (on maize chromosome 1 with 230 markers) is provided to illustrate the proposed methodology.
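The ordering criterion described above (the true order minimizes the total length of the linkage group, with n!/2 distinct orders for n loci) can be sketched with a brute-force search. This is only an illustration for tiny inputs, not the evolution-strategy algorithm of the article; the function names and the recombination-frequency matrix are hypothetical.

```python
from itertools import permutations
from math import factorial


def map_length(order, rf):
    """Total map length: sum of pairwise distances between adjacent loci."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))


def distinct_orders(n):
    """Number of orders of n loci when an order and its mirror image are the same."""
    return factorial(n) // 2 if n > 1 else 1


def best_order_bruteforce(rf):
    """Exhaustively find the order with minimal total length (feasible only for small n)."""
    best = None
    for perm in permutations(range(len(rf))):
        if perm[0] > perm[-1]:
            continue  # skip the mirror image of an order already considered
        length = map_length(perm, rf)
        if best is None or length < best[0]:
            best = (length, perm)
    return best


if __name__ == "__main__":
    # Hypothetical loci at made-up map positions, listed in shuffled order.
    positions = [0.25, 0.0, 0.40, 0.10]
    rf = [[abs(p - q) for q in positions] for p in positions]
    length, order = best_order_bruteforce(rf)
    print(distinct_orders(len(positions)), order, round(length, 3))
```

Because the number of candidate orders grows factorially, this exhaustive approach stops being usable after roughly a dozen loci, which is the motivation for heuristic optimizers on maps with hundreds of markers.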


Fast lightweight accurate xenograft sorting

Algorithms for Molecular Biology ◽

10.1186/s13015-021-00181-w ◽

2021 ◽

Vol 16 (1) ◽

Author(s):

Jens Zentgraf ◽

Sven Rahmann

Keyword(s):

State Of The Art ◽

Hash Table ◽

Human Tumor ◽

Surrounding Tissue ◽

Cpu Time ◽

Alignment Free ◽

Time Usage ◽

The One ◽

Similar Accuracy ◽

Software Prefetching

Abstract Motivation: With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species’ (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately first; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results. Results: We show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy. We improve upon the state of the art sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve the performance even further. Availability: Our software xengsort is available under the MIT license at http://gitlab.com/genomeinformatics/xengsort. It is written in numba-compiled Python and comes with sample Snakemake workflows for hash table construction and dataset processing.


State of the art on lung organoids in mammals

Veterinary Research ◽

10.1186/s13567-021-00946-6 ◽

2021 ◽

Vol 52 (1) ◽

Author(s):

Fabienne Archer ◽

Alexandra Bobet-Erny ◽

Maryline Gomes

Keyword(s):

Lung Development ◽

One Health ◽

State Of The Art ◽

Animal Health ◽

In Vitro Models ◽

Cancer Genetic ◽

3 Dimensional ◽

The One ◽

Species Specific

Abstract The number and severity of diseases affecting lung development and adult respiratory function have stimulated great interest in developing new in vitro models to study the lung in different species. Recent breakthroughs in 3-dimensional (3D) organoid cultures have led to new physiological in vitro models that better mimic the lung than conventional 2D cultures. Lung organoids simulate multiple aspects of the real organ, making them promising and useful models for studying organ development, function and disease (infection, cancer, genetic disease). Due to their dynamics in culture, they can serve as a sustainable source of functional cells (biobanking) and be manipulated genetically. Given the differences between species regarding developmental kinetics, the maturation of the lung at birth, the distribution of the different cell populations along the respiratory tract and species barriers for infectious diseases, there is a need for species-specific lung models capable of mimicking mammal lungs as they are of great interest for animal health and production, following the One Health approach. This paper reviews the latest developments in the growing field of lung organoids.


Extraction of causal relations based on SBEL and BERT model

Database ◽

10.1093/database/baab005 ◽

2021 ◽

Vol 2021 ◽

Author(s):

Yifan Shao ◽

Haoru Li ◽

Jinghang Gu ◽

Longhua Qian ◽

Guodong Zhou

Keyword(s):

State Of The Art ◽

Causal Relation ◽

Relation Extraction ◽

The Other ◽

Biomedical Text ◽

Intermediate Form ◽

Biomedical Text Mining ◽

Causal Relations ◽

The One ◽

Stage 1

Abstract Extraction of causal relations between biomedical entities in the form of Biological Expression Language (BEL) poses a new challenge to the community of biomedical text mining due to the complexity of BEL statements. We propose a simplified form of BEL statements [Simplified Biological Expression Language (SBEL)] to facilitate BEL extraction and employ BERT (Bidirectional Encoder Representation from Transformers) to improve the performance of causal relation extraction (RE). On the one hand, BEL statement extraction is transformed into the extraction of an intermediate form—SBEL statement, which is then further decomposed into two subtasks: entity RE and entity function detection. On the other hand, we use a powerful pretrained BERT model to both extract entity relations and detect entity functions, aiming to improve the performance of two subtasks. Entity relations and functions are then combined into SBEL statements and finally merged into BEL statements. Experimental results on the BioCreative-V Track 4 corpus demonstrate that our method achieves the state-of-the-art performance in BEL statement extraction with F1 scores of 54.8% in Stage 2 evaluation and of 30.1% in Stage 1 evaluation, respectively. Database URL: https://github.com/grapeff/SBEL_datasets


THE APPLICATION OF PETRI NETS TO WORKFLOW MANAGEMENT

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126698000043 ◽

1998 ◽

Vol 08 (01) ◽

pp. 21-66 ◽

Cited By ~ 1478

Author(s):

W. M. P. VAN DER AALST

Keyword(s):

Petri Nets ◽

Petri Net ◽

Business Processes ◽

State Of The Art ◽

Workflow Management ◽

Explicit Representation ◽

The Other ◽

Application Domain ◽

Analysis Techniques ◽

The One

Workflow management promises a new solution to an age-old problem: controlling, monitoring, optimizing and supporting business processes. What is new about workflow management is the explicit representation of the business process logic which allows for computerized support. This paper discusses the use of Petri nets in the context of workflow management. Petri nets are an established tool for modeling and analyzing processes. On the one hand, Petri nets can be used as a design language for the specification of complex workflows. On the other hand, Petri net theory provides for powerful analysis techniques which can be used to verify the correctness of workflow procedures. This paper introduces workflow management as an application domain for Petri nets, presents state-of-the-art results with respect to the verification of workflows, and highlights some Petri-net-based workflow tools.


The Impact of Stationarity, Regularity, and Context on the Predictability of Individual Human Mobility

ACM Transactions on Spatial Algorithms and Systems ◽

10.1145/3459625 ◽

2021 ◽

Vol 7 (4) ◽

pp. 1-24

Author(s):

Douglas Do Couto Teixeira ◽

Aline Carneiro Viana ◽

Jussara M. Almeida ◽

Mário S. Alvim

Keyword(s):

State Of The Art ◽

Human Mobility ◽

Contextual Information ◽

Prediction Method ◽

Mobility Prediction ◽

Mobility Patterns ◽

Distinct Cell ◽

Inherent Nature ◽

The One ◽

The Impact

Predicting mobility-related behavior is an important yet challenging task. On the one hand, factors such as one’s routine or preferences for a few favorite locations may help in predicting their mobility. On the other hand, several contextual factors, such as variations in individual preferences, weather, traffic, or even a person’s social contacts, can affect mobility patterns and make their modeling significantly more challenging. A fundamental approach to studying mobility-related behavior is to assess how predictable such behavior is, deriving theoretical limits on the accuracy that a prediction model can achieve given a specific dataset. This approach focuses on the inherent nature and fundamental patterns of human behavior captured in that dataset, filtering out factors that depend on the specificities of the prediction method adopted. However, the current state-of-the-art method to estimate predictability in human mobility suffers from two major limitations: low interpretability and difficulty in incorporating external factors that are known to help mobility prediction (i.e., contextual information). In this article, we revisit this state-of-the-art method, aiming at tackling these limitations. Specifically, we conduct a thorough analysis of how this widely used method works by looking into two different metrics that are easier to understand and, at the same time, capture reasonably well the effects of the original technique. We evaluate these metrics in the context of two different mobility prediction tasks, notably, next cell and next distinct cell prediction, which have different degrees of difficulty. Additionally, we propose alternative strategies to incorporate different types of contextual information into the existing technique. Our evaluation of these strategies offers quantitative measures of the impact of adding context to the predictability estimate, revealing the challenges associated with doing so in practical scenarios.


Efficient reversible quantum design of sign-magnitude to two's complement converters

Quantum Information and Computation ◽

10.26421/qic20.9-10-3 ◽

2020 ◽

Vol 20 (9&10) ◽

pp. 747-765

Author(s):

F. Orts ◽

G. Ortega ◽

E.M. Garzon

Keyword(s):

Quantum Computing ◽

Scientific Community ◽

Quantum Computer ◽

Fault Tolerant ◽

State Of The Art ◽

The Other ◽

Quantum Computers ◽

Binary Number ◽

Quantum Cost ◽

The One

Despite the great interest that the scientific community has in quantum computing, the scarcity and high cost of resources hinder progress in this field. Specifically, qubits are very expensive to build, so the few available quantum computers are tremendously limited in their number of qubits, which delays their progress. This work presents new reversible circuits that optimize the resources needed to convert a sign-magnitude binary number into two's complement of N digits. The benefits of our work are twofold: on the one hand, the proposed two's complement converters are fault-tolerant circuits and are also more efficient in terms of resources (essentially, quantum cost, number of qubits, and T-count) than those described in the literature. On the other hand, valuable information about available converters and, what is more, quantum adders is summarized in tables for interested researchers. The converters have been measured using robust metrics and compared with state-of-the-art circuits. The code to build them on a real quantum computer is given.
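For readers unfamiliar with the conversion itself, the following is a minimal classical sketch of turning an N-bit sign-magnitude word into N-bit two's complement. It uses ordinary integer arithmetic and is not the reversible quantum circuit proposed in the paper; the function name is hypothetical, and mapping "negative zero" to zero is an assumption of this sketch.

```python
def sign_magnitude_to_twos_complement(bits: str) -> str:
    """Convert an N-bit sign-magnitude string (MSB is the sign) to N-bit two's complement."""
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of 0s and 1s")
    n = len(bits)
    magnitude = int(bits[1:] or "0", 2)
    if bits[0] == "0":
        return bits  # non-negative values have the same encoding in both formats
    # Negative value: take -magnitude modulo 2**n to get the two's complement pattern.
    return format(-magnitude & ((1 << n) - 1), f"0{n}b")


if __name__ == "__main__":
    for word in ("0101", "1011", "1000", "1111"):
        print(word, "->", sign_magnitude_to_twos_complement(word))
```

Masking with 2^N - 1 is equivalent to inverting the magnitude bits and adding one, which is the operation a hardware or reversible converter has to realize with as few resources as possible.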
