Learning from Both Experts and Data

Entropy ◽  
2019 ◽  
Vol 21 (12) ◽  
pp. 1208
Author(s):  
Rémi Besson ◽  
Erwan Le Pennec ◽  
Stéphanie Allassonnière

In this work, we study the problem of inferring a discrete probability distribution using both expert knowledge and empirical data. This is an important issue for many applications where the scarcity of data prevents a purely empirical approach. In this context, it is common to rely first on an a priori distribution derived from initial domain knowledge before proceeding to online data acquisition. We are particularly interested in the intermediate regime, where we do not have enough data to do without the experts' initial a priori, but enough to correct it if necessary. We present here a novel way to tackle this issue, with a method providing an objective way to choose the weight to be given to experts compared to data. We show, both empirically and theoretically, that our proposed estimator is always more efficient than the better of the two models (expert-only or data-only), up to a constant.
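The core idea — a convex combination of an expert prior and the empirical distribution, with the weight chosen objectively — can be sketched as follows. This is a minimal illustration, not the authors' estimator: the weight here is picked by minimizing held-out log-loss, an assumed stand-in for the paper's criterion, and the prior, counts, and hold-out set are hypothetical.

```python
import numpy as np

def mix_estimator(prior, counts, lam):
    """Convex combination of an expert prior and the empirical distribution."""
    empirical = counts / counts.sum()
    return lam * prior + (1 - lam) * empirical

# Hypothetical example: an expert prior vs. observed counts over 3 outcomes.
prior = np.array([0.5, 0.3, 0.2])
counts = np.array([10, 70, 20])          # the data disagree with the expert

# Choose the weight by minimizing log-loss on a held-out sample
# (an assumption for illustration, not the paper's objective criterion).
holdout = np.array([5, 35, 10])
lams = np.linspace(0.0, 1.0, 101)
losses = [-(holdout * np.log(mix_estimator(prior, counts, l) + 1e-12)).sum()
          for l in lams]
best = lams[int(np.argmin(losses))]
```

With data that match the hold-out sample, the selected weight leans toward the empirical model; with scarcer or noisier data, more mass would shift back to the expert prior.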

2021 ◽  
Vol 3 (2) ◽  
pp. 299-317
Author(s):  
Patrick Schrempf ◽  
Hannah Watson ◽  
Eunsoo Park ◽  
Maciej Pajak ◽  
Hamish MacKinnon ◽  
...  

Training medical image analysis models traditionally requires large amounts of expertly annotated imaging data, which are time-consuming and expensive to obtain. One solution is to automatically extract scan-level labels from radiology reports. Previously, we showed that, by extending BERT with a per-label attention mechanism, we can train a single model to perform automatic extraction of many labels in parallel. However, if we rely on pure data-driven learning, the model sometimes fails to learn critical features or learns the correct answer via simplistic heuristics (e.g., that “likely” indicates positivity), and thus fails to generalise to rarer cases which have not been learned or where the heuristics break down (e.g., “likely represents prominent VR space or lacunar infarct”, which indicates uncertainty over two differential diagnoses). In this work, we propose template creation for data synthesis, which enables us to inject expert knowledge about unseen entities from medical ontologies, and to teach the model rules on how to label difficult cases, by producing relevant training examples. Using this technique alongside domain-specific pre-training for our underlying BERT architecture (PubMedBERT), we improve F1 micro from 0.903 to 0.939 and F1 macro from 0.512 to 0.737 on an independent test set for 33 labels in head CT reports for stroke patients. Our methodology offers a practical way to combine domain knowledge with machine learning for text classification tasks.
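Template-based data synthesis of this kind can be sketched very simply: fill sentence templates with entities drawn from an ontology to generate labelled training examples. The templates, entity list, and labels below are toy illustrations, not the authors' actual resources or label scheme.

```python
# Minimal sketch of template-based data synthesis: each template pairs a
# sentence pattern with per-slot labels; filling the slots with ontology
# entities yields synthetic labelled examples for difficult phrasings.
templates = [
    ("likely represents {a} or {b}", {"{a}": "uncertain", "{b}": "uncertain"}),
    ("no evidence of {a}", {"{a}": "negative"}),
]
ontology = ["lacunar infarct", "prominent VR space", "haemorrhage"]

def synthesise(templates, entities):
    examples = []
    for text, slot_labels in templates:
        slots = sorted(slot_labels)
        for i in range(len(entities)):
            filled, labels = text, {}
            # naive pairing: fill each slot with a distinct entity
            for j, slot in enumerate(slots):
                chosen = entities[(i + j) % len(entities)]
                filled = filled.replace(slot, chosen, 1)
                labels[chosen] = slot_labels[slot]
            examples.append((filled, labels))
    return examples

data = synthesise(templates, ontology)
```

Generated sentences such as "likely represents lacunar infarct or prominent VR space" teach the model that two differential diagnoses signal uncertainty, even for entity combinations never seen in real reports.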


2014 ◽  
Vol 651-653 ◽  
pp. 2296-2300
Author(s):  
Jing Huang ◽  
She Yu Zhou ◽  
Bing Lei Guan

Based on the theory of ultrasonic testing, an online data-acquisition and storage system is designed. This paper introduces the hardware and software design of the system, in which an embedded DSP and an FPGA serve as the control core, and the interface between the PCI bus and the DSP is designed. The system can thus process high-speed, large-capacity ultrasonic signals, and pipeline defects can be analysed and evaluated.


2016 ◽  
Vol 5 (4) ◽  
pp. 106-113 ◽  
Author(s):  
Tamer El Nashar

The objective of this paper is to examine the impact of inclusive business on internal ethical values and internal control quality from an accounting perspective. I construct the hypothesis for this paper based on the potential for organizations' awareness to be directed toward the inclusive business approach, which would significantly affect organizational culture and, in turn, ethical values and internal control quality. I use the expected value and variance of a random variable in order to analyze the potential impact of inclusive business, supporting the examination with discrete and continuous probability distributions. I find a probability of 85.5% of a significant potential impact of inclusive business, with a 100% score, on internal ethical values and internal control quality, helping to contribute to sustainable growth, reduce poverty, and improve organizational culture and learning.
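The expected value and variance of a discrete random variable, the workhorse of the test described above, are straightforward to compute. The scores and probabilities below are illustrative stand-ins (echoing the paper's 85.5%/100% figures), not the study's data.

```python
def mean_var(values, probs):
    """Expected value and variance of a discrete random variable:
    E[X] = sum(v * p),  Var[X] = sum((v - E[X])**2 * p)."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, var

# Illustrative impact scores and probabilities (not the paper's data):
# an 85.5% chance of the full 100% impact score.
scores = [0, 50, 100]
probs = [0.05, 0.095, 0.855]
m, v = mean_var(scores, probs)
```

The variance indicates how concentrated the impact assessment is around its expected score; a small variance relative to the mean supports a confident conclusion.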


2014 ◽  
Vol 17 (5) ◽  
pp. 376-382 ◽  
Author(s):  
Rita Hegedűs ◽  
András Pári ◽  
Zsófia Drjenovszky ◽  
Hanna Kónya

Aiming to perform the first sociological survey of Hungarian twins, our main question was whether being a twin has positive consequences on one's life. Adult twins completed our questionnaire at three Hungarian summer twin festivals, in hospitals during medical twin studies, and on some websites online. Data represent 140 twin pairs (mean age: 38.2 ± 14.6 years). We employed several indices for measuring the resource nature of twinship. Three main types of benefits were distinguished: the profit of attraction, as ‘material capital’; the easier access to cultural goods when both twins take part, as ‘cultural capital’; and the positive aspects of an a priori existing dyadic relation, as ‘relational capital’. We were interested in the differences among types of twins regarding these advantages. We paid special attention to the five groups of twins derived from gender and zygosity (i.e., monozygotic females, monozygotic males, dizygotic females, dizygotic males, opposite-sex pairs). Our analysis showed that the Hungarian twins involved in our research basically enjoy their twinship; during their lives they have used, and still make use of, the different benefits it confers. In our twin samples, women derived more advantages from being a twin than men. Significant differences could be observed on all indicators between monozygotic and dizygotic twins.


2021 ◽  
Author(s):  
Peter Lukacs ◽  
Theodosia Stratoudaki ◽  
Geo Davis ◽  
Anthony Gachagan

Abstract This study introduces a novel data acquisition method, Selective Matrix Capture (SMC), which can adapt the array geometry during data acquisition to the demands of the inspected structure, such as the defects encountered. The adaptive data acquisition method is enabled by the use of Laser Induced Phased Arrays (LIPAs). We have previously demonstrated high-resolution ultrasonic images of the interior of components using Full Matrix Capture (FMC) and the Total Focusing Method (TFM). However, capturing the FMC requires long synthesis times due to signal averaging and mechanical laser scanning, compromising the application potential of LIPAs. Given that most components are defect-free, significant time savings can be obtained by only acquiring high-fidelity data when a defect is indicated. The paper presents Selective Matrix Capture, which acquires data more efficiently, without a priori knowledge of the location of the defects, while still achieving the superior imaging quality provided by an FMC data set.
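The FMC/TFM combination referred to above amounts to delay-and-sum focusing over the full matrix of transmit-receive signals. The sketch below shows the TFM value at a single pixel under simplifying assumptions (precomputed travel times, no interpolation or apodization); the array geometry and sampling values are hypothetical.

```python
import numpy as np

def tfm_pixel(fmc, times_tx, times_rx, fs):
    """Total Focusing Method value at one pixel: sum every transmit-receive
    signal at that pair's round-trip delay to the pixel.
    fmc[t, r, s]: sample s of the signal for transmitter t, receiver r."""
    n_tx, n_rx, n_s = fmc.shape
    val = 0.0
    for t in range(n_tx):
        for r in range(n_rx):
            idx = int(round((times_tx[t] + times_rx[r]) * fs))
            if 0 <= idx < n_s:
                val += fmc[t, r, idx]
    return val

# Toy check: a point reflector at the pixel puts a spike at each pair's
# round-trip delay, so all contributions add coherently.
fs = 1e6
times_tx = np.array([1e-6, 2e-6])   # one-way times, transmitters -> pixel
times_rx = np.array([1e-6, 3e-6])   # one-way times, pixel -> receivers
fmc = np.zeros((2, 2, 8))
for t in range(2):
    for r in range(2):
        fmc[t, r, int(round((times_tx[t] + times_rx[r]) * fs))] = 1.0
focus = tfm_pixel(fmc, times_tx, times_rx, fs)
```

SMC's saving comes from skipping most of this matrix when no defect is indicated, falling back to the full capture only where high-fidelity focusing is needed.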


Author(s):  
Diane J. Cook ◽  
Lawrence B. Holder

The large amount of data collected today is quickly overwhelming researchers’ abilities to interpret the data and discover interesting patterns. In response to this problem, a number of researchers have developed techniques for discovering concepts in databases. These techniques work well for data expressed in a nonstructural, attribute-value representation and address issues of data relevance, missing data, noise and uncertainty, and utilization of domain knowledge (Fisher, 1987; Cheeseman and Stutz, 1996). However, recent data acquisition projects are collecting structural data describing the relationships among the data objects. Correspondingly, there exists a need for techniques to analyze and discover concepts in structural databases (Fayyad et al., 1996b). One method for discovering knowledge in structural data is the identification of common substructures. The goal is to find substructures capable of compressing the data and to identify conceptually interesting substructures that enhance the interpretation of the data. Substructure discovery is the process of identifying concepts describing interesting and repetitive substructures within structural data. Once discovered, the substructure concept can be used to simplify the data by replacing instances of the substructure with a pointer to the newly discovered concept. The discovered substructure concepts allow abstraction over detailed structure in the original data and provide new, relevant attributes for interpreting the data. Iteration of the substructure discovery and replacement process constructs a hierarchical description of the structural data in terms of the discovered substructures. This hierarchy provides varying levels of interpretation that can be accessed based on the goals of the data analysis. We describe a system called Subdue that discovers interesting substructures in structural data based on the minimum description length (MDL) principle. 
The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints.
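The MDL principle underlying Subdue scores a substructure by how well it compresses the graph: the total description length of the substructure plus the graph with each instance collapsed to a single vertex. The sketch below is a crude size-count approximation of that idea, not Subdue's actual bit-level encoding.

```python
def mdl_value(graph_size, sub_size, n_instances):
    """Crude MDL-style score: DL(S) + DL(G | S), measured in size 'units'
    (a stand-in for bits).  Replacing each instance of a substructure of
    sub_size vertices with one vertex removes (sub_size - 1) vertices per
    instance; lower total description length is better."""
    compressed = graph_size - n_instances * (sub_size - 1)
    return sub_size + compressed

# A substructure of size 4 appearing 10 times in a 100-vertex graph
# compresses better (lower score) than one of size 6 appearing 3 times.
score_a = mdl_value(100, 4, 10)
score_b = mdl_value(100, 6, 3)
```

Iterating this evaluation — pick the best-compressing substructure, replace its instances, and repeat on the compressed graph — yields the hierarchical description the passage describes.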


2013 ◽  
Vol 2 (4) ◽  
pp. 61-78 ◽  
Author(s):  
Roy L. Nersesian ◽  
Kenneth David Strang

This study discussed the theoretical literature related to developing and applying probability distributions for estimating uncertainty. A theoretically selected ten-year empirical sample was collected and evaluated for the Albany NY area (N=942). A discrete probability distribution model was developed and applied for part of the sample, to illustrate the likelihood of petroleum spills by industry and day of week. The benefit of this paper for the community of practice was to demonstrate how to select, develop, test and apply a probability distribution to analyze the patterns in disaster events, using inferential parametric and nonparametric statistical techniques. The method, not the model, was intended to be generalized to other researchers and populations. An interesting side benefit from this study was that it revealed significant findings about where and when most of the human-attributed petroleum leaks had occurred in the Albany NY area over the last ten years (ending in 2013). The researchers demonstrated how to develop and apply distribution models in low-cost spreadsheet software (Excel).
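Building a discrete probability distribution of events by category, as the study does for spills by day of week in a spreadsheet, reduces to normalizing category counts. The daily counts below are invented for illustration; the study's actual Albany NY data (N=942) are not reproduced here.

```python
from collections import Counter

# Hypothetical spill events tagged by day of week (not the study's data).
events = (["Mon"] * 30 + ["Tue"] * 25 + ["Wed"] * 20
          + ["Thu"] * 15 + ["Fri"] * 10)

# Empirical probability mass function: count per category / total count,
# exactly what a COUNTIF / total formula produces in a spreadsheet.
counts = Counter(events)
n = sum(counts.values())
pmf = {day: c / n for day, c in counts.items()}
```

The resulting PMF can then be compared against a uniform (or other null) distribution with a chi-square or nonparametric test to decide whether spills cluster on particular days.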


2019 ◽  
Vol 23 ◽  
pp. 947-978
Author(s):  
Shota Gugushvili ◽  
Frank van der Meulen ◽  
Moritz Schauer ◽  
Peter Spreij

According to both domain expert knowledge and empirical evidence, wavelet coefficients of real signals tend to exhibit clustering patterns, in that they contain connected regions of coefficients of similar magnitude (large or small). A wavelet de-noising approach that takes into account such a feature of the signal may in practice outperform other, more vanilla methods, both in terms of the estimation error and visual appearance of the estimates. Motivated by this observation, we present a Bayesian approach to wavelet de-noising, where dependencies between neighbouring wavelet coefficients are a priori modelled via a Markov chain-based prior, that we term the caravan prior. Posterior computations in our method are performed via the Gibbs sampler. Using representative synthetic and real data examples, we conduct a detailed comparison of our approach with a benchmark empirical Bayes de-noising method (due to Johnstone and Silverman). We show that the caravan prior fares well and is therefore a useful addition to the wavelet de-noising toolbox.
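The clustering behaviour that motivates the caravan prior — neighbouring wavelet coefficients tending to share magnitude — can be mimicked with a simple two-state Markov chain over "signal present" labels. This is an illustrative toy, not the caravan prior itself (which models magnitudes, not just binary labels) nor the paper's Gibbs sampler.

```python
import random

def sample_markov_labels(n, p_stay, seed=0):
    """Sample a binary label sequence from a two-state Markov chain:
    with probability p_stay the next coefficient keeps the current state,
    so high p_stay produces long runs (clusters) of identical labels."""
    rng = random.Random(seed)
    labels = [rng.random() < 0.5]
    for _ in range(n - 1):
        keep = rng.random() < p_stay
        labels.append(labels[-1] if keep else not labels[-1])
    return labels

labels = sample_markov_labels(1000, p_stay=0.95)
```

Under such a prior, a coefficient surrounded by large neighbours is a priori more likely to carry signal, which is exactly the dependency a vanilla independent-coefficient prior ignores.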


2011 ◽  
Vol 317-319 ◽  
pp. 681-684
Author(s):  
Yi Sheng Huang ◽  
Ho Shan Chiang

A novel approach to probabilistic timed structures is proposed, based on combining the formalisms of timed automata and probabilistic automata to represent the system. Real-valued clocks measure the passage of time, and transitions can be probabilistic, expressed as a discrete probability distribution over the set of target states. The usage of clock variables and the specification of the state space are illustrated with real-valued time applications. Transitions between states are triggered probabilistically by events which describe either the occurrence of faults or normal working conditions. Additionally, the passage of discrete time and probabilistic transitions are combined by means of the theory of expectation sets to obtain a unified measure-reasoning strategy.
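A single step of such a model combines a clock guard with a discrete probability distribution over target states. The sketch below is a minimal toy, with a hypothetical working/fault model and made-up guard and probabilities; it illustrates the mechanism, not the paper's formalism or expectation-set theory.

```python
import random

def step(state, clock, transitions, rng):
    """One step of a toy probabilistic timed automaton: if the clock has
    reached the current state's guard, draw the next state from a discrete
    probability distribution over target states; otherwise stay put."""
    guard, dist = transitions[state]
    if clock < guard:
        return state                      # transition not yet enabled
    states, probs = zip(*dist.items())
    return rng.choices(states, weights=probs)[0]

# Hypothetical model: after 5 time units, 'working' fails with prob. 0.1;
# 'fault' is absorbing.
transitions = {
    "working": (5.0, {"working": 0.9, "fault": 0.1}),
    "fault":   (0.0, {"fault": 1.0}),
}
rng = random.Random(0)
s = step("working", clock=6.0, transitions=transitions, rng=rng)
```

Repeating this step while advancing the clock yields sample paths from which fault probabilities over time can be estimated.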

