Learning from Both Experts and Data

Entropy ◽  
2019 ◽  
Vol 21 (12) ◽  
pp. 1208
Author(s):  
Rémi Besson ◽  
Erwan Le Pennec ◽  
Stéphanie Allassonnière

In this work, we study the problem of inferring a discrete probability distribution using both expert knowledge and empirical data. This is an important issue for many applications where the scarcity of data prevents a purely empirical approach. In this context, it is common to rely first on an a priori distribution derived from initial domain knowledge before proceeding to online data acquisition. We are particularly interested in the intermediate regime, where we do not have enough data to do without the experts' initial a priori, but enough to correct it if necessary. We present here a novel way to tackle this issue, with a method providing an objective way to choose the weight to be given to experts compared to data. We show, both empirically and theoretically, that our proposed estimator is always more efficient than the better of the two models (expert-only or data-only), up to a constant.
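The core idea — a convex combination of an expert prior and the empirical distribution, with the weight chosen objectively — can be sketched as follows. This is a minimal illustration, not the authors' estimator: the weight here is picked by minimizing held-out log-loss, an assumed stand-in for the paper's criterion, and the prior, counts, and hold-out set are hypothetical.

```python
import numpy as np

def mix_estimator(prior, counts, lam):
    """Convex combination of an expert prior and the empirical distribution."""
    empirical = counts / counts.sum()
    return lam * prior + (1 - lam) * empirical

# Hypothetical example: an expert prior vs. observed counts over 3 outcomes.
prior = np.array([0.5, 0.3, 0.2])
counts = np.array([10, 70, 20])          # the data disagree with the expert

# Choose the weight by minimizing log-loss on a held-out sample
# (an assumption for illustration, not the paper's objective criterion).
holdout = np.array([5, 35, 10])
lams = np.linspace(0.0, 1.0, 101)
losses = [-(holdout * np.log(mix_estimator(prior, counts, l) + 1e-12)).sum()
          for l in lams]
best = lams[int(np.argmin(losses))]
```

With data that match the hold-out sample, the selected weight leans toward the empirical model; with scarcer or noisier data, more mass would shift back to the expert prior.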

2021 ◽  
Vol 3 (2) ◽  
pp. 299-317
Author(s):  
Patrick Schrempf ◽  
Hannah Watson ◽  
Eunsoo Park ◽  
Maciej Pajak ◽  
Hamish MacKinnon ◽  
...  

Training medical image analysis models traditionally requires large amounts of expertly annotated imaging data, which are time-consuming and expensive to obtain. One solution is to automatically extract scan-level labels from radiology reports. Previously, we showed that, by extending BERT with a per-label attention mechanism, we can train a single model to perform automatic extraction of many labels in parallel. However, if we rely on pure data-driven learning, the model sometimes fails to learn critical features or learns the correct answer via simplistic heuristics (e.g., that “likely” indicates positivity), and thus fails to generalise to rarer cases which have not been learned or where the heuristics break down (e.g., “likely represents prominent VR space or lacunar infarct”, which indicates uncertainty over two differential diagnoses). In this work, we propose template creation for data synthesis, which enables us to inject expert knowledge about unseen entities from medical ontologies, and to teach the model rules on how to label difficult cases, by producing relevant training examples. Using this technique alongside domain-specific pre-training for our underlying BERT architecture (PubMedBERT), we improve F1 micro from 0.903 to 0.939 and F1 macro from 0.512 to 0.737 on an independent test set for 33 labels in head CT reports for stroke patients. Our methodology offers a practical way to combine domain knowledge with machine learning for text classification tasks.
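Template-based data synthesis of this kind can be sketched very simply: fill sentence templates with entities drawn from an ontology to generate labelled training examples. The templates, entity list, and labels below are toy illustrations, not the authors' actual resources or label scheme.

```python
# Minimal sketch of template-based data synthesis: each template pairs a
# sentence pattern with per-slot labels; filling the slots with ontology
# entities yields synthetic labelled examples for difficult phrasings.
templates = [
    ("likely represents {a} or {b}", {"{a}": "uncertain", "{b}": "uncertain"}),
    ("no evidence of {a}", {"{a}": "negative"}),
]
ontology = ["lacunar infarct", "prominent VR space", "haemorrhage"]

def synthesise(templates, entities):
    examples = []
    for text, slot_labels in templates:
        slots = sorted(slot_labels)
        for i in range(len(entities)):
            filled, labels = text, {}
            # naive pairing: fill each slot with a distinct entity
            for j, slot in enumerate(slots):
                chosen = entities[(i + j) % len(entities)]
                filled = filled.replace(slot, chosen, 1)
                labels[chosen] = slot_labels[slot]
            examples.append((filled, labels))
    return examples

data = synthesise(templates, ontology)
```

Generated sentences such as "likely represents lacunar infarct or prominent VR space" teach the model that two differential diagnoses signal uncertainty, even for entity combinations never seen in real reports.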


2014 ◽  
Vol 651-653 ◽  
pp. 2296-2300
Author(s):  
Jing Huang ◽  
She Yu Zhou ◽  
Bing Lei Guan

Based on the theory of ultrasonic testing, an online data-acquisition and storage system is designed. This paper introduces the hardware and software design of the system, in which an embedded DSP and an FPGA serve as the control core, and the interface between the PCI bus and the DSP is designed. The system can thus process high-speed, large-capacity ultrasonic signals, and pipeline defects can be analysed and evaluated.


2016 ◽  
Vol 5 (4) ◽  
pp. 106-113 ◽  
Author(s):  
Tamer El Nashar

The objective of this paper is to examine the impact of inclusive business on internal ethical values and internal control quality from an accounting perspective. I construct the hypothesis for this paper based on the potential for organizations' awareness to be directed toward the inclusive business approach, which would significantly affect organizational culture and, in turn, ethical values and internal control quality. I use the expected value and variance of a random variable in order to analyze the potential impact of inclusive business, supporting the examination with discrete and continuous probability distributions. I find a probability of 85.5% of a significant potential impact of inclusive business, with a 100% score, on internal ethical values and internal control quality, helping to contribute to sustainable growth, reduce poverty, and improve organizational culture and learning.
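The expected value and variance of a discrete random variable, the workhorse of the test described above, are straightforward to compute. The scores and probabilities below are illustrative stand-ins (echoing the paper's 85.5%/100% figures), not the study's data.

```python
def mean_var(values, probs):
    """Expected value and variance of a discrete random variable:
    E[X] = sum(v * p),  Var[X] = sum((v - E[X])**2 * p)."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    mean = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mean) ** 2 * p for v, p in zip(values, probs))
    return mean, var

# Illustrative impact scores and probabilities (not the paper's data):
# an 85.5% chance of the full 100% impact score.
scores = [0, 50, 100]
probs = [0.05, 0.095, 0.855]
m, v = mean_var(scores, probs)
```

The variance indicates how concentrated the impact assessment is around its expected score; a small variance relative to the mean supports a confident conclusion.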


2014 ◽  
Vol 17 (5) ◽  
pp. 376-382 ◽  
Author(s):  
Rita Hegedűs ◽  
András Pári ◽  
Zsófia Drjenovszky ◽  
Hanna Kónya

Aiming to perform the first sociological survey of Hungarian twins, our main question was whether being a twin has positive consequences on one's life. Adult twins completed our questionnaire at three Hungarian summer twin festivals, in hospitals during medical twin studies, and on some websites online. Data represent 140 twin pairs (mean age: 38.2 ± 14.6 years). We employed several indices for measuring the resource nature of twinship. Three main types of benefits were distinguished: the profit of attraction, as ‘material capital’; the easier access to cultural goods when both twins take part, as ‘cultural capital’; and the positive aspects of an a priori existing dyadic relation, as ‘relational capital’. We were interested in the differences among types of twins regarding these advantages. We paid special attention to the five groups of twins derived from gender and zygosity (i.e., monozygotic females, monozygotic males, dizygotic females, dizygotic males, opposite-sex pairs). Our analysis showed that the Hungarian twins involved in our research basically enjoy their twinship; during their lives they have used, and still make use of, the different benefits it confers. In our twin samples, women derived more advantages from being a twin than men. Significant differences could be observed on all indicators between monozygotic and dizygotic twins.


2021 ◽  
Author(s):  
Peter Lukacs ◽  
Theodosia Stratoudaki ◽  
Geo Davis ◽  
Anthony Gachagan

Abstract This study introduces a novel data acquisition method, Selective Matrix Capture (SMC), which can adapt the array geometry during data acquisition to the demands of the inspected structure, such as the defects encountered. The adaptive data acquisition method is enabled by the use of Laser Induced Phased Arrays (LIPAs). We have previously demonstrated high-resolution ultrasonic images of the interior of components using Full Matrix Capture (FMC) and the Total Focusing Method (TFM). However, capturing the FMC requires long synthesis times due to signal averaging and mechanical laser scanning, compromising the application potential of LIPAs. Given that most components are defect-free, significant time savings can be obtained by only acquiring high-fidelity data when a defect is indicated. The paper presents Selective Matrix Capture, which acquires data more efficiently, without a priori knowledge of the location of the defects, while still achieving the superior imaging quality provided by an FMC data set.
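The FMC/TFM combination referred to above amounts to delay-and-sum focusing over the full matrix of transmit-receive signals. The sketch below shows the TFM value at a single pixel under simplifying assumptions (precomputed travel times, no interpolation or apodization); the array geometry and sampling values are hypothetical.

```python
import numpy as np

def tfm_pixel(fmc, times_tx, times_rx, fs):
    """Total Focusing Method value at one pixel: sum every transmit-receive
    signal at that pair's round-trip delay to the pixel.
    fmc[t, r, s]: sample s of the signal for transmitter t, receiver r."""
    n_tx, n_rx, n_s = fmc.shape
    val = 0.0
    for t in range(n_tx):
        for r in range(n_rx):
            idx = int(round((times_tx[t] + times_rx[r]) * fs))
            if 0 <= idx < n_s:
                val += fmc[t, r, idx]
    return val

# Toy check: a point reflector at the pixel puts a spike at each pair's
# round-trip delay, so all contributions add coherently.
fs = 1e6
times_tx = np.array([1e-6, 2e-6])   # one-way times, transmitters -> pixel
times_rx = np.array([1e-6, 3e-6])   # one-way times, pixel -> receivers
fmc = np.zeros((2, 2, 8))
for t in range(2):
    for r in range(2):
        fmc[t, r, int(round((times_tx[t] + times_rx[r]) * fs))] = 1.0
focus = tfm_pixel(fmc, times_tx, times_rx, fs)
```

SMC's saving comes from skipping most of this matrix when no defect is indicated, falling back to the full capture only where high-fidelity focusing is needed.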


Author(s):  
Diane J. Cook ◽  
Lawrence B. Holder

The large amount of data collected today is quickly overwhelming researchers’ abilities to interpret the data and discover interesting patterns. In response to this problem, a number of researchers have developed techniques for discovering concepts in databases. These techniques work well for data expressed in a nonstructural, attribute-value representation and address issues of data relevance, missing data, noise and uncertainty, and utilization of domain knowledge (Fisher, 1987; Cheeseman and Stutz, 1996). However, recent data acquisition projects are collecting structural data describing the relationships among the data objects. Correspondingly, there exists a need for techniques to analyze and discover concepts in structural databases (Fayyad et al., 1996b). One method for discovering knowledge in structural data is the identification of common substructures. The goal is to find substructures capable of compressing the data and to identify conceptually interesting substructures that enhance the interpretation of the data. Substructure discovery is the process of identifying concepts describing interesting and repetitive substructures within structural data. Once discovered, the substructure concept can be used to simplify the data by replacing instances of the substructure with a pointer to the newly discovered concept. The discovered substructure concepts allow abstraction over detailed structure in the original data and provide new, relevant attributes for interpreting the data. Iteration of the substructure discovery and replacement process constructs a hierarchical description of the structural data in terms of the discovered substructures. This hierarchy provides varying levels of interpretation that can be accessed based on the goals of the data analysis. We describe a system called Subdue that discovers interesting substructures in structural data based on the minimum description length (MDL) principle. 
The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints.
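The MDL principle underlying Subdue scores a substructure by how well it compresses the graph: the total description length of the substructure plus the graph with each instance collapsed to a single vertex. The sketch below is a crude size-count approximation of that idea, not Subdue's actual bit-level encoding.

```python
def mdl_value(graph_size, sub_size, n_instances):
    """Crude MDL-style score: DL(S) + DL(G | S), measured in size 'units'
    (a stand-in for bits).  Replacing each instance of a substructure of
    sub_size vertices with one vertex removes (sub_size - 1) vertices per
    instance; lower total description length is better."""
    compressed = graph_size - n_instances * (sub_size - 1)
    return sub_size + compressed

# A substructure of size 4 appearing 10 times in a 100-vertex graph
# compresses better (lower score) than one of size 6 appearing 3 times.
score_a = mdl_value(100, 4, 10)
score_b = mdl_value(100, 6, 3)
```

Iterating this evaluation — pick the best-compressing substructure, replace its instances, and repeat on the compressed graph — yields the hierarchical description the passage describes.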


2013 ◽  
Vol 2 (4) ◽  
pp. 61-78 ◽  
Author(s):  
Roy L. Nersesian ◽  
Kenneth David Strang

This study discussed the theoretical literature related to developing and applying probability distributions for estimating uncertainty. A theoretically selected ten-year empirical sample was collected and evaluated for the Albany NY area (N=942). A discrete probability distribution model was developed and applied for part of the sample, to illustrate the likelihood of petroleum spills by industry and day of week. The benefit of this paper for the community of practice was to demonstrate how to select, develop, test and apply a probability distribution to analyze the patterns in disaster events, using inferential parametric and nonparametric statistical techniques. The method, not the model, was intended to be generalized to other researchers and populations. An interesting side benefit from this study was that it revealed significant findings about where and when most of the human-attributed petroleum leaks had occurred in the Albany NY area over the last ten years (ending in 2013). The researchers demonstrated how to develop and apply distribution models in low-cost spreadsheet software (Excel).
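Building a discrete probability distribution of events by category, as the study does for spills by day of week in a spreadsheet, reduces to normalizing category counts. The daily counts below are invented for illustration; the study's actual Albany NY data (N=942) are not reproduced here.

```python
from collections import Counter

# Hypothetical spill events tagged by day of week (not the study's data).
events = (["Mon"] * 30 + ["Tue"] * 25 + ["Wed"] * 20
          + ["Thu"] * 15 + ["Fri"] * 10)

# Empirical probability mass function: count per category / total count,
# exactly what a COUNTIF / total formula produces in a spreadsheet.
counts = Counter(events)
n = sum(counts.values())
pmf = {day: c / n for day, c in counts.items()}
```

The resulting PMF can then be compared against a uniform (or other null) distribution with a chi-square or nonparametric test to decide whether spills cluster on particular days.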


2019 ◽  
Vol 23 ◽  
pp. 947-978
Author(s):  
Shota Gugushvili ◽  
Frank van der Meulen ◽  
Moritz Schauer ◽  
Peter Spreij

According to both domain expert knowledge and empirical evidence, wavelet coefficients of real signals tend to exhibit clustering patterns, in that they contain connected regions of coefficients of similar magnitude (large or small). A wavelet de-noising approach that takes into account such a feature of the signal may in practice outperform other, more vanilla methods, both in terms of the estimation error and visual appearance of the estimates. Motivated by this observation, we present a Bayesian approach to wavelet de-noising, where dependencies between neighbouring wavelet coefficients are a priori modelled via a Markov chain-based prior, that we term the caravan prior. Posterior computations in our method are performed via the Gibbs sampler. Using representative synthetic and real data examples, we conduct a detailed comparison of our approach with a benchmark empirical Bayes de-noising method (due to Johnstone and Silverman). We show that the caravan prior fares well and is therefore a useful addition to the wavelet de-noising toolbox.
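The clustering behaviour that motivates the caravan prior — neighbouring wavelet coefficients tending to share magnitude — can be mimicked with a simple two-state Markov chain over "signal present" labels. This is an illustrative toy, not the caravan prior itself (which models magnitudes, not just binary labels) nor the paper's Gibbs sampler.

```python
import random

def sample_markov_labels(n, p_stay, seed=0):
    """Sample a binary label sequence from a two-state Markov chain:
    with probability p_stay the next coefficient keeps the current state,
    so high p_stay produces long runs (clusters) of identical labels."""
    rng = random.Random(seed)
    labels = [rng.random() < 0.5]
    for _ in range(n - 1):
        keep = rng.random() < p_stay
        labels.append(labels[-1] if keep else not labels[-1])
    return labels

labels = sample_markov_labels(1000, p_stay=0.95)
```

Under such a prior, a coefficient surrounded by large neighbours is a priori more likely to carry signal, which is exactly the dependency a vanilla independent-coefficient prior ignores.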


2011 ◽  
Vol 317-319 ◽  
pp. 681-684
Author(s):  
Yi Sheng Huang ◽  
Ho Shan Chiang

A novel approach to probabilistic timed structures is proposed, based on combining the formalisms of timed automata and probabilistic automata to represent the system. Real-valued clocks measure the passage of time, and transitions can be probabilistic, expressed as a discrete probability distribution over the set of target states. The usage of clock variables and the specification of the state space are illustrated with real-valued time applications. Transitions between states are triggered probabilistically by events which describe either the occurrence of faults or normal working conditions. Additionally, the passage of discrete time and probabilistic transitions are combined by means of the theory of expectation sets to obtain a unified measure-reasoning strategy.
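A single step of such a model combines a clock guard with a discrete probability distribution over target states. The sketch below is a minimal toy, with a hypothetical working/fault model and made-up guard and probabilities; it illustrates the mechanism, not the paper's formalism or expectation-set theory.

```python
import random

def step(state, clock, transitions, rng):
    """One step of a toy probabilistic timed automaton: if the clock has
    reached the current state's guard, draw the next state from a discrete
    probability distribution over target states; otherwise stay put."""
    guard, dist = transitions[state]
    if clock < guard:
        return state                      # transition not yet enabled
    states, probs = zip(*dist.items())
    return rng.choices(states, weights=probs)[0]

# Hypothetical model: after 5 time units, 'working' fails with prob. 0.1;
# 'fault' is absorbing.
transitions = {
    "working": (5.0, {"working": 0.9, "fault": 0.1}),
    "fault":   (0.0, {"fault": 1.0}),
}
rng = random.Random(0)
s = step("working", clock=6.0, transitions=transitions, rng=rng)
```

Repeating this step while advancing the clock yields sample paths from which fault probabilities over time can be estimated.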

