BUMPER v1.0: A Bayesian User-friendly Model for Palaeo-Environmental Reconstruction

2017 ◽  
Vol 10 (1) ◽  
pp. 483-498 ◽  
Author(s):  
Philip B. Holden ◽  
H. John B. Birks ◽  
Stephen J. Brooks ◽  
Mark B. Bush ◽  
Grace M. Hwang ◽  
...  

Abstract. We describe the Bayesian user-friendly model for palaeo-environmental reconstruction (BUMPER), a Bayesian transfer function for inferring past climate and other environmental variables from microfossil assemblages. BUMPER is fully self-calibrating, straightforward to apply, and computationally fast, requiring ~2 s to build a 100-taxon model from a 100-site training set on a standard personal computer. We apply the model's probabilistic framework to generate thousands of artificial training sets under ideal assumptions. We then use these to demonstrate the sensitivity of reconstructions to the characteristics of the training set, considering assemblage richness, taxon tolerances, and the number of training sites. We find that a useful guideline for the size of a training set is to provide, on average, at least 10 samples of each taxon. We demonstrate general applicability to real data, considering three different organism types (chironomids, diatoms, pollen) and different reconstructed variables. An identically configured model is used in each application, the only change being the input files that provide the training-set environment and taxon-count data. The performance of BUMPER is shown to be comparable with weighted average partial least squares (WAPLS) in each case. Additional artificial datasets are constructed with similar characteristics to the real data, and these are used to explore the reasons for the differing performances of the different training sets.
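
BUMPER's full likelihood is not reproduced in the abstract, but the core transfer-function idea can be sketched. The toy Python below assumes each taxon's occurrence probability follows a Gaussian curve of the environmental variable, with invented optima and tolerances standing in for calibrated values, and computes a gridded posterior for a presence/absence assemblage under a flat prior.

```python
import numpy as np

# Invented per-taxon response parameters standing in for a calibrated model:
# optimum u_k and tolerance t_k of each taxon's Gaussian response curve.
optima = np.array([12.0, 15.5, 18.0])    # deg C
tolerances = np.array([2.0, 1.5, 3.0])   # deg C

def posterior(presence, grid):
    """Posterior over the environmental variable for a 0/1 assemblage."""
    # Occurrence probability of each taxon at each grid point (taxa x grid)
    p = np.exp(-0.5 * ((grid[None, :] - optima[:, None])
                       / tolerances[:, None]) ** 2)
    p = np.clip(p, 1e-6, 1 - 1e-6)
    # Bernoulli log likelihood summed over taxa, flat prior on the grid
    loglik = (presence[:, None] * np.log(p)
              + (1 - presence[:, None]) * np.log(1 - p)).sum(axis=0)
    post = np.exp(loglik - loglik.max())
    return post / post.sum()

grid = np.linspace(5.0, 25.0, 401)
post = posterior(np.array([1, 1, 0]), grid)  # taxa 1, 2 present; taxon 3 absent
print("reconstructed value:", (grid * post).sum())
```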


2019 ◽  
Author(s):  
Rumen Manolov

The lack of consensus regarding the most appropriate techniques for analysing data from single-case experimental designs requires justifying the choice of any specific analytical option. The current text mentions some of the arguments, provided by methodologists and statisticians, in favor of several analytical techniques. Additionally, a small-scale literature review is performed in order to explore whether and how applied researchers justify the analytical choices that they make. The review suggests that certain practices are not sufficiently explained. To improve the reporting of data-analytical decisions, it is proposed to choose and justify the analytical approach prior to gathering the data. As a possible justification for the data analysis plan, we propose using the expected data pattern as a basis (specifically, expectations about an improving baseline trend and about the immediate or progressive nature of the intervention effect). Although there are multiple alternatives for single-case data analysis, the current text focuses on visual analysis and multilevel models and illustrates an application of these analytical options with real data. User-friendly software is also developed.
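
The abstract does not give the model specification, but a common two-level formulation for single-case data (sessions nested within cases, with terms for baseline trend, immediate level change, and change in trend) can be sketched with statsmodels; the synthetic data and column names below are invented for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic multiple-baseline data (values invented for illustration):
# three cases, ten sessions each, intervention starting at session 5.
rng = np.random.default_rng(0)
rows = []
for case in range(3):
    for session in range(10):
        phase = int(session >= 5)   # 0 = baseline, 1 = intervention
        score = 2 + 0.1 * session + 3 * phase + rng.normal(0, 0.5)
        rows.append({"case": case, "session": session,
                     "phase": phase, "score": score})
df = pd.DataFrame(rows)

# Two-level model: sessions nested within cases (random intercept per case).
# 'phase' captures the immediate level change, 'session' the baseline trend,
# and their interaction a change in trend once the intervention starts.
model = smf.mixedlm("score ~ phase + session + phase:session",
                    data=df, groups=df["case"])
print(model.fit().summary())
```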


Author(s):  
Cara Murphy ◽  
John Kerekes

The classification of trace chemical residues through active spectroscopic sensing is challenging due to the lack of physics-based models that can accurately predict spectra. To overcome this challenge, we leveraged the field of domain adaptation to translate data from the simulated to the measured domain for training a classifier. We developed the first 1D conditional generative adversarial network (GAN) to perform spectrum-to-spectrum translation of reflectance signatures. We applied the 1D conditional GAN to a library of simulated spectra and quantified the improvement in classification accuracy on real data using the translated spectra for training the classifier. Using the GAN-translated library, the average classification accuracy increased from 0.622 to 0.723 on real chemical reflectance data, including data from chemicals not included in the GAN training set.
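
The paper's exact network is not specified in the abstract; a minimal pix2pix-style sketch of 1D spectrum-to-spectrum translation in PyTorch might look as follows, with the generator mapping a simulated reflectance spectrum to the measured domain and the discriminator scoring (simulated, candidate) pairs. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a simulated reflectance spectrum to the measured domain."""
    def __init__(self, n_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, n_channels, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(n_channels, n_channels, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.Conv1d(n_channels, 1, kernel_size=9, padding=4),
        )

    def forward(self, sim_spectrum):   # shape: (batch, 1, n_bands)
        return self.net(sim_spectrum)

class Discriminator(nn.Module):
    """Scores (simulated, candidate) spectrum pairs, pix2pix style."""
    def __init__(self, n_channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, n_channels, kernel_size=9, stride=2, padding=4),
            nn.LeakyReLU(0.2),
            nn.Conv1d(n_channels, n_channels, kernel_size=9, stride=2, padding=4),
            nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(n_channels, 1),
        )

    def forward(self, sim_spectrum, candidate):
        # Condition on the simulated spectrum by channel concatenation
        return self.net(torch.cat([sim_spectrum, candidate], dim=1))
```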


2019 ◽  
Vol 7 (4) ◽  
pp. T911-T922
Author(s):  
Satyakee Sen ◽  
Sribharath Kainkaryam ◽  
Cen Ong ◽  
Arvind Sharma

Salt model building has long been considered a severe bottleneck for large-scale 3D seismic imaging projects. It is one of the most time-consuming, labor-intensive, and difficult-to-automate processes in the entire depth imaging workflow, requiring significant intervention by domain experts to manually interpret the salt bodies on noisy, low-frequency, and low-resolution seismic images at each iteration of the salt model building process. The difficulty of this task and the need to automate it are well recognized by the imaging community and have propelled the use of deep-learning-based convolutional neural network (CNN) architectures to carry it out. However, significant challenges remain for reliable production-scale deployment of CNN-based methods for salt model building, mainly due to the poor generalization capabilities of these networks. When used on new surveys never seen by the CNN models during the training stage, the interpretation accuracy of these models drops significantly. To remedy this key problem, we have introduced a U-shaped encoder-decoder CNN architecture trained with a specialized regularization strategy aimed at reducing the generalization error of the network. Our regularization scheme perturbs the ground-truth labels in the training set. Two different perturbations are discussed: one that randomly changes the labels of the training set, flipping salt labels to sediment and vice versa, and a second that smooths the labels. We have determined that such perturbations act as a strong regularizer, preventing the network from making highly confident predictions on the training set and thus reducing overfitting. An ensemble strategy is also used for test-time augmentation and is shown to further improve the accuracy. The robustness of our CNN models, in terms of reduced generalization error and improved interpretation accuracy, is demonstrated with real data examples from the Gulf of Mexico.
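
The two label perturbations lend themselves to a compact sketch. The flip probability and smoothing amount below are invented, not the paper's settings; the perturbed targets would be used in place of the clean masks when computing the training loss.

```python
import numpy as np

def flip_labels(labels, p=0.05, rng=None):
    """Randomly flip a fraction p of binary salt/sediment mask labels."""
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(labels.shape) < p
    return np.where(mask, 1 - labels, labels)

def smooth_labels(labels, eps=0.1):
    """Label smoothing: move hard 0/1 labels towards 0.5 by eps."""
    return labels * (1 - eps) + 0.5 * eps
```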


Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7279
Author(s):  
Krzysztof Rzecki

Classification algorithms require training data that have been labelled with classes in order to build a model that can classify new data. The amount and diversity of the training data affect classification quality; usually, the larger the training set, the better the classification accuracy. In many applications, only small amounts of training data are available. This article presents a new time series classification algorithm for problems with small training sets. The algorithm was tested on hand gesture recordings in tasks of person identification and gesture recognition. The algorithm provides significantly better classification accuracy than other machine learning algorithms. For 22 different hand gestures performed by 10 people, with a training set of 5 gesture execution records per class, the error rate of the newly proposed algorithm is 37% to 75% lower than that of the other compared algorithms. When the training set consists of only one sample per class, the new algorithm achieves a 45% to 95% lower error rate. The conducted experiments indicate that the algorithm outperforms state-of-the-art methods in classification accuracy for person identification and gesture recognition.
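
The abstract does not name the algorithm, so the sketch below shows a standard small-training-set baseline instead: one-nearest-neighbour classification under dynamic time warping, which works with as little as one example per class.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify_1nn(query, train_series, train_labels):
    """Assign the label of the nearest training series under DTW."""
    dists = [dtw_distance(query, s) for s in train_series]
    return train_labels[int(np.argmin(dists))]
```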


2017 ◽  
Vol 20 (3) ◽  
pp. 257-259 ◽  
Author(s):  
Julian Hecker ◽  
Anna Maaser ◽  
Dmitry Prokopenko ◽  
Heide Loehlein Fier ◽  
Christoph Lange

VEGAS (versatile gene-based association study) is a popular methodological framework for performing gene-based tests based on summary statistics from single-variant analyses. The approach incorporates linkage disequilibrium information from reference panels to account for the correlation of test statistics. Three different types of gene-based tests are available. In 2015, the improved framework VEGAS2, using more detailed reference panels, was published. Both versions provide user-friendly web- and offline-based tools for the analysis. However, the implementation of the popular top-percentage test is erroneous in both versions. The p values provided by VEGAS2 are deflated, i.e., anti-conservative. Based on real data examples, we demonstrate that this can substantially increase the rate of false-positive findings and can lead to inconsistencies between different test options. We also provide code that allows VEGAS users to compute correct p values.
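
For readers wanting a reference point, a correct Monte Carlo version of a top-percentage gene-based test can be sketched as follows: simulate correlated null z-scores from the LD matrix, then compare the observed sum of the top fraction of squared statistics against the simulated null sums. The parameter names and toy example are ours, not VEGAS code.

```python
import numpy as np

def top_percentage_test(z_obs, ld, top=0.1, n_sim=10_000, seed=1):
    """Empirical p value for the sum of the top fraction of squared z-scores.

    z_obs : observed single-SNP z-scores for the gene
    ld    : SNP x SNP LD correlation matrix from a reference panel
    """
    rng = np.random.default_rng(seed)
    k = max(1, int(round(top * len(z_obs))))
    observed = np.sort(z_obs ** 2)[-k:].sum()
    # Null z-scores are correlated according to the LD matrix
    sims = rng.multivariate_normal(np.zeros(len(z_obs)), ld, size=n_sim) ** 2
    null = np.sort(sims, axis=1)[:, -k:].sum(axis=1)
    # +1 correction so the empirical p value is never exactly zero
    return (1 + (null >= observed).sum()) / (n_sim + 1)

# Toy example: 5 SNPs in modest LD, one strongly associated variant
ld = 0.2 * np.ones((5, 5)) + 0.8 * np.eye(5)
print(top_percentage_test(np.array([0.5, 1.2, -0.3, 3.9, 0.8]), ld, top=0.2))
```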


2021 ◽  
Vol 99 (3) ◽  
Author(s):  
Antonio Reverter ◽  
Brad C Hine ◽  
Laercio Porto-Neto ◽  
Yutao Li ◽  
Christian J Duff ◽  
...  

Abstract. In animal breeding and genetics, the ability to cope with disease, here defined as immune competence (IC), with minimal detriment to growth and fertility is a desired objective that addresses both animal production and welfare considerations. However, defining and objectively measuring IC phenotypes using testing methods that are practical to apply on-farm has been challenging. Based on previously described protocols, we measured both cell-mediated immune response (Cell-IR) and antibody-mediated immune response (Ab-IR) and combined these measures to determine an animal's IC. Using a population of 2,853 Australian Angus steers and heifers, we compared 2 alternative methods of combining both metrics into a single phenotype to be used as a tool for the genetic improvement of IC. The first method, named ZMEAN, is obtained by taking the average of the individual metrics after subjecting each to a Z-score standardization. The second, ImmuneDEX (IDEX), is a weighted average that considers the correlation between Cell-IR and Ab-IR, as well as the difference in the ranking of individuals by each metric, and uses these as weights in the averaging. Both simulated and real data were used to understand the behavior of ZMEAN and IDEX. To further ascertain the relationship between IDEX and other traits of economic importance, we evaluated a range of traits related to growth, feedlot performance, and carcass characteristics. We report estimates of heritability of 0.31 ± 0.06 for Cell-IR, 0.42 ± 0.06 for Ab-IR, 0.42 ± 0.06 for ZMEAN, and 0.37 ± 0.06 for IDEX, as well as a unity genetic correlation (rg) between ZMEAN and IDEX. While a moderately positive rg was estimated between Cell-IR and Ab-IR (rg = 0.33 ± 0.12), strongly positive estimates were obtained between IDEX and Cell-IR (rg = 0.80 ± 0.05) and between IDEX and Ab-IR (rg = 0.85 ± 0.04). We obtained moderately negative rg estimates between IC traits and growth, including rg = −0.38 ± 0.14 between IDEX and weaning weight, and negligible correlations with carcass fat measurements, including rg = −0.03 ± 0.12 between IDEX and marbling. Given that breeding with a sole focus on production might inadvertently increase susceptibility to disease and the associated antibiotic use, our analyses suggest that ImmuneDEX will provide a basis for breeding animals that are both highly productive and better able to resist disease.
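
ZMEAN is fully specified by the abstract and is reproduced below; the IDEX weighting formula is not, so it is only noted in a comment. The input arrays are assumed to hold one measurement per animal.

```python
import numpy as np

def zmean(cell_ir, ab_ir):
    """ZMEAN: average of the two immune measures after Z-score scaling."""
    z_cell = (cell_ir - cell_ir.mean()) / cell_ir.std()
    z_ab = (ab_ir - ab_ir.mean()) / ab_ir.std()
    return (z_cell + z_ab) / 2.0

# IDEX replaces the plain average with a weighted one, using the
# Cell-IR/Ab-IR correlation and each animal's difference in ranking
# under the two metrics as weights; the exact formula is not given
# in the abstract and is therefore not reproduced here.
```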


2017 ◽  
Vol 8 (1) ◽  
Author(s):  
Julio Cesar Amador Diaz Lopez ◽  
Sofia Collignon-Delmar ◽  
Kenneth Benoit ◽  
Akitaka Matsuo

Abstract. We use 23M Tweets related to the EU referendum in the UK to predict the Brexit vote. In particular, we use user-generated labels known as hashtags to build training sets related to the Leave/Remain campaigns. Next, we train SVMs to classify Tweets. Finally, we compare our results to Internet and telephone polls. This approach not only reduces the time spent hand-coding data to create a training set, but also achieves a high level of correlation with Internet polls. Our results suggest that Twitter data may be a suitable substitute for Internet polls and a useful complement to telephone polls. We also discuss the reach and limitations of this method.
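
A minimal sketch of the hashtag-as-label idea with scikit-learn follows; the hashtag lists, example tweets, and model settings are illustrative, not the paper's.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Illustrative campaign hashtag lists (assumed, not the paper's full sets)
LEAVE = {"#voteleave", "#leaveeu", "#takecontrol"}
REMAIN = {"#strongerin", "#voteremain", "#intogether"}

def weak_label(tweet):
    """Label a tweet from its hashtags; None if ambiguous or unlabelled."""
    tags = {t.lower() for t in tweet.split() if t.startswith("#")}
    is_leave, is_remain = bool(tags & LEAVE), bool(tags & REMAIN)
    if is_leave == is_remain:     # neither, or both: skip the tweet
        return None
    return "leave" if is_leave else "remain"

tweets = [
    "Time to take back control #VoteLeave",
    "Britain is stronger in Europe #StrongerIn",
    "Off to the polling station #EURef",
]
labelled = [(t, weak_label(t)) for t in tweets]
texts, labels = zip(*[(t, y) for t, y in labelled if y is not None])

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["Vote to leave the EU tomorrow"]))
```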


2008 ◽  
Vol 17 (03) ◽  
pp. 555-567 ◽  
Author(s):  
STEVEN GUTSTEIN ◽  
OLAC FUENTES ◽  
ERIC FREUDENTHAL

Knowledge transfer is widely held to be a primary mechanism that enables humans to quickly learn new, complex concepts when given only small training sets. In this paper, we apply knowledge transfer to deep convolutional neural nets, which we argue are particularly well suited for it. Our initial results demonstrate that components of a trained deep convolutional neural net can constructively transfer information to another such net. Furthermore, the transfer is completed in such a way that one can envision creating a net that could learn new concepts throughout its lifetime. The experiments we performed involved training a deep convolutional neural net (DCNN) on a large training set containing 20 different classes of handwritten characters from the NIST Special Database 19. This net was then used as a foundation for training a new net on a set of 20 different character classes from the same database. The new net kept the bottom layers of the old net (i.e., those nearest the input) and allowed only the top layers to train on the new character classes. We purposely used small training sets for the new net, forcing it to rely as much as possible upon transferred knowledge, rather than on a large and varied training set, to learn the new set of handwritten characters. Our results show a clear advantage in relying upon transferred knowledge to learn new tasks from small training sets, provided the new tasks are sufficiently similar to the previously mastered ones. However, this advantage decreases as training sets increase in size.
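
The freeze-bottom/retrain-top recipe translates directly to modern frameworks. The PyTorch sketch below uses an invented small DCNN (the paper's architecture is not given in the abstract) and assumes 28 x 28 character images.

```python
import torch
import torch.nn as nn

# A small DCNN in the spirit of the paper (architecture invented here).
class CharNet(nn.Module):
    def __init__(self, n_classes=20):
        super().__init__()
        self.features = nn.Sequential(            # "bottom" layers
            nn.Conv2d(1, 16, 5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(          # "top" layers
            nn.Flatten(), nn.Linear(32 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, n_classes),             # 32*4*4 for 28x28 inputs
        )

    def forward(self, x):
        return self.classifier(self.features(x))

old_net = CharNet()                 # assume trained on the first 20 classes
new_net = CharNet(n_classes=20)     # 20 new character classes
new_net.features.load_state_dict(old_net.features.state_dict())

# Freeze the transferred bottom layers; only the top layers will train.
for p in new_net.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in new_net.parameters() if p.requires_grad), lr=0.01)
```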


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xiangfei Chen ◽  
David Trafimow ◽  
Tonghui Wang ◽  
Tingting Tong ◽  
Cong Wang

Purpose: The authors derive the necessary mathematics, provide computer simulations, provide links to free and user-friendly computer programs, and analyze real data sets.

Design/methodology/approach: Cohen's d, which indexes the difference in means in standard deviation units, is the most popular effect size measure in the social sciences and economics. Not surprisingly, researchers have developed statistical procedures for estimating the sample sizes needed to have a desirable probability of rejecting the null hypothesis given assumed values for Cohen's d, or for estimating the sample sizes needed to have a desirable probability of obtaining a confidence interval of a specified width. However, for researchers interested in using the sample Cohen's d to estimate the population value, these are insufficient. Therefore, it would be useful to have a procedure for obtaining the sample sizes needed to be confident that the sample Cohen's d is close to the population parameter the researcher wishes to estimate: an expansion of the a priori procedure (APP). The authors derive the necessary mathematics, provide computer simulations and links to free and user-friendly computer programs, and analyze real data sets to illustrate the main results.

Findings: The authors answer the following two questions. The precision question: how close do I want my sample Cohen's d to be to the population value? The confidence question: what probability do I want of being within the specified distance?

Originality/value: To the best of the authors' knowledge, this is the first paper to estimate Cohen's effect size using the APP method. The online computing packages make it convenient for researchers and practitioners to use.
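
The authors' analytic derivations are not reproduced here, but the question the APP answers can be illustrated by brute-force simulation: grow the per-group sample size until the sample Cohen's d lands within the desired precision of an assumed population value with the desired probability. All settings below are illustrative.

```python
import numpy as np

def app_sample_size(d=0.5, precision=0.1, confidence=0.95,
                    n_sim=4_000, seed=0):
    """Smallest per-group n (on a grid of 25) such that the sample Cohen's d
    falls within `precision` of the assumed population d with probability
    `confidence`. A simulation stand-in for the analytic APP result."""
    rng = np.random.default_rng(seed)
    n = 25
    while True:
        x = rng.normal(d, 1.0, size=(n_sim, n))    # group 1, true mean d
        y = rng.normal(0.0, 1.0, size=(n_sim, n))  # group 2, true mean 0
        sp = np.sqrt((x.var(axis=1, ddof=1) + y.var(axis=1, ddof=1)) / 2)
        d_hat = (x.mean(axis=1) - y.mean(axis=1)) / sp
        if np.mean(np.abs(d_hat - d) <= precision) >= confidence:
            return n
        n += 25

print(app_sample_size())   # per-group n for d = 0.5, precision 0.1, 95%
```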

