Learning from Scarce Information: Using Synthetic Data to Classify Roman Fine Ware Pottery

Santos J. Núñez Jareño; Daniël P. van Helden; Evgeny M. Mirkes; Ivan Y. Tyukin; Penelope M. Allison

doi:10.3390/e23091140

Learning from Scarce Information: Using Synthetic Data to Classify Roman Fine Ware Pottery

Entropy ◽

10.3390/e23091140 ◽

2021 ◽

Vol 23 (9) ◽

pp. 1140

Author(s):

Santos J. Núñez Jareño ◽

Daniël P. van Helden ◽

Evgeny M. Mirkes ◽

Ivan Y. Tyukin ◽

Penelope M. Allison

Keyword(s):

Expert Knowledge ◽

Hybrid Approach ◽

Synthetic Data ◽

Original Data ◽

Initial Training ◽

Training Set ◽

Data Generator ◽

The Impact ◽

Learning Architectures ◽

Better Than

In this article, we consider a version of the challenging problem of learning from datasets whose size is too limited to allow generalisation beyond the training set. To address the challenge, we propose to use a transfer learning approach whereby the model is first trained on a synthetic dataset replicating features of the original objects. In this study, the objects were smartphone photographs of near-complete Roman terra sigillata pottery vessels from the collection of the Museum of London. Taking the replicated features from published profile drawings of pottery forms allowed the integration of expert knowledge into the process through our synthetic data generator. After this initial training, the model was fine-tuned with data from photographs of real vessels. We show, through exhaustive experiments across several popular deep learning architectures, different test priors, and considering the impact of the photograph viewpoint and excessive damage to the vessels, that the proposed hybrid approach enables the creation of classifiers with appropriate generalisation performance. This performance is significantly better than that of classifiers trained exclusively on the original data, which shows the promise of the approach to alleviate the fundamental issue of learning from small datasets.

Incorporating phylogenetic information in microbiome abundance studies has no effect on detection power and FDR control

10.1101/2020.01.31.928309 ◽

2020 ◽

Author(s):

Antoine Bichat ◽

Jonathan Plassais ◽

Christophe Ambroise ◽

Mahendra Mariadassou

Keyword(s):

Detection Rate ◽

A Priori ◽

Synthetic Data ◽

Evolutionary Information ◽

Differential Analysis ◽

Detection Rates ◽

False Discovery ◽

BH Procedure ◽

The Impact ◽

Better Than

Abstract: We consider the problem of incorporating evolutionary information (e.g. taxonomic or phylogenetic trees) in the context of metagenomics differential analysis. Recent results published in the literature propose different ways to leverage the tree structure to increase the detection rate of differentially abundant taxa. Here, we propose instead to use a different hierarchical structure, in the form of a correlation-based tree, as it may capture the structure of the data better than the phylogeny. We first show that the correlation tree and the phylogeny are significantly different before turning to the impact of tree choice on detection rates. Using synthetic data, we show that the tree does have an impact: smoothing p-values according to the phylogeny leads to detection rates equal to or lower than those obtained by smoothing according to the correlation tree. However, both trees are outperformed by the classical, non-hierarchical Benjamini-Hochberg (BH) procedure in terms of detection rates. Other procedures may use the hierarchical structure with profit but do not control the False Discovery Rate (FDR) a priori and remain inferior to a classical Benjamini-Hochberg procedure with the same nominal FDR. On real datasets, no hierarchical procedure had a significantly higher detection rate than BH. Although intuition advocates the use of a hierarchical structure, be it the phylogeny or the correlation tree, to increase the detection rate in microbiome studies, current hierarchical procedures are still inferior to non-hierarchical ones and effective procedures remain to be invented.
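For readers unfamiliar with the classical Benjamini-Hochberg step-up procedure that the abstract uses as its baseline, the following is a minimal sketch in plain Python. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions and are not taken from the paper; the tree-based smoothing methods it compares against are not reproduced here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Classical BH step-up procedure (illustrative helper, not the authors' code).

    Returns a list of booleans, True where the hypothesis is rejected
    at nominal FDR level `alpha`.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Step-up rule: reject every hypothesis ranked at or below largest_k.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for eight taxa (made up for illustration).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
decisions = benjamini_hochberg(example_p, alpha=0.05)
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value further down the ordering meets its threshold.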

Advertising Media Impact in Consumer Buying Behavior

Journal of Balkumari College ◽

10.3126/jbkc.v8i0.29310 ◽

2019 ◽

Vol 8 ◽

pp. 54-56

Author(s):

Ashmita Dahal Chhetri

Keyword(s):

Consumer Behavior ◽

Young Male ◽

Electronic Media ◽

Buying Behavior ◽

Media Impact ◽

High Degree ◽

The Impact ◽

Buying Behaviors ◽

The Relationship ◽

Better Than

Advertisements have been used for many years to influence the buying behaviors of consumers. Advertisements are helpful in creating awareness and perception of a product among customers. This research was conducted on 100 young males and females who use different brands of products, to check the influence of advertisement on their buying behavior while creating awareness and building perceptions. Correlation, regression and other statistical tools were used to identify the relationship between these variables. The results revealed that the relationship between media and consumer behavior is positive. Advertising has an impact on sales, and there is a positive and high degree of relationship between advertising and consumer behavior. The impact of advertising a product through electronic media is better than through non-electronic media.

G-Tric: generating three-way synthetic datasets with triclustering solutions

BMC Bioinformatics ◽

10.1186/s12859-020-03925-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

João Lobo ◽

Rui Henriques ◽

Sara C. Madeira

Keyword(s):

State Of The Art ◽

Synthetic Data ◽

Ground Truth ◽

Real Data ◽

Three Dimensions ◽

Additional Advantage ◽

Urban Dynamics ◽

Data Generator ◽

Real World Datasets ◽

Synthetic Datasets

Abstract Background: Three-way data started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of further providing the ground truth (triclustering solution) as output. Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions: Triclustering evaluation using G-Tric provides the possibility to combine both intrinsic and extrinsic metrics to compare solutions that produce more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric’s potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches.

Time-to-Treatment in Oral Cancer: Causes and Implications for Survival

Cancers ◽

10.3390/cancers13061321 ◽

2021 ◽

Vol 13 (6) ◽

pp. 1321

Author(s):

Constanza Saka-Herrán ◽

Enric Jané-Salas ◽

Antoni Mari-Roig ◽

Albert Estrugo-Devesa ◽

José López-López

Keyword(s):

Oral Cancer ◽

Previous Treatment ◽

Diagnostic Procedures ◽

Original Data ◽

Primary Treatment ◽

Medical Attention ◽

Time Interval ◽

Pre Treatment ◽

The Impact ◽

Lack Of Knowledge

The purpose of this review was to identify and describe the causes that influence the time-intervals in the pathway of diagnosis and treatment of oral cancer and to assess their impact on prognosis and survival. The review was structured according to the recommendations of the Aarhus statement, considering original data from individual studies and systematic reviews that reported outcomes related to the patient, diagnostic and pre-treatment intervals. The patient interval is the major contributor to the total time-interval. Unawareness of signs and/or symptoms, denial and lack of knowledge about oral cancer are the major contributors to the process of seeking medical attention. The diagnostic interval is influenced by tumor factors, by delays in referral due to a higher number of consultations and previous treatment with different medicines or dental procedures, and by professional factors such as experience and lack of knowledge related to the disease and diagnostic procedures. Advanced stage disease, primary treatment with radiotherapy, treatment at an academic facility and transitions in care are associated with prolonged pre-treatment intervals. An emerging body of evidence supports the association of prolonged pre-treatment and treatment intervals with poorer survival from oral cancer.

Are e-learning Webinars the future of medical education? An exploratory study of a disruptive innovation in the COVID-19 era

Cardiology in the Young ◽

10.1017/s1047951120004503 ◽

2020 ◽

pp. 1-10

Author(s):

Colin J. McMahon ◽

Justin T. Tretter ◽

Theresa Faulkner ◽

R. Krishna Kumar ◽

Andrew N. Redington ◽

...

Keyword(s):

Deep Learning ◽

Carbon Footprint ◽

Quantitative Research ◽

Survey Design ◽

Hybrid Approach ◽

Disruptive Innovation ◽

Cross Sectional Survey ◽

Cross Sectional ◽

E Learning ◽

The Impact

Abstract Objective: This study investigated the impact of the Webinar on deep human learning of CHD. Materials and methods: This cross-sectional survey design study used an open and closed-ended questionnaire to assess the impact of the Webinar on deep learning of topical areas within the management of the post-operative tetralogy of Fallot patients. This was a quantitative research methodology using descriptive statistical analyses with a sequential explanatory design. Results: One thousand three hundred and seventy-four participants from 100 countries on 6 continents joined the Webinar, 557 (40%) of whom completed the questionnaire. Over 70% of participants reported that they “agreed” or “strongly agreed” that the Webinar format promoted deep learning for each of the topics compared to other standard learning methods (textbook and journal learning). Two-thirds expressed a preference for attending a Webinar rather than an international conference. Over 80% of participants highlighted significant barriers to attending conferences including cost (79%), distance to travel (49%), time commitment (51%), and family commitments (35%). Strengths of the Webinar included expertise, concise high-quality presentations often discussing contentious issues, and the platform quality. The main weakness was a limited time for questions. Just over 53% expressed a concern for the carbon footprint involved in attending conferences and preferred to attend a Webinar. Conclusion: E-learning Webinars represent a disruptive innovation, which promotes deep learning, greater multidisciplinary participation, and greater attendee satisfaction with fewer barriers to participation. Although Webinars will never fully replace conferences, a hybrid approach may reduce the need for conferencing, reduce carbon footprint, and promote a “sustainable academia”.

Droplet impact onto a spring-supported plate: analysis and simulations

Journal of Engineering Mathematics ◽

10.1007/s10665-021-10107-5 ◽

2021 ◽

Vol 128 (1) ◽

Author(s):

Michael J. Negus ◽

Matthew R. Moore ◽

James M. Oliver ◽

Radu Cimpeanu

Keyword(s):

High Speed ◽

Flexible Substrate ◽

Practical Importance ◽

Hybrid Approach ◽

Hydrodynamic Pressure ◽

Analytical Framework ◽

Canonical System ◽

Linear Process ◽

Plate Motion ◽

The Impact

Abstract: The high-speed impact of a droplet onto a flexible substrate is a highly non-linear process of practical importance, which poses formidable modelling challenges in the context of fluid–structure interaction. We present two approaches aimed at investigating the canonical system of a droplet impacting onto a rigid plate supported by a spring and a dashpot: matched asymptotic expansions and direct numerical simulation (DNS). In the former, we derive a generalisation of inviscid Wagner theory to approximate the flow behaviour during the early stages of the impact. In the latter, we perform detailed DNS designed to validate the analytical framework, as well as provide insight into later times beyond the reach of the proposed analytical model. Drawing from both methods, we observe the strong influence that the mass of the plate, resistance of the dashpot, and stiffness of the spring have on the motion of the solid, which undergoes forced damped oscillations. Furthermore, we examine how the plate motion affects the dynamics of the droplet, predominantly through altering its internal hydrodynamic pressure distribution. We build on the interplay between these techniques, demonstrating that a hybrid approach leads to improved model and computational development, as well as result interpretation, across multiple length and time scales.
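As a point of orientation, the forced damped oscillation mentioned in the abstract can be written in textbook form. The symbols below are generic assumptions rather than the paper's notation: $s(t)$ is the plate displacement, $m$ the plate mass, $c$ the dashpot resistance, $k$ the spring stiffness, and $F(t)$ stands in for the hydrodynamic load exerted by the droplet, which the paper obtains from its own analysis and simulations.

```latex
m\,\ddot{s}(t) + c\,\dot{s}(t) + k\,s(t) = F(t),
\qquad
\omega_0 = \sqrt{\frac{k}{m}},
\qquad
\zeta = \frac{c}{2\sqrt{m k}}
```

Here $\omega_0$ is the undamped natural frequency and $\zeta$ the damping ratio, the two standard groupings through which mass, damping, and stiffness shape the response of such a system.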

Locking down the Impact of New Zealand’s COVID-19 Alert Level Changes on Pets

Animals ◽

10.3390/ani11030758 ◽

2021 ◽

Vol 11 (3) ◽

pp. 758

Author(s):

Fiona Esam ◽

Rachel Forrest ◽

Natalie Waran

Keyword(s):

New Zealand ◽

Separation Anxiety ◽

National Surveys ◽

Normal Life ◽

Pet Owners ◽

Level 1 ◽

The Impact ◽

Better Than

The influence of the COVID-19 pandemic on human-pet interactions within New Zealand, particularly during lockdown, was investigated via two national surveys. In Survey 1, pet owners (n = 686) responded during the final week of the five-week Alert Level 4 lockdown (highest level of restrictions, April 2020), and Survey 2 involved 498 respondents during July 2020 whilst at Alert Level 1 (lowest level of restrictions). During the lockdown, 54.7% of owners felt that their pets’ wellbeing was better than usual, while only 7.4% felt that it was worse. Most respondents (84.0%) could list at least one benefit of lockdown for their pets, and they noted pets were engaged in more play (61.7%) and exercise (49.7%) than pre-lockdown. Many respondents (40.3%) expressed that they were concerned about their pet’s wellbeing after lockdown, with pets missing company/attention and separation anxiety being major themes. In Survey 2, 27.9% of respondents reported that they continued to engage in increased rates of play with their pets after lockdown; however, the higher levels of pet exercise were not maintained. Just over one-third (35.9%) of owners took steps to prepare their pets to transition out of lockdown. The results indicate that pets may have enjoyed improved welfare during lockdown due to the possibility of increased human-pet interaction. The steps taken by owners to prepare animals for a return to normal life may enhance pet wellbeing long-term if maintained.

A Siamese neural network model for the prioritization of metabolic disorders by integrating real and simulated data

Bioinformatics ◽

10.1093/bioinformatics/btaa841 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i787-i794

Author(s):

Gian Marco Messa ◽

Francesco Napolitano ◽

Sarah H. Elsea ◽

Diego di Bernardo ◽

Xin Gao

Keyword(s):

Predictive Accuracy ◽

Hybrid Approach ◽

Simulated Data ◽

Original Data ◽

Diagnostic Process ◽

Great Promise ◽

Metabolic Profiles ◽

Inborn Errors ◽

Metabolomic Data ◽

Near Future

Abstract Motivation: Untargeted metabolomic approaches hold great promise as a diagnostic tool for inborn errors of metabolism (IEMs) in the near future. However, the complexity of the involved data makes their application difficult and time consuming. Computational approaches, such as metabolic network simulations and machine learning, could significantly help to exploit metabolomic data to aid the diagnostic process. While the former suffers from limited predictive accuracy, the latter is normally able to generalize only to IEMs for which sufficient data are available. Here, we propose a hybrid approach that exploits the best of both worlds by building a mapping between simulated and real metabolic data through a novel method based on Siamese neural networks (SNN). Results: The proposed SNN model is able to perform disease prioritization for the metabolic profiles of IEM patients even for diseases that it was not trained to identify. To the best of our knowledge, this has not been attempted before. The developed model is able to significantly outperform a baseline model that relies on metabolic simulations only. The prioritization performances demonstrate the feasibility of the method, suggesting that the integration of metabolic models and data could significantly aid the IEM diagnosis process in the near future. Availability and implementation: Metabolic datasets used in this study are publicly available from the cited sources. The original data produced in this study, including the trained models and the simulated metabolic profiles, are also publicly available (Messa et al., 2020).

Adversarial Data Augmentation on Breast MRI Segmentation

Applied Sciences ◽

10.3390/app11104554 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4554

Author(s):

João F. Teixeira ◽

Mariana Dias ◽

Eva Batista ◽

Joana Costa ◽

Luís F. Teixeira ◽

...

Keyword(s):

Data Augmentation ◽

Medical Image Analysis ◽

Synthetic Data ◽

Breast MRI ◽

Semantic Segmentation ◽

Adversarial Networks ◽

Augmentation Strategy ◽

Magnetic Resonance Imaging (MRI) ◽

Segmentation Task ◽

The Impact

The scarcity of balanced and annotated datasets has been a recurring problem in medical image analysis. Several researchers have tried to fill this gap by employing dataset synthesis with generative adversarial networks (GANs). Breast magnetic resonance imaging (MRI) provides complex, texture-rich medical images with the same annotation shortage issues, for which, to the best of our knowledge, no previous work has tried synthesizing data. Within this context, our work addresses the problem of synthesizing breast MRI images from corresponding annotations and evaluates the impact of this data augmentation strategy on a semantic segmentation task. We explored variations of image-to-image translation using conditional GANs, namely fitting the generator’s architecture with residual blocks and experimenting with cycle consistency approaches. We studied the impact of these changes on visual verisimilitude and how a U-Net segmentation model is affected by the usage of synthetic data. We achieved sufficiently realistic-looking breast MRI images and maintained a stable segmentation score even when completely replacing the dataset with the synthetic set. Our results were promising, especially for the Pix2PixHD and Residual CycleGAN architectures.

Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation

Personal and Ubiquitous Computing ◽

10.1007/s00779-021-01605-5 ◽

2021 ◽

Author(s):

Gretel Liz De la Peña Sarracén ◽

Paolo Rosso

Keyword(s):

Language Model ◽

Hybrid Approach ◽

Attention Mechanism ◽

Graph Representation ◽

Keyword Extraction ◽

Eigenvector Centrality ◽

Model Learning ◽

Harmful Content ◽

The Impact ◽

Offensive Language

Abstract: The proliferation of harmful content on social media affects a large part of the user community. Therefore, several approaches have emerged to control this phenomenon automatically. However, this is still quite a challenging task. In this paper, we explore offensive language as a particular case of harmful content and focus our study on the analysis of keywords in available datasets composed of offensive tweets. Thus, we aim to identify relevant words in those datasets and analyze how they can affect model learning. For keyword extraction, we propose an unsupervised hybrid approach which combines the multi-head self-attention of BERT and reasoning on a word graph. The attention mechanism captures relationships among words in a context while a language model is learned. Then, the relationships are used to generate a graph from which we identify the most relevant words by using the eigenvector centrality. Experiments were performed by means of two mechanisms. On the one hand, we used an information retrieval system to evaluate the impact of the keywords in recovering offensive tweets from a dataset. On the other hand, we evaluated a keyword-based model for offensive language detection. Results highlight some points to consider when training models with available datasets.
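The graph-ranking step described in the abstract can be illustrated with a small sketch of eigenvector centrality computed by power iteration. This is not the authors' implementation: the function name, the toy four-word graph, and its 0/1 edge weights are hypothetical, and in the paper the edges would come from BERT attention relationships rather than being written by hand.

```python
def eigenvector_centrality(adjacency, max_iterations=200, tolerance=1e-10):
    """Power iteration on a symmetric, non-negative adjacency matrix.

    Illustrative helper: iterates with (A + I) instead of A. The shift leaves
    the eigenvectors unchanged but avoids oscillation on bipartite graphs.
    Returns scores normalised to unit Euclidean length.
    """
    n = len(adjacency)
    scores = [1.0 / n] * n
    for _ in range(max_iterations):
        # One multiplication by (A + I).
        updated = [
            scores[i] + sum(adjacency[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(value * value for value in updated) ** 0.5
        updated = [value / norm for value in updated]
        # Stop once the scores no longer change noticeably.
        if max(abs(a - b) for a, b in zip(updated, scores)) < tolerance:
            return updated
        scores = updated
    return scores


# Hypothetical word graph: word 0 is linked to words 1, 2 and 3 (a star).
toy_words = ["w0", "w1", "w2", "w3"]
toy_adjacency = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
centrality = eigenvector_centrality(toy_adjacency)
# Rank words from most to least central.
ranking = sorted(zip(toy_words, centrality), key=lambda pair: -pair[1])
```

In this toy graph the hub word receives the highest score, which is the sense in which a centrality measure can single out candidate keywords.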
