Using state-of-the-art hydrodynamic models to generate synthetic data cubes for imaging spectral sensing applications

G-Tric: generating three-way synthetic datasets with triclustering solutions

BMC Bioinformatics ◽

10.1186/s12859-020-03925-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

João Lobo ◽

Rui Henriques ◽

Sara C. Madeira

Keyword(s):

State Of The Art ◽

Synthetic Data ◽

Ground Truth ◽

Real Data ◽

Three Dimensions ◽

Additional Advantage ◽

Urban Dynamics ◽

Data Generator ◽

Real World Datasets ◽

Synthetic Datasets

Abstract Background Three-way data started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations $$\times$$ × features $$\times$$ × contexts). With increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground-truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real 3-way data from biomedical and social data domains, with the additional advantage of further providing the ground truth (triclustering solution) as output. Results G-Tric can replicate real-world datasets and create new ones that match researchers needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled, by defining the amount of missing, noise or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions Triclustering evaluation using G-Tric provides the possibility to combine both intrinsic and extrinsic metrics to compare solutions that produce more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties was generated and made available, highlighting G-Tric’s potential to advance triclustering state-of-the-art by easing the process of evaluating the quality of new triclustering approaches.

Download Full-text

Gradient Profile Estimation Using Exponential Cubic Spline Smoothing in a Bayesian Framework

Entropy ◽

10.3390/e23060674 ◽

2021 ◽

Vol 23 (6) ◽

pp. 674

Author(s):

Kushani De De Silva ◽

Carlo Cafaro ◽

Adom Giffin

Keyword(s):

State Of The Art ◽

Estimation Method ◽

Synthetic Data ◽

Noisy Data ◽

Bayesian Framework ◽

Physical Systems ◽

Gradient Profile ◽

Ill Posed ◽

Profile Estimation ◽

Different Levels

Attaining reliable gradient profiles is of utmost relevance for many physical systems. In many situations, the estimation of the gradient is inaccurate due to noise. It is common practice to first estimate the underlying system and then compute the gradient profile by taking the subsequent analytic derivative of the estimated system. The underlying system is often estimated by fitting or smoothing the data using other techniques. Taking the subsequent analytic derivative of an estimated function can be ill-posed. This becomes worse as the noise in the system increases. As a result, the uncertainty generated in the gradient estimate increases. In this paper, a theoretical framework for a method to estimate the gradient profile of discrete noisy data is presented. The method was developed within a Bayesian framework. Comprehensive numerical experiments were conducted on synthetic data at different levels of noise. The accuracy of the proposed method was quantified. Our findings suggest that the proposed gradient profile estimation method outperforms the state-of-the-art methods.

Download Full-text

Estimating Efforts and Success of Symmetry-Seeing Machines by Use of Synthetic Data

Symmetry ◽

10.3390/sym11020227 ◽

2019 ◽

Vol 11 (2) ◽

pp. 227

Author(s):

Eckart Michaelsen ◽

Stéphane Vujasinovic

Keyword(s):

Extraction Method ◽

Input Data ◽

State Of The Art ◽

Recognition Performance ◽

Human Subjects ◽

Synthetic Data ◽

Ground Truth ◽

Comparative Test ◽

Real Imagery ◽

Dominant Part

Representative input data are a necessary requirement for the assessment of machine-vision systems. For symmetry-seeing machines in particular, such imagery should provide symmetries as well as asymmetric clutter. Moreover, there must be reliable ground truth with the data. It should be possible to estimate the recognition performance and the computational efforts by providing different grades of difficulty and complexity. Recent competitions used real imagery labeled by human subjects with appropriate ground truth. The paper at hand proposes to use synthetic data instead. Such data contain symmetry, clutter, and nothing else. This is preferable because interference with other perceptive capabilities, such as object recognition, or prior knowledge, can be avoided. The data are given sparsely, i.e., as sets of primitive objects. However, images can be generated from them, so that the same data can also be fed into machines requiring dense input, such as multilayered perceptrons. Sparse representations are preferred, because the author’s own system requires such data, and in this way, any influence of the primitive extraction method is excluded. The presented format allows hierarchies of symmetries. This is important because hierarchy constitutes a natural and dominant part in symmetry-seeing. The paper reports some experiments using the author’s Gestalt algebra system as symmetry-seeing machine. Additionally included is a comparative test run with the state-of-the-art symmetry-seeing deep learning convolutional perceptron of the PSU. The computational efforts and recognition performance are assessed.

Download Full-text

Improving Skin Lesion Analysis with Generative Adversarial Networks

10.5753/sibgrapi.est.2020.12986 ◽

2020 ◽

Author(s):

Alceu Bissoto ◽

Sandra Avila

Keyword(s):

Skin Lesion ◽

State Of The Art ◽

Synthetic Data ◽

Clinical Information ◽

Analysis Data ◽

Training Dataset ◽

Generative Adversarial Networks ◽

Classification Models ◽

Adversarial Networks ◽

Lesion Analysis

Melanoma is the most lethal type of skin cancer. Early diagnosis is crucial to increase the survival rate of those patients due to the possibility of metastasis. Automated skin lesion analysis can play an essential role by reaching people that do not have access to a specialist. However, since deep learning became the state-of-the-art for skin lesion analysis, data became a decisive factor in pushing the solutions further. The core objective of this M.Sc. dissertation is to tackle the problems that arise by having limited datasets. In the first part, we use generative adversarial networks to generate synthetic data to augment our classification model’s training datasets to boost performance. Our method generates high-resolution clinically-meaningful skin lesion images, that when compound our classification model’s training dataset, consistently improved the performance in different scenarios, for distinct datasets. We also investigate how our classification models perceived the synthetic samples and how they can aid the model’s generalization. Finally, we investigate a problem that usually arises by having few, relatively small datasets that are thoroughly re-used in the literature: bias. For this, we designed experiments to study how our models’ use data, verifying how it exploits correct (based on medical algorithms), and spurious (based on artifacts introduced during image acquisition) correlations. Disturbingly, even in the absence of any clinical information regarding the lesion being diagnosed, our classification models presented much better performance than chance (even competing with specialists benchmarks), highly suggesting inflated performances.

Download Full-text

Present State-of-The-Art of Association Rule Mining Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a2202.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 6398-6405

Keyword(s):

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

State Of The Art ◽

Synthetic Data ◽

Data Sets ◽

Evolutionary Analysis ◽

Rule Mining ◽

Transaction Database ◽

Mining Algorithms

A Data mining is the method of extracting useful information from various repositories such as Relational Database, Transaction database, spatial database, Temporal and Time-series database, Data Warehouses, World Wide Web. Various functionalities of Data mining include Characterization and Discrimination, Classification and prediction, Association Rule Mining, Cluster analysis, Evolutionary analysis. Association Rule mining is one of the most important techniques of Data Mining, that aims at extracting interesting relationships within the data. In this paper we study various Association Rule mining algorithms, also compare them by using synthetic data sets, and we provide the results obtained from the experimental analysis

Download Full-text

AI-driven deep CNN approach for multi-label pathology classification using chest X-Rays

PeerJ Computer Science ◽

10.7717/peerj-cs.495 ◽

2021 ◽

Vol 7 ◽

pp. e495

Author(s):

Saleh Albahli ◽

Hafiz Tayyab Rauf ◽

Abdulelah Algosaibi ◽

Valentina Emilia Balas

Keyword(s):

Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Synthetic Data ◽

X Rays ◽

Deep Convolutional Neural Networks ◽

Current State ◽

Pathology Classification ◽

Wide Range ◽

Multi Class Classification

Artificial intelligence (AI) has played a significant role in image analysis and feature extraction, applied to detect and diagnose a wide range of chest-related diseases. Although several researchers have used current state-of-the-art approaches and have produced impressive chest-related clinical outcomes, specific techniques may not contribute many advantages if one type of disease is detected without the rest being identified. Those who tried to identify multiple chest-related diseases were ineffective due to insufficient data and the available data not being balanced. This research provides a significant contribution to the healthcare industry and the research community by proposing a synthetic data augmentation in three deep Convolutional Neural Networks (CNNs) architectures for the detection of 14 chest-related diseases. The employed models are DenseNet121, InceptionResNetV2, and ResNet152V2; after training and validation, an average ROC-AUC score of 0.80 was obtained competitive as compared to the previous models that were trained for multi-class classification to detect anomalies in x-ray images. This research illustrates how the proposed model practices state-of-the-art deep neural networks to classify 14 chest-related diseases with better accuracy.

Download Full-text

Using control charts for on-line video summarisation

MATEC Web of Conferences ◽

10.1051/matecconf/201927701012 ◽

2019 ◽

Vol 277 ◽

pp. 01012 ◽

Cited By ~ 1

Author(s):

Clare E. Matthews ◽

Paria Yousefi ◽

Ludmila I. Kuncheva

Keyword(s):

Feature Extraction ◽

Control Chart ◽

Control Charts ◽

State Of The Art ◽

Synthetic Data ◽

New Method ◽

Data Sets ◽

On Line ◽

Memory Constraints ◽

Video Summarisation

Many existing methods for video summarisation are not suitable for on-line applications, where computational and memory constraints mean that feature extraction and frame selection must be simple and efficient. Our proposed method uses RGB moments to represent frames, and a control-chart procedure to identify shots from which keyframes are then selected. The new method produces summaries of higher quality than two state-of-the-art on-line video summarisation methods identified as the best among nine such methods in our previous study. The summary quality is measured against an objective ideal for synthetic data sets, and compared to user-generated summaries of real videos.

Download Full-text

Parameterized Nonlinear Least Squares for Unsupervised Nonlinear Spectral Unmixing

Remote Sensing ◽

10.3390/rs11020148 ◽

2019 ◽

Vol 11 (2) ◽

pp. 148 ◽

Cited By ~ 3

Author(s):

Risheng Huang ◽

Xiaorun Li ◽

Haiqiang Lu ◽

Jing Li ◽

Liaoying Zhao

Keyword(s):

Least Squares ◽

Optimization Problem ◽

Nonnegative Matrix Factorization ◽

Optimization Problems ◽

State Of The Art ◽

Synthetic Data ◽

Nonnegative Matrix ◽

Nonlinear Least Squares ◽

Hyperspectral Data ◽

Spectral Unmixing

This paper presents a new parameterized nonlinear least squares (PNLS) algorithm for unsupervised nonlinear spectral unmixing (UNSU). The PNLS-based algorithms transform the original optimization problem with respect to the endmembers, abundances, and nonlinearity coefficients estimation into separate alternate parameterized nonlinear least squares problems. Owing to the Sigmoid parameterization, the PNLS-based algorithms are able to thoroughly relax the additional nonnegative constraint and the nonnegative constraint in the original optimization problems, which facilitates finding a solution to the optimization problems . Subsequently, we propose to solve the PNLS problems based on the Gauss–Newton method. Compared to the existing nonnegative matrix factorization (NMF)-based algorithms for UNSU, the well-designed PNLS-based algorithms have faster convergence speed and better unmixing accuracy. To verify the performance of the proposed algorithms, the PNLS-based algorithms and other state-of-the-art algorithms are applied to synthetic data generated by the Fan model and the generalized bilinear model (GBM), as well as real hyperspectral data. The results demonstrate the superiority of the PNLS-based algorithms.

Download Full-text

Reconstructing neuronal circuitry from parallel spike trains

Nature Communications ◽

10.1038/s41467-019-12225-2 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 7

Author(s):

Ryota Kobayashi ◽

Shuhei Kurita ◽

Anno Kurth ◽

Katsunori Kitano ◽

Kenji Mizuseki ◽

...

Keyword(s):

Generalized Linear Model ◽

State Of The Art ◽

Synthetic Data ◽

Brain Regions ◽

Spike Trains ◽

Circuit Diagram ◽

Estimation Errors ◽

Neuronal Circuitry ◽

Large Numbers ◽

Cross Correlations

Abstract State-of-the-art techniques allow researchers to record large numbers of spike trains in parallel for many hours. With enough such data, we should be able to infer the connectivity among neurons. Here we develop a method for reconstructing neuronal circuitry by applying a generalized linear model (GLM) to spike cross-correlations. Our method estimates connections between neurons in units of postsynaptic potentials and the amount of spike recordings needed to verify connections. The performance of inference is optimized by counting the estimation errors using synthetic data. This method is superior to other established methods in correctly estimating connectivity. By applying our method to rat hippocampal data, we show that the types of estimated connections match the results inferred from other physiological cues. Thus our method provides the means to build a circuit diagram from recorded spike trains, thereby providing a basis for elucidating the differences in information processing in different brain regions.

Download Full-text

Semi-analytical models of hydroelastic sloshing impact in tanks of liquefied natural gas vessels

Philosophical Transactions of The Royal Society A Mathematical Physical and Engineering Sciences ◽

10.1098/rsta.2011.0112 ◽

2011 ◽

Vol 369 (1947) ◽

pp. 2920-2941 ◽

Cited By ~ 10

Author(s):

I. Ten ◽

Š. Malenica ◽

A. Korobkin

Keyword(s):

Natural Gas ◽

State Of The Art ◽

Liquefied Natural Gas ◽

Analytical Models ◽

Structural Behaviour ◽

Hydrodynamic Models ◽

Basic Principles ◽

Structural Part ◽

The Impact ◽

Violent Sloshing

The present paper deals with the methods for the evaluation of the hydroelastic interactions that appear during the violent sloshing impacts inside the tanks of liquefied natural gas carriers. The complexity of both the fluid flow and the structural behaviour (containment system and ship structure) does not allow for a fully consistent direct approach according to the present state of the art. Several simplifications are thus necessary in order to isolate the most dominant physical aspects and to treat them properly. In this paper, choice was made of semi-analytical modelling for the hydrodynamic part and finite-element modelling for the structural part. Depending on the impact type, different hydrodynamic models are proposed, and the basic principles of hydroelastic coupling are clearly described and validated with respect to the accuracy and convergence of the numerical results.

Download Full-text