The Additive Input-Doubling Method Based on the SVR with Nonlinear Kernels: Small Data Approach

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 612
Author(s):  
Ivan Izonin ◽  
Roman Tkachenko ◽  
Nataliya Shakhovska ◽  
Nataliia Lotoshynska

The problem of effective intellectual analysis of short datasets is topical in various application areas, including medicine, economics, materials science, and other fields. This paper presents a new additive input-doubling method designed by the authors for processing short and very short datasets. The main steps of the method are data augmentation within the existing dataset in both rows and columns (without training), the use of nonlinear SVR to implement the training procedure, and the formation of the result based on the authors' procedure. The authors show that the developed data augmentation procedure corresponds to the principles of axial symmetry. The training and application procedures of the method are described in detail, and two algorithmic implementations are presented. The optimal operating parameters of the method were selected experimentally. Its efficiency in processing short datasets for the prediction task was established experimentally by comparison with other methods of this class. Both proposed algorithmic implementations achieved the highest prediction accuracy among all the methods investigated. The main application areas of the method are described, along with its shortcomings and prospects for further research.
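The abstract does not spell out the augmentation itself. The sketch below shows one plausible reading of "augmentation in both rows and columns", assumed here purely for illustration: every pair of samples is concatenated into one row (so the columns double and the rows grow quadratically), and the target becomes the difference of the two outputs. The function name `double_inputs` and the toy values are hypothetical, and the nonlinear SVR that would be trained on the augmented set is omitted.

```python
from itertools import product

def double_inputs(X, y):
    """Pair every row with every row, including itself (assumed scheme).

    Each augmented row concatenates two original feature vectors, so the
    column count doubles; its target is the difference of their outputs.
    """
    X_aug, z_aug = [], []
    for i, j in product(range(len(X)), repeat=2):
        X_aug.append(list(X[i]) + list(X[j]))
        z_aug.append(y[i] - y[j])
    return X_aug, z_aug

# Toy data: 3 samples, 2 features (values are illustrative only).
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [10.0, 20.0, 40.0]

X_aug, z_aug = double_inputs(X, y)
n_rows = len(X_aug)      # grows from n to n * n
n_cols = len(X_aug[0])   # doubles
```

Under this assumed pairing, swapping the two halves of a row flips the sign of its target, which is one way to picture the axial symmetry the authors mention.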

2021 ◽  
Vol 11 (1) ◽  
pp. 28
Author(s):  
Ivan Lorencin ◽  
Sandi Baressi Šegota ◽  
Nikola Anđelić ◽  
Anđela Blagojević ◽  
Tijana Šušteršić ◽  
...  

COVID-19 represents one of the greatest challenges in modern history. Its impact is most noticeable in the health care system, mostly due to the accelerated and increased influx of patients with a more severe clinical picture, which increases the pressure on health systems. For this reason, the aim is to automate the process of diagnosis and treatment. The research presented in this article examines the possibility of classifying the clinical picture of a patient using X-ray images and convolutional neural networks (CNNs). The research was conducted on a dataset of 185 images in four classes. Because of the small number of images, a data augmentation procedure was performed. To identify the CNN architecture with the highest classification performance, multiple CNNs were designed. Results show that the best classification performance is achieved with ResNet152. This CNN achieved $\overline{AUC}_{macro}$ and $\overline{AUC}_{micro}$ values up to 0.94, suggesting the possibility of applying CNNs to the classification of the clinical picture of COVID-19 patients using an X-ray image of the lungs. When higher layers are frozen during the training procedure, higher $\overline{AUC}_{macro}$ and $\overline{AUC}_{micro}$ values are achieved. With ResNet152, $\overline{AUC}_{macro}$ and $\overline{AUC}_{micro}$ values up to 0.96 are achieved if all layers except the last 12 are frozen during the training procedure.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Fahime Khozeimeh ◽  
Danial Sharifrazi ◽  
Navid Hoseini Izadi ◽  
Javad Hassannataj Joloudari ◽  
Afshin Shoeibi ◽  
...  

COVID-19 has caused many deaths worldwide. The automation of the diagnosis of this virus is highly desired. Convolutional neural networks (CNNs) have shown outstanding classification performance on image datasets. To date, it appears that COVID computer-aided diagnosis systems based on CNNs and clinical information have not yet been analysed or explored. We propose a novel method, named the CNN-AE, to predict the survival chance of COVID-19 patients using a CNN trained with clinical information. Notably, the required resources to prepare CT images are expensive and limited compared to those required to collect clinical data, such as blood pressure, liver disease, etc. We evaluated our method using a publicly available clinical dataset that we collected. The dataset properties were carefully analysed to extract important features and compute the correlations of features. A data augmentation procedure based on autoencoders (AEs) was proposed to balance the dataset. The experimental results revealed that the average accuracy of the CNN-AE (96.05%) was higher than that of the CNN (92.49%). To demonstrate the generality of our augmentation method, we trained some existing mortality risk prediction methods on our dataset (with and without data augmentation) and compared their performances. We also evaluated our method using another dataset for further generality verification. To show that clinical data can be used for COVID-19 survival chance prediction, the CNN-AE was compared with multiple pre-trained deep models that were tuned based on CT images.


2022 ◽  
Author(s):  
Yuri Haraguchi ◽  
Yasuhiko Igarashi ◽  
Hiroaki Imai ◽  
Yuya Oaki

Data-scientific approaches have permeated chemistry and materials science. In general, these approaches are not easily applied to small data, such as experimental data in laboratories. Our group has focused...


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Vishu Gupta ◽  
Kamal Choudhary ◽  
Francesca Tavazza ◽  
Carelyn Campbell ◽  
Wei-keng Liao ◽  
...  

Artificial intelligence (AI) and machine learning (ML) have been increasingly used in materials science to build predictive models and accelerate discovery. For selected properties, availability of large databases has also facilitated application of deep learning (DL) and transfer learning (TL). However, unavailability of large datasets for a majority of properties prohibits widespread application of DL/TL. We present a cross-property deep-transfer-learning framework that leverages models trained on large datasets to build models on small datasets of different properties. We test the proposed framework on 39 computational and two experimental datasets and find that the TL models with only elemental fractions as input outperform ML/DL models trained from scratch even when they are allowed to use physical attributes as input, for 27/39 (≈ 69%) computational and both the experimental datasets. We believe that the proposed framework can be widely useful to tackle the small data challenge in applying AI/ML in materials science.


Materials ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 2561 ◽  
Author(s):  
Christian Grech ◽  
Marco Buzio ◽  
Mariano Pentella ◽  
Nicholas Sammut

In this work, a Preisach-recurrent neural network model is proposed to predict the dynamic hysteresis in ARMCO pure iron, an important soft magnetic material in particle accelerator magnets. A recurrent neural network coupled with Preisach play operators is proposed, along with a novel validation method for the identification of the model’s parameters. The proposed model is found to predict the magnetic flux density of ARMCO pure iron with a Normalised Root Mean Square Error (NRMSE) better than 0.7%, when trained with just six different hysteresis loops. The model is evaluated using ramp-rates not used in the training procedure, which shows the ability of the model to predict data which has not been measured. The results demonstrate that the Preisach model based on a recurrent neural network can accurately describe ferromagnetic dynamic hysteresis when trained with a limited amount of data, showing the model’s potential in the field of materials science.
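The NRMSE figure quoted above depends on how the error is normalised, which the abstract does not state. A minimal sketch follows, assuming the RMSE is divided by the range of the measured values (a common convention, not confirmed by the source); the function name `nrmse` and the sample numbers are illustrative only and are not taken from the paper.

```python
import math

def nrmse(measured, predicted):
    """RMSE divided by the range of the measured values (assumed convention)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    span = max(measured) - min(measured)
    if span == 0:
        raise ValueError("measured values have zero range")
    return math.sqrt(mse) / span

# Made-up flux density samples in tesla, for illustration only.
measured = [-1.5, -0.5, 0.5, 1.5]
predicted = [-1.49, -0.51, 0.51, 1.49]
error_percent = 100 * nrmse(measured, predicted)
```

Other normalisations (by the mean or by the standard deviation of the measurements) would give different percentages for the same predictions, so a reported NRMSE is only comparable across papers that share a convention.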


2019 ◽  
Author(s):  
Virgı́nia F. Mota ◽  
Jefersson A. dos Santos ◽  
Arnaldo De A. Araújo

Spatiotemporal description is a research field with applications in various areas such as video indexing, surveillance, and human-computer interfaces, among others. Big Data problems in large databases are now being treated with Deep Learning tools; however, there is still room for improvement in handcrafted spatiotemporal description. Moreover, there are still problems involving small data in which data augmentation and other techniques are not valid. The main contribution of this Ph.D. thesis is the development of a framework for spatiotemporal representation using orientation tensors, enabling dimension reduction and invariance. This is a multipurpose framework called Features As Spatiotemporal Tensors (FASTensor). We evaluate this framework in three different applications: Human Action recognition, Video Pornography classification and Cancer Cell classification. The last of these is also a contribution of this work, since we introduce a new dataset called the Melanoma Cancer Cell dataset (MCC). It is a small dataset that cannot be artificially augmented due to the difficulty of extraction and the nature of motion. The results were competitive, while the method is fast and simple to implement. Finally, our results on the MCC dataset can be used in other cancer cell treatment analyses.


2020 ◽  
Vol 10 (7) ◽  
pp. 1494-1505
Author(s):  
Hyo-Hun Kim ◽  
Byung-Woo Hong

In this work, we present an image segmentation algorithm based on the convolutional neural network framework, in which scale space theory is incorporated into the training procedure. The data augmentation is designed to apply the scale space to the training data in order to deal effectively with the variability of regions of interest in geometry and appearance, such as shape and contrast. The proposed data augmentation algorithm via scale space aims to improve invariant features with respect to both geometry and appearance by taking their diffusion process into consideration. We develop a segmentation algorithm based on the convolutional neural network framework in which the network architecture consists of encoding and decoding substructures, combined with the data augmentation scheme via the scale space induced by the heat equation. The quantitative analysis using the cardiac MRI dataset indicates that the proposed algorithm achieves better accuracy in the delineation of the left ventricles, which demonstrates the potential of the algorithm for whole-heart segmentation as a computer-aided diagnosis system for cardiac diseases.
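To make "scale space induced by the heat equation" concrete, here is a minimal one-dimensional sketch, assuming an explicit finite-difference scheme with reflecting boundaries. The authors work with 2D images and do not state their discretisation in the abstract, so the functions `diffuse` and `scale_space`, the step size, and the chosen scales are all illustrative assumptions.

```python
def diffuse(signal, steps, dt=0.25):
    """Explicit finite-difference heat equation u_t = u_xx on a 1D signal.

    Uses reflecting boundaries and unit grid spacing; dt must be <= 0.5
    for the explicit scheme to remain stable.
    """
    if not 0 < dt <= 0.5:
        raise ValueError("dt must be in (0, 0.5]")
    u = list(signal)
    for _ in range(steps):
        padded = [u[0]] + u + [u[-1]]  # replicate the end values
        u = [padded[i] + dt * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
             for i in range(1, len(padded) - 1)]
    return u

def scale_space(signal, scales=(0, 4, 16)):
    """Return one progressively smoothed copy of the signal per scale."""
    return [diffuse(signal, steps) for steps in scales]

edge = [0.0] * 8 + [1.0] * 8  # a step edge standing in for one image row
levels = scale_space(edge)
```

Each entry of `levels` is a coarser view of the same edge; feeding such copies to a network alongside the original is one way an augmentation of this kind could expose it to structure at several scales.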


2021 ◽  
Author(s):  
Shidang Xu ◽  
Jiali Li ◽  
Pengfei Cai ◽  
Xiaoli Liu ◽  
Bin Liu ◽  
...  

An artificial intelligence (AI) based self-learning or self-improving material discovery system is the holy grail of next-generation material discovery and materials science. Herein, we demonstrate how to combine accurate prediction of material performance via quantum chemical calculations and Bayesian optimization-based active learning to realize a self-improving discovery system for high-performance photosensitizers (PS). Through self-improving cycles, such a system can improve the model prediction accuracy (best mean average error of 0.09 eV for singlet-triplet splitting) and high-performance PS search ability, realizing the efficient discovery of PS. From a molecular space with more than 7 million molecules, 5950 potential high-performance PSs were discovered.


Author(s):  
Abhishek Singh ◽  
Debojyoti Dutta ◽  
Amit Saha

Most of the advances in deep learning (DL) have occurred in domains such as computer vision and natural language processing, where abundant training data is available. A major obstacle to leveraging DL techniques for malware analysis is the lack of sufficiently large labeled datasets. In this paper, we take the first steps towards building a model that can synthesize a labeled dataset of malware images using a GAN. Such a model can be used to perform data augmentation for training a classifier. Furthermore, the model can be shared publicly so that the community can benefit from the dataset without the original dataset being shared. First, we show the underlying idiosyncrasies of malware images and why existing data augmentation techniques, as well as traditional GAN training, fail to produce quality artificial samples. Next, we propose a new method for training a GAN in which we explicitly embed prior domain knowledge about the dataset into the training procedure. We show improvements in training stability and sample quality assessed on different metrics. Our experiments show substantial improvement over baselines and promise for using such a generative model in malware visualization systems.


2020 ◽  
Vol 8 (1) ◽  
Author(s):  
Layne Bradshaw ◽  
Rashmish K. Mishra ◽  
Andrea Mitridate ◽  
Bryan Ostdiek

Searching for new physics in large data sets requires a balance between two competing effects: signal identification and background distortion. In this work, we perform a systematic study of both single-variable and multivariate jet tagging methods that aim for this balance. The methods preserve the shape of the background distribution by augmenting either the training procedure or the data itself. Multiple quantitative metrics for comparing the methods are considered, for tagging 2-, 3-, or 4-prong jets from the QCD background. This is the first study to show that the data augmentation techniques of Planing and PCA-based scaling deliver performance similar to that of the augmented training techniques of Adversarial NN and uBoost, but are both easier to implement and computationally cheaper.
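Planing is generally understood as reweighting training events so that the background becomes flat in a chosen variable, which removes that variable's shape as a usable feature. The abstract gives no implementation details, so the sketch below is only an assumed histogram-based version; `planing_weights`, the bin width, and the jet masses are hypothetical.

```python
from collections import Counter

def planing_weights(values, bin_width):
    """Weight each event by the inverse population of its histogram bin."""
    bins = [int(v // bin_width) for v in values]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]

# Hypothetical background jet masses in GeV (illustrative numbers only).
masses = [52.0, 55.0, 58.0, 61.0, 75.0, 78.0, 95.0]
weights = planing_weights(masses, bin_width=20.0)

# After weighting, every occupied bin carries the same total weight.
weighted_hist = Counter()
for m, w in zip(masses, weights):
    weighted_hist[int(m // 20.0)] += w
```

Because the weights are computed once from a histogram and then passed to any standard classifier, this kind of data-side augmentation needs no change to the training loop, which is consistent with the abstract's remark that it is cheaper than adversarial training.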

