Machine learning reveals multiple classes of diamond nanoparticles

Unsupervised learning divides data into groups so that observations in the same group share similar characteristics while observations in different groups differ. In this paper, we cluster data by partitioning around medoids (PAM), which has some advantages over k-means clustering, and apply it to baseball players in the Korea Baseball League. We also apply principal component analysis (PCA) to the data and plot it using the first two principal components as axes. Through this procedure, we interpret the clustering graphically. The combination of PAM and PCA can be applied to other data sets, and the approach makes the characteristics of each group easy to identify.
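
The PAM plus PCA workflow described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the greedy BUILD/SWAP variant, Euclidean distance, the standardization step, and the synthetic two-group data standing in for player statistics are all assumptions. Unlike k-means, PAM uses actual observations (medoids) as cluster centres, which makes it less sensitive to outliers.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def total_cost(D, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return D[:, medoids].min(axis=1).sum()

def pam(X, k, max_iter=100):
    """Partitioning around medoids: greedy BUILD phase, then SWAP until no improvement."""
    D = pairwise_distances(X)
    n = len(X)
    # BUILD: start from the most central point, then add the point that most reduces cost.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        candidates = [i for i in range(n) if i not in medoids]
        costs = [total_cost(D, medoids + [c]) for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    best = total_cost(D, medoids)
    # SWAP: exchange a medoid with a non-medoid whenever that lowers the total cost.
    for _ in range(max_iter):
        improved = False
        for m in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[m] = h
                cost = total_cost(D, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
        if not improved:
            break
    labels = np.argmin(D[:, medoids], axis=1)
    return np.array(medoids), labels

def pca_2d(X):
    """Scores of the centred data on its first two principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

# Hypothetical stand-in for a players-by-statistics table: two groups of 20 players, 4 stats.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(6, 1, (20, 4))])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so no statistic dominates
medoids, labels = pam(Xs, k=2)
coords = pca_2d(Xs)  # 2D coordinates for a scatter plot coloured by `labels`
```

Plotting `coords` with each point coloured by its entry in `labels` gives the kind of two-component graph the abstract describes.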

A Machine Learning Approach to Study Glycosidase Activities from Bifidobacterium

Microorganisms ◽

10.3390/microorganisms9051034 ◽

2021 ◽

Vol 9 (5) ◽

pp. 1034

Author(s):

Carlos Sabater ◽

Lorena Ruiz ◽

Abelardo Margolles

Keyword(s):

Machine Learning ◽

Supervised Classification ◽

Machine Learning Algorithms ◽

Learning Approach ◽

Human Milk Oligosaccharides ◽

Future Studies ◽

High Fiber ◽

Machine Learning Approach ◽

Prebiotic Oligosaccharides

This study aimed to recover metagenome-assembled genomes (MAGs) from human fecal samples to characterize the glycosidase profiles of Bifidobacterium species exposed to different prebiotic oligosaccharides (galacto-oligosaccharides, fructo-oligosaccharides and human milk oligosaccharides, HMOs) as well as to high-fiber diets. A total of 1806 MAGs were recovered from 487 infant and adult metagenomes. Unsupervised and supervised classification of the glycosidases encoded in the MAGs using machine-learning algorithms made it possible to establish characteristic hydrolytic profiles for B. adolescentis, B. bifidum, B. breve, B. longum and B. pseudocatenulatum, with classification rates above 90%. Glycosidase families GH5_44, GH32 and GH110 were characteristic of B. bifidum. The presence or absence of GH1, GH2, GH5 and GH20 was characteristic of B. adolescentis, B. breve and B. pseudocatenulatum, while families GH1 and GH30 were relevant in MAGs from B. longum. These characteristic profiles allowed bifidobacteria to be discriminated regardless of prebiotic exposure. Correlation analysis of glycosidase activities suggests strong associations between glycosidase families comprising HMO-degrading enzymes, which are often found in MAGs from the same species. The mathematical models proposed here may contribute to a better understanding of the carbohydrate metabolism of some common bifidobacteria species and could be extrapolated to other microorganisms of interest in future studies.
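
A correlation analysis like the one mentioned above can be sketched as a family-by-family Pearson correlation over a MAGs-by-families count matrix. The counts, the placeholder family names (famA to famE), the choice of Pearson correlation and the 0.8 threshold are hypothetical choices for illustration, not data or methods taken from the study.

```python
import numpy as np

# Hypothetical toy data (not from the study): rows are MAGs, columns are
# the number of genes assigned to each glycosidase family.
families = ["famA", "famB", "famC", "famD", "famE"]
counts = np.array([
    [5, 3, 2, 2, 1],
    [4, 3, 2, 1, 1],
    [6, 0, 0, 0, 0],
    [5, 4, 3, 2, 1],
    [3, 0, 0, 0, 0],
    [4, 2, 1, 1, 1],
])

# Family-by-family Pearson correlation (columns are the variables).
corr = np.corrcoef(counts, rowvar=False)

def strongly_associated(corr, names, threshold=0.8):
    """List family pairs whose correlation meets the (assumed) threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if corr[i, j] >= threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 2)))
    return pairs

pairs = strongly_associated(corr, families)
```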

Machine Learning for Brain Images Classification of Two Language Speakers

Computational Intelligence and Neuroscience ◽

10.1155/2020/9045456 ◽

2020 ◽

Vol 2020 ◽

pp. 1-7

Author(s):

Alejandro-Israel Barranco-Gutiérrez

Keyword(s):

Machine Learning ◽

Magnetic Resonance Images ◽

Common Language ◽

Brain Images ◽

Hidden Layer ◽

Different Characteristics ◽

The Brain ◽

Relevant Work ◽

Complex Organ

Image analysis of the brain with machine learning remains relevant for detecting different characteristics of this complex organ. Recent research has observed differences in brain structure, specifically in white matter, associated with learning and using a second language. This work studies the brain by classifying Magnetic Resonance Images (MRIs) of bilingual and monolingual people who share English as a common language. Single-hidden-layer artificial neural networks of different sizes were tested, arriving at two neurons in the hidden layer. Nine hundred inputs were used, and the classifier achieved a high percentage of correct classifications. Training was supervised; this aspect could be improved in future work. This task is usually carried out by a human expert using Tract-Based Spatial Statistics analysis, with fractional anisotropy displayed in different colors on a screen. This proposal therefore offers another way to analyze this type of phenomenon quantitatively, contributing to neuroscience by automatically distinguishing bilingual from monolingual people using machine learning on MRIs. The results reinforce what manual detection reports and show how a machine can perform the same task.
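
A network with one hidden layer of two neurons can be sketched in NumPy as follows. The abstract does not give the activation functions, loss, optimizer, or how the MRIs were turned into the 900 inputs, so the tanh hidden layer, sigmoid output, cross-entropy loss, full-batch gradient descent and the synthetic 10-feature data below are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_hidden_layer(X, y, hidden=2, lr=0.5, epochs=2000, seed=0):
    """Full-batch gradient descent on a tanh hidden layer + sigmoid output."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # hidden activations, shape (n, hidden)
        p = sigmoid(H @ W2 + b2)        # predicted P(class 1), shape (n,)
        losses.append(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))
        g = (p - y) / len(y)            # dLoss/dlogit for sigmoid + cross-entropy
        gW2, gb2 = H.T @ g, g.sum()
        gH = np.outer(g, W2) * (1.0 - H ** 2)  # backpropagate through tanh
        gW1, gb1 = X.T @ gH, gH.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) >= 0.5).astype(int)

# Hypothetical features: 100 "monolingual" (0) and 100 "bilingual" (1) samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1, 1, (100, 10)), rng.normal(1, 1, (100, 10))])
y = np.array([0] * 100 + [1] * 100)
params, losses = train_one_hidden_layer(X, y)
accuracy = (predict(params, X) == y).mean()
```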

Classification of specialty coffees using machine learning techniques

Research Society and Development ◽

10.33448/rsd-v10i5.14732 ◽

2021 ◽

Vol 10 (5) ◽

pp. e13110514732

Author(s):

Paulo César Ossani ◽

Diogo Francisco Rossoni ◽

Marcelo Ângelo Cirillo ◽

Flávio Meira Borém

Keyword(s):

Machine Learning ◽

Supervised Classification ◽

Consumer Acceptance ◽

Machine Learning Techniques ◽

Machine Learning Technique ◽

Learning Techniques ◽

Learning Technique ◽

Evaluation Test ◽

New Methodologies

Specialty coffees are of great economic importance, and their sensory quality is valued by both producers and the market. Research is constantly being carried out in search of better blends, in order to add value and differentiate prices according to product quality. To accomplish this, new methodologies must be explored that take into consideration the factors that differentiate the particularities of each consumer and/or product. This article therefore proposes the use of machine learning to build supervised classification and identification models. In a sensory evaluation test of consumer acceptance using four classes of specialty coffees, applied to four groups of trained and untrained consumers, attributes such as flavor, body, sweetness and general grade were evaluated. Machine learning proved viable, as it allowed specialty coffees produced at different altitudes and with different processing methods to be classified and identified.
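
The abstract does not name the learning algorithm, so the sketch below uses a nearest-centroid classifier, one of the simplest supervised methods, purely to illustrate classifying coffees from sensory scores. The four class labels (altitude and processing combinations) and every score are hypothetical.

```python
import numpy as np

FEATURES = ["flavor", "body", "sweetness", "general_grade"]

def fit_centroids(X, y):
    """Mean sensory profile of each coffee class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict(model, X):
    """Assign each sample to the class with the closest mean profile."""
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d, axis=1)]

# Hypothetical scores, one row per evaluation, columns in FEATURES order.
X = np.array([
    [8, 7, 8, 8], [8, 8, 7, 8], [9, 7, 8, 9],   # high altitude, natural
    [7, 6, 6, 7], [7, 6, 7, 7], [6, 6, 6, 7],   # high altitude, washed
    [6, 8, 7, 6], [5, 8, 7, 6], [6, 7, 8, 6],   # low altitude, natural
    [5, 5, 5, 5], [4, 5, 5, 5], [5, 4, 5, 4],   # low altitude, washed
], dtype=float)
y = np.array(["high_natural"] * 3 + ["high_washed"] * 3
             + ["low_natural"] * 3 + ["low_washed"] * 3)
model = fit_centroids(X, y)
```

In practice the model would be evaluated on evaluations held out from `fit_centroids`, for example with cross-validation.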

A Hybrid Machine Learning Model for Electricity Consumer Categorization Using Smart Meter Data

Energies ◽

10.3390/en11092235 ◽

2018 ◽

Vol 11 (9) ◽

pp. 2235 ◽

Cited By ~ 4

Author(s):

Zigui Jiang ◽

Rongheng Lin ◽

Fangchun Yang

Keyword(s):

Machine Learning ◽

Supervised Classification ◽

Electricity Consumption ◽

Learning Model ◽

Unsupervised Clustering ◽

Smart Meter ◽

Machine Learning Model ◽

Proposed Model ◽

Comparison Results ◽

Hybrid Machine

Time-series smart meter data can precisely record the electricity consumption behavior of every consumer in a smart grid system. A better understanding of consumption behaviors, and an effective categorization of consumers based on the similarity of those behaviors, can support flexible demand management and effective energy control. In this paper, we propose a hybrid machine learning model that combines unsupervised clustering and supervised classification to categorize consumers based on the similarity of their typical electricity consumption behaviors. An unsupervised clustering algorithm extracts the typical consumption behaviors and performs a fuzzy consumer categorization, after which a novel algorithm identifies distinct consumer categories and their consumption characteristics. A supervised classification algorithm then classifies new consumers and evaluates the validity of the identified categories. The proposed model is applied to a real dataset of U.S. non-residential consumers collected by smart meters over one year. The results indicate that large or special institutions usually have distinct consumption characteristics, while others, such as some medium and small institutions or similar building types, may share the same characteristics. Moreover, comparison with other methods shows that the proposed model improves both category identification and classification accuracy.
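
The two-stage idea (cluster daily profiles, then classify consumers) can be sketched as below. This is not the paper's algorithm: k-means with farthest-point initialization stands in for the unspecified clustering method, the fraction of a consumer's days falling in each cluster stands in for the fuzzy categorization, and 1-nearest-neighbour stands in for the supervised classifier. All load data are synthetic.

```python
import numpy as np

def normalize_profiles(P):
    """Divide each daily load profile (row) by its peak so clustering compares shape, not size."""
    return P / P.max(axis=1, keepdims=True)

def assign(X, C):
    """Index of the nearest centroid for every row of X."""
    return np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1), axis=1)

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: each new centre is the point farthest from existing ones.
    C = [X[rng.integers(len(X))]]
    while len(C) < k:
        d = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[int(np.argmax(d))])
    C = np.array(C)
    for _ in range(iters):
        labels = assign(X, C)
        # Recompute centres; keep the old centre if a cluster ends up empty.
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return C, assign(X, C)

def membership(day_labels, k):
    """Fuzzy category vector: fraction of a consumer's days matching each typical profile."""
    return np.bincount(day_labels, minlength=k) / len(day_labels)

def nearest_neighbor(train_vecs, train_cats, v):
    """Supervised step (1-NN): give a new consumer the category of its closest known consumer."""
    return train_cats[int(np.argmin(np.linalg.norm(train_vecs - v, axis=1)))]

# Hypothetical 24-hour profiles for daytime-peaking and night-peaking consumers.
rng = np.random.default_rng(1)
hours = np.arange(24)
day_shape = np.where((hours >= 8) & (hours <= 18), 1.0, 0.2)
night_shape = np.where((hours >= 8) & (hours <= 18), 0.2, 1.0)

def make_consumer(n_day, n_night, scale):
    days = np.array([day_shape] * n_day + [night_shape] * n_night)
    return scale * (days + rng.normal(0.0, 0.03, days.shape))

consumers = [make_consumer(28, 2, 50), make_consumer(27, 3, 80),
             make_consumer(3, 27, 20), make_consumer(2, 28, 35)]
categories = np.array(["daytime", "daytime", "night", "night"])

all_days = normalize_profiles(np.vstack(consumers))
C, day_labels = kmeans(all_days, k=2)
splits = np.cumsum([len(c) for c in consumers])[:-1]
vecs = np.array([membership(lab, 2) for lab in np.split(day_labels, splits)])

new_consumer = normalize_profiles(make_consumer(4, 26, 60))
predicted = nearest_neighbor(vecs, categories, membership(assign(new_consumer, C), 2))
```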

Unsupervised clustering and spectral unmixing for feature extraction prior to supervised classification of hyperspectral images

10.1117/12.892469 ◽

2011 ◽

Cited By ~ 3

Author(s):

Inmaculada Dópido ◽

Alberto Villa ◽

Antonio Plaza

Keyword(s):

Feature Extraction ◽

Supervised Classification ◽

Unsupervised Clustering ◽

Spectral Unmixing ◽

Hyperspectral Images

Auto-encoded Latent Representations of White Matter Streamlines for Quantitative Distance Analysis

10.1101/2021.10.06.463445 ◽

2021 ◽

Author(s):

Shenjun Zhong ◽

Zhaolin Chen ◽

Gary Egan

Keyword(s):

White Matter ◽

Supervised Classification ◽

Unsupervised Clustering ◽

Distance Analysis ◽

Brain White Matter ◽

Latent Space ◽

Generic Data ◽

Connectivity Patterns ◽

Latent Representations

Parcellation of the whole-brain tractogram is a critical step in studying brain white matter structures and connectivity patterns. Existing methods based on supervised classification of streamlines into predefined bundle types are not designed to explore sub-bundle structures, and methods that rely on manually designed features make streamline-wise similarities expensive to compute. To resolve these issues, we propose a novel atlas-free method that learns a latent space using a deep recurrent autoencoder. The autoencoder efficiently embeds streamlines of any length into fixed-size feature vectors, namely streamline embeddings, and enables tractogram parcellation via unsupervised clustering in the latent space. The method is evaluated on the ISMRM 2015 tractography challenge dataset and shows the ability to discriminate major bundles with unsupervised clustering and to query streamlines by similarity. The learnt latent representations of streamlines and bundles also open the possibility of quantitatively studying sub-bundle structures at any granularity with generic data mining techniques.
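
The recurrent autoencoder itself is beyond a short example, but the underlying requirement, mapping streamlines of any length to fixed-size vectors that can be compared, can be illustrated with a classical, non-learned baseline: resample every streamline to the same number of equally spaced points and compare them with the minimum average direct-flip (MDF) distance used by the QuickBundles algorithm. This swaps in a different technique from the paper's, and the choice of 12 points is an arbitrary assumption.

```python
import numpy as np

def resample(streamline, n_points=12):
    """Resample a polyline (N x 3) to n_points equally spaced along its arc length."""
    seg = np.linalg.norm(np.diff(streamline, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.linspace(0.0, arc[-1], n_points)
    return np.column_stack([np.interp(targets, arc, streamline[:, d]) for d in range(3)])

def mdf(a, b):
    """Minimum average direct-flip distance between two resampled streamlines.

    Streamline direction is arbitrary, so compare both orientations and keep the smaller.
    """
    direct = np.linalg.norm(a - b, axis=1).mean()
    flipped = np.linalg.norm(a - b[::-1], axis=1).mean()
    return min(direct, flipped)

# Toy streamlines: the same straight path sampled with 5 and 50 points (second one reversed).
a = resample(np.column_stack([np.linspace(0, 10, 5), np.zeros(5), np.zeros(5)]))
b = resample(np.column_stack([np.linspace(10, 0, 50), np.zeros(50), np.zeros(50)]))
c = a + np.array([0.0, 2.0, 0.0])  # same shape shifted 2 units along y
embedding = a.ravel()  # fixed-size vector (n_points * 3) regardless of original length
```

Here `mdf(a, b)` is essentially zero despite the different point counts and orientation, and `mdf(a, c)` equals the 2-unit shift.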

Machine Learning-Based Supervised Classification of Point Clouds Using Multiscale Geometric Features

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10030187 ◽

2021 ◽

Vol 10 (3) ◽

pp. 187

Author(s):

Muhammed Enes Atik ◽

Zaide Duran ◽

Dursun Zafer Seker

Keyword(s):

Machine Learning ◽

Point Cloud ◽

Supervised Classification ◽

Point Clouds ◽

Support Vector ◽

Geometric Features ◽

Mathematical Tool ◽

City Area ◽

3D Point Clouds

3D scene classification has become an important research field in photogrammetry, remote sensing, computer vision and robotics with the widespread use of 3D point clouds. Point cloud classification, also called semantic labeling, semantic segmentation or semantic classification of point clouds, is a challenging topic. Machine learning is a powerful mathematical tool for classifying 3D point clouds, whose content can be highly complex. In this study, the classification performance of different machine learning algorithms was evaluated at multiple scales. The feature space of each point in the point cloud was built from geometric features derived from the eigenvalues of the covariance matrix. Eight supervised classification algorithms were tested in four different areas from three datasets (the Dublin City, Vaihingen and Oakland3D datasets). The algorithms were evaluated in terms of overall accuracy, precision, recall, F1 score and processing time. The best overall result in each of the four test areas came from a different algorithm: Random Forest for Dublin City Area 1 (93.12%), a Multilayer Perceptron for Dublin City Area 2 (92.78%), Support Vector Machines for Vaihingen (79.71%) and Linear Discriminant Analysis for Oakland3D (97.30%).
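
The eigenvalue-based geometric features can be sketched as follows: take a point's k nearest neighbours, compute their 3x3 covariance matrix, and derive shape descriptors from its eigenvalues λ1 ≥ λ2 ≥ λ3. The abstract does not list the exact features or neighbourhood sizes, so the six descriptors below (common in the point-cloud literature), the brute-force neighbour search and the scales (10, 20, 40) are assumptions. Some formulations first divide the eigenvalues by their sum, which leaves the ratio-based features unchanged but rescales omnivariance.

```python
import numpy as np

def eigen_features(points, query_idx, k):
    """Covariance-eigenvalue features of the k nearest neighbours of one point."""
    d = np.linalg.norm(points - points[query_idx], axis=1)
    nbrs = points[np.argsort(d)[:k]]            # brute-force kNN (includes the point itself)
    cov = np.cov(nbrs, rowvar=False)            # 3 x 3 covariance matrix
    # eigvalsh returns ascending eigenvalues; clip tiny/negative values from round-off.
    l3, l2, l1 = np.clip(np.linalg.eigvalsh(cov), 1e-12, None)
    s = l1 + l2 + l3
    return {
        "linearity": (l1 - l2) / l1,
        "planarity": (l2 - l3) / l1,
        "sphericity": l3 / l1,
        "omnivariance": (l1 * l2 * l3) ** (1.0 / 3.0),
        "anisotropy": (l1 - l3) / l1,
        "change_of_curvature": l3 / s,
    }

def multiscale_features(points, query_idx, scales=(10, 20, 40)):
    """Concatenate the features computed at several neighbourhood sizes."""
    return np.array([v for k in scales for v in eigen_features(points, query_idx, k).values()])

# Toy clouds: points on the plane z = 0 and points on a straight line.
rng = np.random.default_rng(0)
plane = np.column_stack([rng.uniform(0, 1, (200, 2)), np.zeros(200)])
t = rng.uniform(0, 1, 200)
line = np.column_stack([t, 2 * t, 3 * t])
```

On the planar cloud sphericity is essentially zero, and on the line linearity is essentially one, which is what makes these descriptors useful inputs to the supervised classifiers.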

Comparative Study of Machine Learning Techniques for Supervised Classification of Biomedical Data

Acta Electrotechnica et Informatica ◽

10.15546/aeei-2014-0021 ◽

2014 ◽

Vol 14 (3) ◽

pp. 5-10 ◽

Cited By ~ 7

Author(s):

Peter Drotár ◽

Zdeněk Smékal

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Supervised Classification ◽

Machine Learning Techniques ◽

Biomedical Data ◽

Learning Techniques

An in-depth analysis of logarithmic data transformation and per-class normalization in machine learning: Application to unsupervised classification of a turbidite system in the Canterbury Basin, New Zealand and supervised classification of salt in the Eugene Island mini-basin, Gulf of Mexico

Interpretation ◽

10.1190/int-2021-0008.1 ◽

2021 ◽

pp. 1-109

Author(s):

Thang N. Ha ◽

David Lubo-Robles ◽

Kurt J. Marfurt ◽

Bradley C. Wallet

Keyword(s):

Machine Learning ◽

New Zealand ◽

Gulf Of Mexico ◽

Supervised Classification ◽

Data Transformation ◽

Unsupervised Classification ◽

Data Normalization ◽

Z Score ◽

Depth Analysis

In a machine learning workflow, data normalization is a crucial step that compensates for the large variation in data ranges and averages associated with different types of input measured in different units. However, most machine learning implementations provide no data normalization beyond the z-score algorithm, which subtracts the mean from the distribution and then divides the result by the standard deviation. Although the z-score converts data with Gaussian behavior to the same shape and size, many of our seismic attribute volumes exhibit log-normal or even more complicated distributions. Because many machine learning applications are based on Gaussian statistics, we wish to evaluate the impact of more sophisticated data normalization techniques on the resulting classification. To do so, we provide an in-depth analysis of data normalization in machine learning classification by formulating and applying a logarithmic data transformation scheme to the unsupervised classifications (including PCA, ICA, SOM and GTM) of a turbidite channel system in the Canterbury Basin, New Zealand, and by implementing a per-class normalization scheme for the supervised probabilistic neural network (PNN) classification of salt in the Eugene Island mini-basin, Gulf of Mexico. Compared with simple z-score normalization, a single logarithmic transformation applied to each input attribute significantly increases the spread of the resulting clusters (and the corresponding color contrast), thereby enhancing subtle details in projection and unsupervised classification. However, the same uniform transformation produces less confident results in supervised classification with probabilistic neural networks. We find that more accurate supervised classifications can be obtained by applying class-dependent normalization to each input attribute.
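
The contrast between plain z-score normalization and a logarithmic transformation followed by z-scoring can be illustrated on a log-normally distributed attribute. This sketch covers only those two schemes; the synthetic data, the requirement that values be strictly positive, and the use of skewness as the diagnostic are assumptions made for illustration.

```python
import numpy as np

def zscore(x):
    """Subtract the mean, then divide by the standard deviation."""
    return (x - x.mean()) / x.std()

def log_zscore(x):
    """Log-transform a strictly positive, right-skewed attribute, then z-score it."""
    if np.any(x <= 0):
        raise ValueError("log transform assumes strictly positive attribute values")
    return zscore(np.log(x))

def skewness(x):
    """Sample skewness: mean of cubed z-scores (0 for a symmetric distribution)."""
    return float(np.mean(zscore(x) ** 3))

# Hypothetical log-normal attribute standing in for a seismic attribute volume.
rng = np.random.default_rng(0)
attribute = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
z_plain = zscore(attribute)    # zero mean, unit variance, but still strongly skewed
z_log = log_zscore(attribute)  # zero mean, unit variance, approximately symmetric
```

The z-score rescales the data but cannot change its shape, so the long right tail survives; taking the logarithm first makes a log-normal attribute approximately Gaussian, which is what the Gaussian-based methods expect.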
