Wineinformatics: Regression on the Grade and Price of Wines through Their Sensory Attributes

James Palmer; Bernard Chen

doi:10.3390/fermentation4040084


Fermentation ◽

10.3390/fermentation4040084 ◽

2018 ◽

Vol 4 (4) ◽

pp. 84 ◽

Cited By ~ 3

Author(s):

James Palmer ◽

Bernard Chen

Keyword(s):

Regression Models ◽

Large Scale ◽

Support Vector ◽

Prior Work ◽

Point Scale ◽

Large Dataset ◽

Advantages And Disadvantages ◽

Large Scale Dataset ◽

Actual Grade ◽

Classification And Regression

Wineinformatics is a field that uses machine-learning and data-mining techniques to glean useful information from wine. In this work, attributes extracted from a large dataset of over 100,000 wine reviews are used to make predictions on two variables: quality based on a “100-point scale”, and price per 750 mL bottle. These predictions were built using support vector regression. Several evaluation metrics were used for model evaluation. In addition, these regression models were compared to classification accuracies achieved in a prior work. When regression was used for classification, the results were somewhat poor; however, this was expected since the main purpose of the regression was not to classify the wines. Therefore, this paper also compares the advantages and disadvantages of both classification and regression. Regression models can successfully predict within a few points of the correct grade of a wine. On average, the model was only 1.6 points away from the actual grade and off by about $13 per bottle of wine. To the best of our knowledge, this is the first work to use a large-scale dataset of wine reviews to perform regression predictions on grade and price.
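
As a rough illustration of the evaluation described in this abstract, the sketch below computes the mean absolute error of predicted grades and then reuses the same predictions as a two-class label by thresholding. The grades, the 90-point cutoff, and the function names are hypothetical and are not taken from the paper.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between true and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def threshold_accuracy(actual, predicted, cutoff=90):
    """Treat regression output as a two-class label (>= cutoff or not)."""
    hits = sum((a >= cutoff) == (p >= cutoff) for a, p in zip(actual, predicted))
    return hits / len(actual)


# Hypothetical grades on a 100-point scale (illustrative only).
actual_grades = [88, 92, 90, 85, 95]
predicted_grades = [89.5, 90.5, 89.0, 86.0, 93.0]

mae = mean_absolute_error(actual_grades, predicted_grades)
accuracy = threshold_accuracy(actual_grades, predicted_grades)
```

In this toy data the third wine is predicted within one point of its grade yet lands on the wrong side of the cutoff, which suggests one reason a regression model can have a low average error and still classify poorly near a class boundary.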


Classification and Regression Models for Genomic Selection of Skewed Phenotypes: A Case for Disease Resistance in Winter Wheat (Triticum aestivum L.)

10.1101/2021.12.16.472985 ◽

2021 ◽

Author(s):

Lance F Merrick ◽

Dennis N Lozada ◽

Xianming Chen ◽

Arron H Carter

Keyword(s):

Support Vector Machine ◽

Winter Wheat ◽

Genomic Selection ◽

Stripe Rust ◽

Regression Models ◽

Prediction Models ◽

Support Vector ◽

Classification Models ◽

Breeding Lines ◽

Classification And Regression

Most genomic prediction models are linear regression models that assume continuous and normally distributed phenotypes, but responses to diseases such as stripe rust (caused by Puccinia striiformis f. sp. tritici) are commonly recorded in ordinal scales and percentages. Disease severity (SEV) and infection type (IT) data in germplasm screening nurseries generally do not follow these assumptions. In this regard, researchers may ignore the lack of normality, transform the phenotypes, use generalized linear models, or use supervised learning algorithms and classification models with no restriction on the distribution of response variables, which are less sensitive when modeling ordinal scores. The goal of this research was to compare classification and regression genomic selection models for skewed phenotypes using stripe rust SEV and IT in winter wheat. We extensively compared both regression and classification prediction models using two training populations composed of breeding lines phenotyped in four years (2016-2018, and 2020) and a diversity panel phenotyped in four years (2013-2016). The prediction models used 19,861 genotyping-by-sequencing single-nucleotide polymorphism markers. Overall, square root transformed phenotypes using rrBLUP and support vector machine regression models displayed the highest combination of accuracy and relative efficiency across the regression and classification models. Further, a classification system based on support vector machine and ordinal Bayesian models with a 2-Class scale for SEV reached the highest class accuracy of 0.99. This study showed that breeders can use linear and non-parametric regression models within their own breeding lines over combined years to accurately predict skewed phenotypes.
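
To make the square-root transformation mentioned in this abstract concrete, the sketch below measures the skewness of a small set of severity percentages before and after the transform. The severity values and the function name are hypothetical; the study's actual data and pipeline are not reproduced here.

```python
import math


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical disease severity scores in percent (right-skewed on purpose).
severity = [0, 0, 5, 5, 10, 10, 20, 40, 80, 100]
severity_sqrt = [math.sqrt(v) for v in severity]

skew_raw = skewness(severity)
skew_sqrt = skewness(severity_sqrt)
```

For this made-up sample the transformed values are noticeably less skewed than the raw percentages, which is the property that makes the transform attractive for models assuming roughly normal phenotypes.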


Local Support Vector Machine based on Cooperative Clustering for very large-scale dataset

2012 8th International Conference on Natural Computation ◽

10.1109/icnc.2012.6234598 ◽

2012 ◽

Cited By ~ 1

Author(s):

Chuanhuan Yin ◽

Yingying Zhu ◽

Shaomin Mu ◽

Shengfeng Tian

Keyword(s):

Support Vector Machine ◽

Large Scale ◽

Support Vector ◽

Large Scale Dataset ◽

Local Support ◽

Cooperative Clustering


Least square Support Vector Machine for large-scale dataset

2015 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2015.7280575 ◽

2015 ◽

Author(s):

Khanh Nguyen ◽

Trung Le ◽

Vinh Lai ◽

Duy Nguyen ◽

Dat Tran ◽

...

Keyword(s):

Support Vector Machine ◽

Large Scale ◽

Least Square ◽

Support Vector ◽

Large Scale Dataset


Scheduling Algorithms in Map Reduce

International Journal on Recent and Innovation Trends in Computing and Communication ◽

10.17762/ijritcc.v7i8.5342 ◽

2019 ◽

Vol 7 (8) ◽

pp. 01-06

Author(s):

Sahil Sangani

Keyword(s):

Large Scale ◽

Round Robin ◽

Mapreduce Framework ◽

Energy Aware ◽

Large Dataset ◽

Dynamic Priority ◽

Large Scale Dataset ◽

Fair Scheduler ◽

Resource Aware ◽

Self Adaptive

Data generated in the past few years cannot be handled efficiently with traditional storage techniques because the datasets are large-scale and may be structured, semi-structured, or unstructured. To deal with such enormous datasets, the Hadoop framework is used, which supports the processing of large datasets in a distributed computing environment. Hadoop uses a technique named MapReduce for processing and generating a large dataset with a parallel distributed algorithm on a cluster. It automatically handles failures and data loss due to its fault-tolerance property. The scheduler is a pluggable component of the MapReduce framework. The Hadoop MapReduce framework uses various schedulers as per the requirements of the task. FIFO (First In First Out) is the default algorithm used by Hadoop, in which jobs are executed in the order of their arrival. This paper discusses a range of schedulers such as FIFO, Capacity Scheduler, LATE Scheduler, Fair Scheduler, Delay Scheduler, Deadline Constraint Scheduler, and Resource Aware Scheduler. Besides these schedulers, we also conducted a comparative study of schedulers like Round Robin, Weighted Round Robin, Self-adaptive Reduce Scheduling (SARS), Self-adaptive MapReduce Scheduling (SAMR), Dynamic Priority Scheduling, Learning Scheduling, Classification & Optimization-based Scheduler (COSHH), Network-Aware, Match-matching, and Energy-Aware Scheduler. Hopefully, this study will enhance the understanding of the specific schedulers and help other developers and consumers make informed decisions for their specific research interests.
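
The difference between FIFO and Round Robin ordering can be sketched with a toy single-slot simulation. This is not Hadoop's scheduler interface; the job names, durations, and the two-unit time slice are hypothetical and chosen only for illustration.

```python
from collections import deque


def fifo_order(jobs):
    """Run each job to completion in arrival order; return completion times."""
    clock, finished = 0, {}
    for name, duration in jobs:
        clock += duration
        finished[name] = clock
    return finished


def round_robin_order(jobs, quantum=2):
    """Give each job a fixed time slice in turn; return completion times."""
    queue = deque(jobs)
    clock, finished = 0, {}
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        if remaining > run:
            # Job still has work left: send it to the back of the queue.
            queue.append((name, remaining - run))
        else:
            finished[name] = clock
    return finished


# Hypothetical jobs as (name, duration) pairs, listed in arrival order.
jobs = [("job_a", 6), ("job_b", 2), ("job_c", 4)]
fifo_done = fifo_order(jobs)
rr_done = round_robin_order(jobs)
```

Under FIFO the short job waits behind the long one that arrived first, while under Round Robin it finishes early at the cost of delaying the long job, which is the basic trade-off that the more elaborate schedulers listed above try to manage.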


CNN-Based Target Recognition and Identification for Infrared Imaging in Defense Systems

Sensors ◽

10.3390/s19092040 ◽

2019 ◽

Vol 19 (9) ◽

pp. 2040 ◽

Cited By ~ 7

Author(s):

Antoine d’Acremont ◽

Ronan Fablet ◽

Alexandre Baussard ◽

Guillaume Quin

Keyword(s):

Large Scale ◽

Data Augmentation ◽

Infrared Imaging ◽

State Of The Art ◽

Object Identification ◽

Fine Tuning ◽

Support Vector ◽

Defense Systems ◽

Large Scale Dataset ◽

In The Wild

Convolutional neural networks (CNNs) have rapidly become the state-of-the-art models for image classification applications. They usually require large groundtruthed datasets for training. Here, we address object identification and recognition in the wild for infrared (IR) imaging in defense applications, where no such large-scale dataset is available. With a focus on robustness issues, especially viewpoint invariance, we introduce a compact and fully convolutional CNN architecture with global average pooling. We show that this model trained from realistic simulation datasets reaches a state-of-the-art performance compared with other CNNs with no data augmentation and fine-tuning steps. We also demonstrate a significant improvement in the robustness to viewpoint changes with respect to an operational support vector machine (SVM)-based scheme.


A Container-Attachable Inertial Sensor for Real-Time Hydration Tracking

Sensors ◽

10.3390/s19184008 ◽

2019 ◽

Vol 19 (18) ◽

pp. 4008 ◽

Cited By ~ 1

Author(s):

Henry Griffith ◽

Yan Shi ◽

Subir Biswas

Keyword(s):

Regression Models ◽

Large Scale ◽

Inertial Sensors ◽

Mutual Influence ◽

Inertial Sensor ◽

Measurement Unit ◽

Percentage Error ◽

Support Vector ◽

Accelerometer Sensor ◽

Fill Level

Various sensors have been proposed to address the negative health ramifications of inadequate fluid consumption. Amongst these solutions, motion-based sensors estimate fluid intake using the characteristics of drinking kinematics. This sensing approach is complicated due to the mutual influence of both the drink volume and the current fill level on the resulting motion pattern, along with differences in biomechanics across individuals. While motion-based strategies are a promising approach due to the proliferation of inertial sensors, previous studies have been characterized by limited accuracy and substantial variability in performance across subjects. This research seeks to address these limitations for a container-attachable triaxial accelerometer sensor. Drink volume is computed using support vector machine regression models with hand-engineered features describing the container’s estimated inclination. Results are presented for a large-scale data collection consisting of 1908 drinks consumed from a refillable bottle by 84 individuals. Per-drink mean absolute percentage error is reduced by 11.05% versus previous state-of-the-art results for a single wrist-wearable inertial measurement unit (IMU) sensor assessed using a similar experimental protocol. Estimates of aggregate consumption are also improved versus previously reported results for an attachable sensor architecture. An alternative tracking approach using the fill level from which a drink is consumed is also explored herein. Fill level regression models are shown to exhibit improved accuracy and reduced inter-subject variability versus volume estimators. A technique for segmenting the entire drink motion sequence into transport and sip phases is also assessed, along with a multi-target framework for addressing the known interdependence of volume and fill level on the resulting drink motion signature.
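
The two error measures mentioned in this abstract, per-drink mean absolute percentage error and error in aggregate consumption, can be sketched as follows. The drink volumes and function names are hypothetical and do not come from the study.

```python
def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed in percent."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def aggregate_percentage_error(actual, predicted):
    """Percent error of the summed prediction against the summed truth."""
    total_actual = sum(actual)
    return 100.0 * abs(total_actual - sum(predicted)) / total_actual


# Hypothetical drink volumes in millilitres (illustrative only).
actual_ml = [50.0, 100.0, 200.0]
predicted_ml = [45.0, 110.0, 190.0]

per_drink_mape = mean_absolute_percentage_error(actual_ml, predicted_ml)
aggregate_error = aggregate_percentage_error(actual_ml, predicted_ml)
```

With these made-up numbers the aggregate error is smaller than the per-drink error because over- and under-estimates partly cancel when summed, which is why the two figures are usually reported separately.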


Comparison of Support Vector and Non-Linear Regression Models for Estimating Large-Scale Vehicular Emissions, Incorporating Network-Wide Fundamental Diagram for Heterogeneous Vehicles

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198120914304 ◽

2020 ◽

Vol 2674 (5) ◽

pp. 70-84

Author(s):

Ramin Saedi ◽

Rajat Verma ◽

Ali Zockaie ◽

Mehrnaz Ghamami ◽

Timothy J. Gates

Keyword(s):

Regression Models ◽

Large Scale ◽

Traffic Simulation ◽

Support Vector ◽

Vehicular Emissions ◽

Trajectory Data ◽

Fundamental Diagram ◽

Modeling Framework ◽

Non Linear ◽

Emission Models

Estimation of vehicular emissions at the network level is a prominent issue in transportation planning and management of urban areas. For large networks, macroscopic emission models are preferred because of their simplicity. However, these models do not consider traffic flow dynamics that significantly affect emissions production. This study proposes a network-level emission modeling framework based on the network-wide fundamental diagram (NFD), integrating the NFD properties with an existing microscopic emission model. The NFD and microscopic emission models are estimated using microscopic and mesoscopic traffic simulation tools at different scales for various traffic compositions. The major contribution is to consider heterogeneous vehicle types with different emission generation rates in a network-level model. This framework is applied to the large-scale network of Chicago as well as its central business district. Non-linear and support vector regression models are developed using simulated trajectory data of 13 simulated scenarios. The results show a satisfactory calibration and successful validation with acceptable deviations from the underlying microscopic emissions model regardless of the simulation tool that is used to calibrate the network-level emissions model. The microscopic traffic simulation is appropriate for smaller networks, while mesoscopic traffic simulation is a proper means to calibrate models for larger networks. The proposed model is also used to demonstrate the relationship between macroscopic emissions and flow characteristics in the form of a network emissions diagram. The results of this study provide a tool for planners to analyze vehicular emissions in real time and find optimal policies to control the level of emissions in large cities.


Survey of Clustering Methods for Large Scale Dataset

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i5.13381344 ◽

2019 ◽

Vol 7 (5) ◽

pp. 1338-1344

Author(s):

Anupama Jawale ◽

Ganesh Magar

Keyword(s):

Large Scale ◽

Clustering Methods ◽

Large Scale Dataset


Implantable neural interfaces

10.1093/oso/9780199674923.003.0050 ◽

2018 ◽

Author(s):

Stefano Vassanelli

Keyword(s):

Brain Function ◽

Large Scale ◽

Ionic Channels ◽

Patch Clamp Technique ◽

Advantages And Disadvantages ◽

Optogenetic Stimulation ◽

Physical Interfaces ◽

The Brain

Establishing direct communication with the brain through physical interfaces is a fundamental strategy to investigate brain function. Starting with the patch-clamp technique in the seventies, neuroscience has moved from detailed characterization of ionic channels to the analysis of single neurons and, more recently, microcircuits in brain neuronal networks. Development of new biohybrid probes with electrodes for recording and stimulating neurons in the living animal is a natural consequence of this trend. The recent introduction of optogenetic stimulation and advanced high-resolution large-scale electrical recording approaches demonstrates this need. Brain implants for real-time neurophysiology are also opening new avenues for neuroprosthetics to restore brain function after injury or in neurological disorders. This chapter provides an overview on existing and emergent neurophysiology technologies with particular focus on those intended to interface neuronal microcircuits in vivo. Chemical, electrical, and optogenetic-based interfaces are presented, with an analysis of advantages and disadvantages of the different technical approaches.


Joint regression and learning from pairwise rankings for personalized image aesthetic assessment

Computational Visual Media ◽

10.1007/s41095-021-0207-y ◽

2021 ◽

Author(s):

Jin Zhou ◽

Qing Zhang ◽

Jian-Hao Fan ◽

Wei Sun ◽

Wei-Shi Zheng

Keyword(s):

Large Scale ◽

Assessment Model ◽

Generic Model ◽

Small Subset ◽

Deep Convolutional Neural Networks ◽

Personal Taste ◽

Hinge Loss ◽

Novel Approach ◽

Large Scale Dataset ◽

Image Pairs

Recent image aesthetic assessment methods have achieved remarkable progress due to the emergence of deep convolutional neural networks (CNNs). However, these methods focus primarily on predicting the generally perceived preference for an image, which usually limits their practicability, since each user may have completely different preferences for the same image. To address this problem, this paper presents a novel approach for predicting personalized image aesthetics that fit an individual user’s personal taste. We achieve this in a coarse-to-fine manner, by joint regression and learning from pairwise rankings. Specifically, we first collect a small subset of personal images from a user and invite him/her to rank the preference of some randomly sampled image pairs. We then search for the K-nearest neighbors of the personal images within a large-scale dataset labeled with average human aesthetic scores, and use these images as well as the associated scores to train a generic aesthetic assessment model by CNN-based regression. Next, we fine-tune the generic model to accommodate the personal preference by training over the rankings with a pairwise hinge loss. Experiments demonstrate that our method can effectively learn personalized image aesthetic preferences, clearly outperforming state-of-the-art methods. Moreover, we show that the learned personalized image aesthetic benefits a wide variety of applications.
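
A pairwise hinge loss of the general kind named in this abstract can be sketched in a few lines. The margin of 0.3, the image identifiers, and the scores are hypothetical; the paper's exact formulation and network are not reproduced here.

```python
def pairwise_hinge_loss(score_preferred, score_other, margin=0.3):
    """Zero when the preferred image outscores the other by at least the margin."""
    return max(0.0, margin - (score_preferred - score_other))


def mean_ranking_loss(scores, ranked_pairs, margin=0.3):
    """Average hinge loss over (preferred, other) image pairs."""
    losses = [
        pairwise_hinge_loss(scores[preferred], scores[other], margin)
        for preferred, other in ranked_pairs
    ]
    return sum(losses) / len(losses)


# Hypothetical model scores and user rankings (illustrative only).
scores = {"img_1": 0.9, "img_2": 0.4, "img_3": 0.6}
ranked_pairs = [("img_1", "img_2"), ("img_3", "img_1")]

loss = mean_ranking_loss(scores, ranked_pairs)
```

The first pair already agrees with the user's ranking by more than the margin and contributes nothing, while the second pair contradicts it and contributes a positive loss; fine-tuning on such a loss would push the scores toward the user's ordering.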
