WE-ASCA: The Weighted-Effect ASCA for Analyzing Unbalanced Multifactorial Designs—A Raman Spectra-Based Example

Nairveen Ali; Jeroen Jansen; André van den Doel; Gerjen Herman Tinnevelt; Thomas Bocklitz

doi:10.3390/molecules26010066

Molecules ◽

10.3390/molecules26010066 ◽

2020 ◽

Vol 26 (1) ◽

pp. 66

Author(s):

Nairveen Ali ◽

Jeroen Jansen ◽

André van den Doel ◽

Gerjen Herman Tinnevelt ◽

Thomas Bocklitz

Keyword(s):

Linear Models ◽

Multivariate Data ◽

Classification Performance ◽

Design Matrix ◽

Data Set ◽

Unbalanced Designs ◽

Simultaneous Component Analysis ◽

Weighted Effect ◽

Raman Spectral Data ◽

Raman Spectral

Analyses of multifactorial experimental designs are used as an exploratory technique to describe hypothesized multifactorial effects on the basis of their variation. The procedure for analyzing multifactorial designs is well established for univariate data, where it is known as analysis of variance (ANOVA), whereas only a few methods have been developed for multivariate data. In this work, we present the weighted-effect ASCA, named WE-ASCA, as an enhanced version of ANOVA-simultaneous component analysis (ASCA) that handles multivariate data in unbalanced multifactorial designs. The core of our work is to use general linear models (GLMs) to decompose the response matrix into a design matrix and a parameter matrix, while the main improvement in WE-ASCA is the use of weighted-effect (WE) coding in the design matrix. WE-coding yields a unique solution of the GLM and satisfies the constraint that the weighted sum of all level effects of a categorical variable equals zero. To assess the performance of WE-ASCA, two applications were demonstrated using a biomedical Raman spectral data set of mouse colorectal tissue. The results revealed that WE-ASCA is well suited to analyzing unbalanced designs. Furthermore, when WE-ASCA is applied as a preprocessing tool, classification performance and its reproducibility can improve significantly.
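The GLM decomposition with WE-coding described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `we_code` and `we_asca` are hypothetical. The last level of each factor (in sorted order) is assumed to be the omitted reference level. Interaction terms and significance testing are left out, and the toy labels and random data are invented. Every WE-coded column sums to zero, so the fitted intercept equals the grand mean and each factor's effect matrix sums to zero over the samples. That column-sum property is the weighted-sum-to-zero constraint on the level effects.

```python
import numpy as np


def we_code(labels):
    """Weighted-effect (WE) coding of one categorical factor.

    Assumption: the last level in sorted order is the omitted reference level.
    Column j is 1 for samples at level j, 0 at other non-reference levels and
    -n_j / n_ref at the reference level, so every column sums to zero.
    """
    labels = np.asarray(labels)
    levels = np.unique(labels)
    is_ref = labels == levels[-1]
    n_ref = is_ref.sum()
    cols = []
    for lev in levels[:-1]:
        at_lev = labels == lev
        col = at_lev.astype(float)
        col[is_ref] = -at_lev.sum() / n_ref
        cols.append(col)
    return np.column_stack(cols)


def we_asca(Y, factors, n_components=2):
    """ASCA-style decomposition Y = 1 m^T + sum_f X_f B_f + E with WE-coded X_f.

    Y: (n_samples, n_variables) response matrix, e.g. preprocessed spectra.
    factors: dict mapping a factor name to its per-sample level labels.
    Returns effect matrices, SCA (scores, loadings) per factor, and residuals.
    """
    Y = np.asarray(Y, dtype=float)
    blocks = {name: we_code(lab) for name, lab in factors.items()}
    X = np.column_stack([np.ones(len(Y))] + list(blocks.values()))
    # Least-squares estimate of the parameter matrix B (assumes full column rank)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    effects, start = {}, 1  # row 0 of B is the intercept (the grand mean)
    for name, Xf in blocks.items():
        stop = start + Xf.shape[1]
        effects[name] = Xf @ B[start:stop]
        start = stop
    sca = {}
    for name, E in effects.items():
        # Effect matrices are already column-centred, so an SVD gives their PCA
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        k = min(n_components, len(s))
        sca[name] = (U[:, :k] * s[:k], Vt[:k].T)
    return effects, sca, Y - X @ B


# Hypothetical unbalanced 5 + 3 design with random "spectra" (20 variables)
rng = np.random.default_rng(0)
tissue = ["healthy"] * 5 + ["tumour"] * 3
animal = ["m1", "m2", "m1", "m2", "m1", "m1", "m2", "m2"]
Y = rng.normal(size=(8, 20))
effects, sca, residuals = we_asca(Y, {"tissue": tissue, "animal": animal})
```

In this toy design, the five `healthy` rows and the three `tumour` rows of `effects["tissue"]` hold two effect vectors e_h and e_t with 5 e_h + 3 e_t = 0. The level effects therefore remain deviations from the sample grand mean even though the design is unbalanced.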

An interpretation of raman spectral data for polymer electrolytes in the light of new evidence for ion association in dilute solution

Journal of Polymer Science Part B Polymer Physics ◽

10.1002/polb.1991.090291113 ◽

1991 ◽

Vol 29 (11) ◽

pp. 1441-1445 ◽

Cited By ~ 16

Author(s):

Fiona M. Gray

Keyword(s):

Spectral Data ◽

Polymer Electrolytes ◽

Ion Association ◽

Dilute Solution ◽

Raman Spectral Data ◽

New Evidence ◽

Raman Spectral

An empirical wavelet transform based approach for multivariate data processing application to cardiovascular physiological signals

Bio-Algorithms and Med-Systems ◽

10.1515/bams-2018-0030 ◽

2018 ◽

Vol 14 (4) ◽

Cited By ~ 1

Author(s):

Omkar Singh ◽

Ramesh Kumar Sunkaria

Keyword(s):

Wavelet Transform ◽

Multivariate Data ◽

Heterogeneous Data ◽

Physiological Signals ◽

Data Series ◽

Data Set ◽

Processing Application ◽

Adaptive Wavelet ◽

Empirical Wavelet Transform ◽

Multivariate Signals

Background This article proposes an extension of the empirical wavelet transform (EWT) algorithm to multivariate signals, applied specifically to cardiovascular physiological signals.
Materials and methods EWT is a recently proposed algorithm that extracts the modes of a signal using an adaptive wavelet filter bank. The proposed algorithm finds an optimum signal in the multivariate data set based on a mode-estimation strategy. The spectrum of that signal is then segmented and used to extract the modes across all channels of the data set.
Results The proposed algorithm finds the common oscillatory modes within multivariate data and can be applied to multichannel heterogeneous data with unequal numbers of samples in different channels. It was validated on several synthetic multivariate data sets and on a real physiological trivariate data series of electrocardiogram, respiration and blood pressure signals.
Conclusions In this article, EWT is extended to multivariate signals, and it is demonstrated that component-wise processing of multivariate data leads to alignment of the common oscillating modes across the components.
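The core idea of the extension can be sketched with NumPy: segment the spectrum of one reference channel, then apply the same frequency bands to every channel. Several assumptions separate this sketch from the published algorithm. First, it uses ideal (brick-wall) band-pass filters in place of the Meyer-type wavelet filters of EWT. Second, boundaries are placed midway between the largest spectral peaks, which is only one of several EWT segmentation rules. Third, "pick the channel with the largest spectral energy" is a hypothetical stand-in for the paper's mode-estimation strategy. Finally, all channels are assumed to have the same length. The function names are illustrative.

```python
import numpy as np


def spectrum_boundaries(x, n_modes):
    """Boundaries (rfft bin indices) between the n_modes largest spectral peaks of x."""
    mag = np.abs(np.fft.rfft(x))
    peaks = [i for i in range(1, len(mag) - 1)
             if mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]]
    top = sorted(sorted(peaks, key=lambda i: mag[i], reverse=True)[:n_modes])
    # Midpoint between consecutive peaks (one of the segmentation rules used in EWT)
    return [(a + b) // 2 for a, b in zip(top[:-1], top[1:])]


def multichannel_modes(X, n_modes, ref_channel=None):
    """Split every channel of X (channels x samples) into bands fixed by one reference channel."""
    X = np.asarray(X, dtype=float)
    F = np.fft.rfft(X, axis=1)
    if ref_channel is None:
        # Hypothetical rule: use the channel with the largest spectral energy
        ref_channel = int(np.argmax((np.abs(F) ** 2).sum(axis=1)))
    n_samples, n_bins = X.shape[1], F.shape[1]
    edges = [0] + spectrum_boundaries(X[ref_channel], n_modes) + [n_bins]
    modes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = np.zeros(n_bins)
        mask[lo:hi] = 1.0  # ideal band-pass, not the Meyer-type EWT filters
        modes.append(np.fft.irfft(F * mask, n=n_samples, axis=1))
    return np.stack(modes)  # shape: (n_bands, n_channels, n_samples)


# Two synthetic channels sharing a 5 Hz and a 50 Hz oscillation (1 s at 1 kHz)
t = np.arange(1000) / 1000.0
slow, fast = np.sin(2 * np.pi * 5 * t), np.sin(2 * np.pi * 50 * t)
X = np.vstack([slow + 0.5 * fast, 0.3 * slow + fast])
modes = multichannel_modes(X, n_modes=2)
```

The bands partition the whole spectrum, so the modes of each channel sum back to the original signal. Because the same boundaries are shared, mode k refers to the same frequency band in every channel.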

The effect of statin treatment on survival and on the use of healthcare resources among patients with acute myocardial infarction

Nordic Journal of Health Economics ◽

10.5617/njhe.4538 ◽

2018 ◽

pp. 30-48

Author(s):

Lien Nguyen ◽

Unto Häkkinen ◽

Henna Jurvanen

Keyword(s):

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Linear Models ◽

Statin Treatment ◽

Data Set ◽

Hospitalised Patients ◽

Life Years ◽

Matched Data ◽

The Cost ◽

Statin Use

The aim of this study was to investigate the cost-effectiveness of statin use among newly hospitalised patients with acute myocardial infarction (AMI) in Finland. The data came from the PERFECT database of patients hospitalised for AMI and discharged in 1998–2012 in Finland. The selected patients had a first-time AMI and had not previously used statins (N=60 404). Using propensity score matching, we also generated a matched data set pairing statin users with statin non-users (N=28 412). Statin use was defined as purchasing statins within the first week after hospital discharge. Healthcare costs included the costs of inpatient and outpatient hospital care, nursing homes and prescribed medicines (at 2011 prices). The follow-up time was one year. Logit and generalised linear models were used. We measured the effects of statin use as life years (LYs) gained and computed the cost per LY gained. Both data sets were analysed for the entire period and for the subperiods 1998–2001, 2002–2007 and 2008–2011, both without discounting and with a 3% discount rate. An average patient would gain 0.26–0.51 additional life years. The estimated costs per LY gained ranged between EUR 800 and 15 000. They were highest (EUR 12 000–15 000) in 1998–2001 in the matched data, whereas statin use was actually cost-saving in 2008–2011. The estimated costs indicate that statin use in treating AMI was very cost-effective. However, our rather long study period may suggest that the cost estimates per LY gained could be overestimated, as the life expectancy of AMI patients is likely shorter than that of the general population.
Published online: April 2018.
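The "cost per LY gained" above is an incremental cost-effectiveness ratio: the difference in mean cost divided by the difference in mean life years. The sketch below shows the arithmetic and a simple discounting helper. The function names and all numbers are hypothetical and are not the study's figures. The discounting convention (year 0 undiscounted, whole years) is also an assumption, since the abstract does not state one.

```python
def cost_per_life_year(cost_treated, cost_control, ly_treated, ly_control):
    """Incremental cost-effectiveness ratio (ICER): extra cost per life year (LY) gained.

    A negative value together with a positive LY gain means the treatment saves money.
    """
    ly_gained = ly_treated - ly_control
    if ly_gained <= 0:
        raise ValueError("no life years gained; the ratio is not meaningful")
    return (cost_treated - cost_control) / ly_gained


def discounted(value, year, rate=0.03):
    """Present value of an amount accrued `year` years after baseline (year 0 is undiscounted)."""
    return value / (1.0 + rate) ** year


# Hypothetical per-patient averages, not the study's figures
icer = cost_per_life_year(cost_treated=3000.0, cost_control=1000.0,
                          ly_treated=10.5, ly_control=10.25)
```

With these invented inputs, an extra EUR 2000 buys 0.25 LYs, giving EUR 8000 per LY gained. If the treated group were cheaper, the ratio would turn negative, which corresponds to the "savings" case reported for 2008–2011.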

Ordinal classification for efficient plant stress prediction in hyperspectral data

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsarchives-xl-7-29-2014 ◽

2014 ◽

Vol XL-7 ◽

pp. 29-36 ◽

Cited By ~ 5

Author(s):

J. Behmann ◽

P. Schmitter ◽

J. Steinrücken ◽

L. Plümer

Keyword(s):

Linear Models ◽

Plant Stress ◽

Crop Protection ◽

Local Stress ◽

Prediction Performance ◽

Hyperspectral Data ◽

Hyperspectral Images ◽

Support Vector ◽

Data Set ◽

High Prediction

Detection of crop stress from hyperspectral images is highly important for breeding and precision crop protection. However, continuous monitoring of stress in phenotyping facilities by hyperspectral imagers produces huge amounts of uninterpreted data. Deriving a stress description from the images requires interpreting algorithms with high prediction performance. Based on a static model, the local stress state of each pixel has to be predicted. Because of their low computational complexity, linear models are preferable.
In this paper, we focus on drought-induced stress, which is represented by discrete stages of ordinal order. We present and compare five methods that are able to derive stress levels from hyperspectral images: one-vs.-one Support Vector Machine (SVM), one-vs.-all SVM, Support Vector Regression (SVR), Support Vector Ordinal Regression (SVORIM) and Linear Ordinal SVM classification. The methods are applied to two data sets: a real-world set of drought stress in single barley plants and a simulated data set. It is shown that Linear Ordinal SVM is a powerful tool for applications that require high prediction performance under limited resources. It is significantly more efficient than the one-vs.-one SVM and even more efficient than the less accurate one-vs.-all SVM. Compared with the very compact SVORIM model, it represents the senescence process much more accurately.
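Linear ordinal models of the kind compared above share one prediction rule. A single weight vector w scores each pixel, and the predicted stress level is the number of ordered thresholds that the score w·x reaches. Prediction therefore costs one dot product plus K-1 comparisons. The sketch below trains such a model with the PRank perceptron (Crammer and Singer). This is a swapped-in, simpler learner and not the Linear Ordinal SVM from the paper. The data are a hypothetical one-feature toy set.

```python
import numpy as np


def prank_fit(X, y, n_levels, epochs=20):
    """PRank: online training of a shared weight vector w and n_levels-1 thresholds b."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    w, b = np.zeros(X.shape[1]), np.zeros(n_levels - 1)
    ranks = np.arange(n_levels - 1)
    for _ in range(epochs):
        for x, target in zip(X, y):
            score = w @ x
            if np.sum(score >= b) == target:
                continue  # correct level, no update
            y_r = np.where(target > ranks, 1.0, -1.0)  # should the score pass threshold r?
            tau = np.where((score - b) * y_r <= 0, y_r, 0.0)  # violated thresholds only
            w += tau.sum() * x
            b -= tau
    return w, b


def prank_predict(X, w, b):
    """Predicted level = number of thresholds reached by the linear score."""
    scores = np.asarray(X, dtype=float) @ w
    return (scores[:, None] >= b[None, :]).sum(axis=1)


# Hypothetical one-feature data with three ordered stress levels (0, 1, 2)
X = [[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]]
y = [0, 0, 1, 1, 2, 2]
w, b = prank_fit(X, y, n_levels=3)
pred = prank_predict(X, w, b)
```

Unlike one-vs.-one or one-vs.-all schemes, this model form needs only one weight vector regardless of the number of levels. That is one reason such models can be cheap at prediction time.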

Logistic Regression Ensemble (LORENS) Applied to Drug Discovery

MATEMATIKA ◽

10.11113/matematika.v36.n1.1197 ◽

2020 ◽

Vol 36 (1) ◽

pp. 43-49

Author(s):

T Dwi Ary Widhianingsih ◽

Heri Kuswanto ◽

Dedy Dwi Prastyo

Keyword(s):

Logistic Regression ◽

Drug Discovery ◽

Objective Function ◽

Classification Performance ◽

High Dimensionality ◽

High Dimensional ◽

Classification Methods ◽

Data Set ◽

Computational Burden ◽

Cancerous Cells

Logistic regression is one of the most commonly used classification methods. It has some advantages, particularly regarding hypothesis testing and its objective function. However, it also has disadvantages for high-dimensional data, such as multicollinearity, over-fitting and a high computational burden. Ensemble-based classification methods have been proposed to overcome these problems. The logistic regression ensemble (LORENS) method is expected to improve on the classification performance of basic logistic regression. In this paper, we apply it to drug discovery, with the objective of identifying candidate compounds that protect normal, non-cancerous cells, a problem involving a high-dimensional data set. The experimental results show that it performs well, with an accuracy of 69% and an AUC of 0.7306.
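A minimal NumPy sketch of the ensemble idea follows. It assumes a random-subspace scheme: the variables are randomly partitioned into disjoint subsets, one logistic regression is fitted per subset, and the members' predicted probabilities are averaged. The published LORENS procedure may differ in detail, for example in how many partitions are drawn or how the decision threshold is chosen. The function names, the gradient-descent fitting and the toy data are all hypothetical.

```python
import numpy as np


def fit_logistic(X, y, lr=0.5, n_iter=500):
    """Plain logistic regression (intercept + weights) fitted by batch gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w


def lorens_fit(X, y, n_subsets=2, seed=0):
    """Randomly partition the variables into disjoint subsets; fit one model per subset."""
    rng = np.random.default_rng(seed)
    subsets = np.array_split(rng.permutation(X.shape[1]), n_subsets)
    return [(idx, fit_logistic(X[:, idx], y)) for idx in subsets]


def lorens_predict_proba(X, models):
    """Average the class-1 probabilities of the member models."""
    probs = [1.0 / (1.0 + np.exp(-(w[0] + X[:, idx] @ w[1:]))) for idx, w in models]
    return np.mean(probs, axis=0)


# Hypothetical data: 4 variables, only the first one carries the class signal
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(float)
models = lorens_fit(X, y)
accuracy = np.mean((lorens_predict_proba(X, models) > 0.5) == y)
```

Each member sees only part of the variables, so each fit is lower-dimensional. This is how such ensembles aim to ease the multicollinearity and computational burden mentioned above.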

ChemTok: A New Rule Based Tokenizer for Chemical Named Entity Recognition

BioMed Research International ◽

10.1155/2016/4248026 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 5

Author(s):

Abbas Akkasi ◽

Ekrem Varoğlu ◽

Nazife Dimililer

Keyword(s):

Conditional Random Fields ◽

Named Entity Recognition ◽

Classification Performance ◽

Entity Recognition ◽

Support Vector ◽

Learning Approaches ◽

Data Set ◽

Rule Based ◽

Named Entity ◽

Vector Machines

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems based on machine learning is tokenization, in which raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which uses rules extracted mainly from the training data set. The main novelty of ChemTok is its use of the extracted rules to merge tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared with the tokenization methods used by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers, both in classification performance and in the number of incorrectly segmented entities.
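The merge step can be illustrated with a short Python sketch. Text is first split into fine-grained tokens (alphanumeric runs and single punctuation marks). Adjacent tokens are then greedily merged whenever they touch without whitespace and their joined text matches a rule. Here the "rules" are simply a hypothetical set of literal strings, as if harvested from training entities. ChemTok's actual rules are richer than this, and the function names and example sentence are illustrative only.

```python
import re


def base_tokenize(text):
    """Fine-grained split: runs of letters/digits, or single punctuation characters."""
    return [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9]+|[^\sA-Za-z0-9]", text)]


def merge_tokens(text, spans, rules, max_len=10):
    """Greedy longest-match merge of adjacent, whitespace-free token runs found in `rules`."""
    out, i = [], 0
    while i < len(spans):
        merged = False
        for j in range(min(len(spans), i + max_len), i + 1, -1):
            group = spans[i:j]
            contiguous = all(a[1] == b[0] for a, b in zip(group, group[1:]))
            candidate = text[group[0][0]:group[-1][1]]
            if contiguous and candidate in rules:
                out.append(candidate)
                i, merged = j, True
                break
        if not merged:
            out.append(text[spans[i][0]:spans[i][1]])
            i += 1
    return out


# Hypothetical rule set and sentence
rules = {"2-amino-3-methylbutanoic"}
text = "Treated with 2-amino-3-methylbutanoic acid and NaCl."
tokens = merge_tokens(text, base_tokenize(text), rules)
```

Splitting finely first and merging afterwards means the tokenizer never glues together characters that no rule supports. Tokens separated by whitespace are never merged.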

Board diversity and performance in a masculine, aged and glocal supply chain: new empirical evidence

Corporate Governance ◽

10.1108/cg-09-2020-0417 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Giuseppe Giulio Calabrese ◽

Alessandro Manello

Keyword(s):

Supply Chain ◽

Firm Performance ◽

Empirical Evidence ◽

Linear Models ◽

Data Set ◽

Board Diversity ◽

Automotive Supply Chain ◽

And Performance ◽

Automotive Supply

Purpose This study aims to contribute to the debate on the relationship between board diversity and performance, a hot topic for scholars and shareholders. A number of studies have found contrasting impacts of board diversity on firm performance, and this paper adds new and original evidence from the automotive supply chain, focusing on gender, age and nationality diversity.
Design/methodology/approach The authors propose a three-stage empirical analysis. First, the authors use linear models with different performance indexes to investigate diversity (gender, age and nationality) within the board of directors and among executives. Second, the authors investigate diversity in different contexts, such as position in the supply chain, nationality of the owner and family/corporate ownership. Finally, the authors use non-linear models to find a better combination of gender and nationality diversity, from which managerial implications are drawn.
Findings First, the authors demonstrate a robust positive effect of women's board representation on firm performance in terms of profitability and firm risk. For age and nationality, the results are more equivocal, particularly for age. Second, the authors characterize board diversity in different contexts: position in the supply chain, and type and nationality of the final owner. Again, gender heterogeneity is more suitable in complex firms such as Tier 1 suppliers, corporate-owned firms and foreign companies.
Originality/value The authors focus the analysis on a specific industry, shedding light on the main specificities of operating in particular phases of the supply chain, a substantial novelty in this field. The empirical evidence is based on a very large data set containing quantitative and qualitative information on a representative sample of 1,538 firms operating in the Italian automotive supply chain, one of the most relevant in Europe.

A systematical approach to classification problems with feature space heterogeneity

Kybernetes ◽

10.1108/k-06-2018-0313 ◽

2019 ◽

Vol 48 (9) ◽

pp. 2006-2029

Author(s):

Hongshan Xiao ◽

Yu Wang

Keyword(s):

Factor Analysis ◽

Meta Analysis ◽

Feature Space ◽

Classification Performance ◽

Classification Algorithm ◽

Significant Feature ◽

Data Sets ◽

Data Set ◽

Classification Techniques

Purpose Feature space heterogeneity is widespread in application fields of classification techniques such as customs inspection decisions, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.
Design/methodology/approach A measurement is first developed for measuring and identifying any significant heterogeneity in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For a data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.
Findings The proposed approach has two main advantages over previous methods. The first lies in feature transformation using orthogonal factor analysis, which yields new features without redundancy and irrelevance. The second rests on sample partitioning to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.
Research limitations/implications The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.
Practical implications Measuring and eliminating any feature space heterogeneity in the data is important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques in real-world problems.
Originality/value A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve classification accuracy.
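The abstract says only that the heterogeneity measurement is derived from meta-analysis. It does not give the statistic. For orientation, the sketch below shows the standard meta-analytic heterogeneity statistics, Cochran's Q and I². These take per-subgroup effect estimates, such as the class-mean difference of one feature in different subsets of the data, together with their variances. Whether the paper uses Q or I² is an assumption, and the function name and inputs are hypothetical.

```python
def cochran_q_i2(effects, variances):
    """Cochran's Q and I^2 from per-group effect estimates and their variances."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of the total variation attributed to between-group heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical: one feature's effect estimated in two subsets of a data set
q, i2 = cochran_q_i2(effects=[1.0, 3.0], variances=[1.0, 1.0])
```

Here the two subsets disagree by more than their sampling variances explain (Q = 2 on 1 degree of freedom), giving I² = 0.5. A large I² for a feature would flag the kind of feature space heterogeneity discussed above.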

A root-mean-square-error analysis of two-peak Gaussian and Lorentzian fittings of thin-film carbon Raman spectral data

Journal of Applied Physics ◽

10.1063/1.5089139 ◽

2019 ◽

Vol 126 (4) ◽

pp. 045706 ◽

Cited By ~ 1

Author(s):

Jonathan Laumer ◽

Stephen K. O’Leary

Keyword(s):

Thin Film ◽

Root Mean Square Error ◽

Error Analysis ◽

Spectral Data ◽

Mean Square Error ◽

Root Mean Square ◽

Mean Square ◽

Raman Spectral Data ◽

Raman Spectral

A Deep Convolutional Neural Network for Oil Spill Detection from Spaceborne SAR Images

Remote Sensing ◽

10.3390/rs12061015 ◽

2020 ◽

Vol 12 (6) ◽

pp. 1015 ◽

Cited By ~ 3

Author(s):

Kan Zeng ◽

Yixiao Wang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Oil Spill ◽

Data Augmentation ◽

Classification Performance ◽

Deep Convolutional Neural Network ◽

Convolutional Network ◽

Data Set ◽

Oil Spill Detection ◽

Processing Framework

Classification algorithms for automatically detecting sea surface oil spills from spaceborne Synthetic Aperture Radars (SARs) can usually be regarded as part of a three-step processing framework comprising image segmentation, feature extraction and target classification. This paper proposes a Deep Convolutional Neural Network (DCNN), named the Oil Spill Convolutional Network (OSCNet), for SAR oil spill detection, which performs the latter two steps of this framework. OSCNet is derived from VGG-16 by designing the architecture and tuning the hyperparameters on a data set of SAR dark patches. With the help of a large data set containing more than 20,000 SAR dark patches and data augmentation, OSCNet can have as many as 12 weight layers, making it a relatively deep Deep Learning (DL) network for SAR oil spill detection. Experiments on the same data set show that the classification performance of OSCNet is significantly better than that of traditional machine learning (ML). Accuracy, recall and precision improve from 92.50%, 81.40% and 80.95% to 94.01%, 83.51% and 85.70%, respectively. An important reason for this improvement is that the features OSCNet learns directly from the data set are considerably more discriminative than the hand-crafted features required by traditional ML algorithms. In addition, experiments show that data augmentation plays an important role in avoiding over-fitting and hence improves classification performance. OSCNet has also been compared with other DL classifiers for SAR oil spill detection. Because the data sets differ greatly, only their similarities and differences are discussed at the level of principle.
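The three reported metrics follow directly from the binary confusion matrix. The short sketch below shows the formulas, assuming "oil spill" is the positive class and using hypothetical counts rather than the paper's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, recall and precision from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)      # share of true oil spills that are detected
    precision = tp / (tp + fp)   # share of detections that are true oil spills
    return accuracy, recall, precision


# Hypothetical counts for 20 dark patches (positive class = oil spill, an assumption)
acc, rec, prec = classification_metrics(tp=8, fp=2, fn=1, tn=9)
```

Recall and precision move independently of accuracy. This is why the abstract reports all three: a higher precision means fewer look-alike dark patches are flagged as spills.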
