A Reliable Method for Identification of Antibiotics by Terahertz Spectroscopy and SVM

Journal of Spectroscopy ◽

10.1155/2020/8811467 ◽

2020 ◽

Vol 2020 ◽

pp. 1-11

Author(s):

Jin Guo ◽

Hu Deng ◽

Quancheng Liu ◽

Linyu Chen ◽

Zhonggang Xiong ◽

...

Keyword(s):

Dimensionality Reduction ◽

Absorption Spectra ◽

Reliable Method ◽

Terahertz Spectroscopy ◽

Principal Component ◽

Support Vector ◽

THz Spectroscopy ◽

Model Parameters ◽

Training Time ◽

SVM Model

Given the extensive use of antibiotics at present, the identification of antibiotics and production quality monitoring are of high importance. However, conventional antibiotic identification methods have a low sensitivity and a long detection time. Here, we propose an identification method that combines terahertz (THz) spectroscopy and chemometric technology. THz time-domain spectroscopy (THz-TDS) was performed for sixteen types of antibiotics, including β-lactam, cephalosporins, macrolides, and tetracyclines. The absorption spectra within the frequency range of 0.2–1.5 THz were calculated. For dimensionality reduction, principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) were each implemented. The data after dimensionality reduction were input into a support vector machine (SVM). The model parameters were optimized through grid search (GS), genetic algorithm (GA), and particle swarm optimization (PSO) methods, and the optimal identification results were obtained after comparison across these methods. Experiments show that the THz absorption spectra differ among the sixteen types of antibiotics. After dimensionality reduction, the training time of the model decreased significantly. The t-SNE-PSO-SVM model achieved the highest average accuracy on the prediction set, 99.91%. Thus, our study confirms not only that the t-SNE-PSO-SVM model is a reliable method for antibiotic identification but also that the combination of THz-TDS and chemometric pattern recognition has great potential for drug detection.
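The grid search (GS) step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the grid, the kernel, or the scoring procedure, so everything here is an assumption: `grid_search`, `toy_score`, and the value ranges for `C` and `gamma` are hypothetical, and `toy_score` merely stands in for a cross-validated SVM accuracy.

```python
import math
from itertools import product


def grid_search(score, c_values, gamma_values):
    """Evaluate every (C, gamma) pair and return the best pair and its score."""
    best_params, best_score = None, -math.inf
    for c, gamma in product(c_values, gamma_values):
        s = score(c, gamma)
        if s > best_score:
            best_params, best_score = (c, gamma), s
    return best_params, best_score


def toy_score(c, gamma):
    # Hypothetical stand-in for cross-validated accuracy; it peaks at
    # C = 10 and gamma = 0.1 purely for illustration.
    return (1.0
            - 0.01 * (math.log10(c) - 1.0) ** 2
            - 0.01 * (math.log10(gamma) + 1.0) ** 2)


# Assumed logarithmic grids; the paper's actual grids are not stated.
c_grid = [10.0 ** k for k in range(-2, 4)]      # 0.01 ... 1000
gamma_grid = [10.0 ** k for k in range(-4, 2)]  # 0.0001 ... 10

best_params, best_score = grid_search(toy_score, c_grid, gamma_grid)
```

In practice `toy_score` would be replaced by a function that trains an SVM on the reduced features and returns a validation accuracy; GA and PSO explore the same parameter space without enumerating every grid point.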


Determination of the Geographical Origin of Coffee Beans Using Terahertz Spectroscopy Combined With Machine Learning Methods

Frontiers in Nutrition ◽

10.3389/fnut.2021.680627 ◽

2021 ◽

Vol 8 ◽

Author(s):

Si Yang ◽

Chenxi Li ◽

Yang Mei ◽

Wen Liu ◽

Rong Liu ◽

...

Keyword(s):

Machine Learning ◽

Terahertz Spectroscopy ◽

Geographic Origin ◽

Principal Component ◽

Support Vector ◽

THz Spectroscopy ◽

Learning Methods ◽

Linear Discriminant ◽

Machine Learning Methods ◽

Coffee Beans

Different geographical origins can lead to great variance in coffee quality, taste, and commercial value. Hence, controlling the authenticity of the origin of coffee beans is of great importance for producers and consumers worldwide. In this study, terahertz (THz) spectroscopy, combined with machine learning methods, was investigated as a fast and non-destructive method to classify the geographic origin of coffee beans. Popular machine learning methods, including convolutional neural network (CNN), linear discriminant analysis (LDA), and support vector machine (SVM), were compared to obtain the best model. The curse of dimensionality makes it difficult for some classification methods to train effective models. Thus, principal component analysis (PCA) and genetic algorithm (GA) were applied for LDA and SVM to create a smaller set of features. The first nine principal components (PCs), with a cumulative contribution rate of 99.9%, extracted by PCA and 21 variables selected by GA were the inputs of the LDA and SVM models. The results demonstrate that excellent classification (90% accuracy on the prediction set) could be achieved using the CNN method. The results also indicate that variable selection is an important step in creating an accurate and robust discrimination model. The performance of the LDA and SVM algorithms could be improved with spectral features extracted by PCA and GA. The GA-SVM achieved 75% accuracy on the prediction set, while the SVM and PCA-SVM achieved 50% and 65% accuracy, respectively. These results demonstrate that THz spectroscopy, together with machine learning methods, is an effective and satisfactory approach for classifying the geographical origins of coffee beans, and they point to the potential of deep learning for authenticating agricultural products while expanding the applications of THz spectroscopy.
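The "cumulative contribution rate" criterion for choosing the number of principal components can be sketched as follows. The function name `components_for_threshold` and the variance values are hypothetical; only the 99.9% threshold comes from the abstract.

```python
def components_for_threshold(variances, threshold=0.999):
    """Smallest number of components whose cumulative share of the total
    variance reaches the threshold (components taken in descending order)."""
    total = sum(variances)
    cumulative = 0.0
    for k, v in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return k
    return len(variances)


# Illustrative per-component variances (not taken from the paper).
example_variances = [50.0, 30.0, 15.0, 4.0, 0.95, 0.05]

n_components = components_for_threshold(example_variances, 0.999)
n_components_90 = components_for_threshold(example_variances, 0.90)
```

With these made-up variances, five components are needed to reach 99.9% and three to reach 90%; the study reports nine components for its own spectra.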


Temperature dependent poly(L-lactide) crystallization investigated by Fourier transform terahertz spectroscopy

Materials Advances ◽

10.1039/d1ma00195g ◽

2021 ◽

Author(s):

Seiichiro Ariyoshi ◽

Satoshi Ohnishi ◽

Hikaru Mikami ◽

Hideto Tsuji ◽

Yuki Arakawa ◽

...

Keyword(s):

Fourier Transform ◽

Absorption Spectra ◽

Terahertz Spectroscopy ◽

THz Spectroscopy ◽

Frequency Range ◽

Temperature Dependent ◽

THz Absorption

Poly(L-lactide) (PLLA) was investigated by Fourier transform terahertz (THz) spectroscopy over the frequency range of 1.0 – 8.5 THz. THz absorption spectra were acquired for PLLA samples isothermally crystallized at...


Cancer Discrimination Using Fourier Transform Near-Infrared Spectroscopy with Chemometric Models

Journal of Chemistry ◽

10.1155/2015/619685 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8 ◽

Cited By ~ 1

Author(s):

Hui Chen ◽

Zan Lin ◽

Chao Tan

Keyword(s):

Cancer Diagnosis ◽

Near Infrared ◽

NIR Spectroscopy ◽

Principal Component ◽

Classification Model ◽

Support Vector ◽

Biomedical Analysis ◽

First Derivative ◽

SVM Model ◽

High Prediction

Near-infrared (NIR) spectroscopy offers many potential advantages as a tool for biomedical analysis since it enables the subtle biochemical signatures related to pathology to be detected and extracted. In conjunction with advanced chemometrics, NIR spectroscopy opens the possibility of its use in cancer diagnosis. The study focuses on the application of NIR spectroscopy and classification models for discriminating colorectal cancer. A total of 107 surgical specimens and a corresponding NIR diffuse reflection spectral dataset were prepared. Three preprocessing methods were attempted, and least-squares support vector machine (LS-SVM) was used to build a classification model. The hybrid preprocessing of first derivative and principal component analysis (PCA) resulted in the best LS-SVM model, with a sensitivity and specificity of 0.96 and 0.96 for the training set and 0.94 and 0.96 for the test set, respectively. The similar performance on both subsets indicated that overfitting did not occur, assuring the robustness and reliability of the developed LS-SVM model. The area under the receiver operating characteristic (ROC) curve was 0.99, demonstrating once again the high predictive power of the model. The result confirms the applicability of the combination of NIR spectroscopy, LS-SVM, PCA, and first derivative preprocessing for cancer diagnosis.
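First-derivative preprocessing can be sketched with a simple forward finite difference. The abstract does not say how the derivative was computed (smoothed derivatives such as Savitzky-Golay are common in spectroscopy), so this is an assumed, simplified variant; the wavelength and absorbance values are made up.

```python
def first_derivative(x, y):
    """Forward finite-difference estimate of dy/dx; returns len(y) - 1 values."""
    if len(x) != len(y) or len(y) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(y) - 1)]


# Illustrative spectrum (not from the paper).
wavelength_nm = [1000.0, 1002.0, 1004.0, 1006.0]
absorbance = [0.10, 0.14, 0.20, 0.18]

# The same spectrum with a constant baseline offset added.
shifted = [a + 0.5 for a in absorbance]

d_raw = first_derivative(wavelength_nm, absorbance)
d_shifted = first_derivative(wavelength_nm, shifted)
```

The two derivative spectra agree up to rounding, which shows why differentiation is used to suppress constant baseline offsets before PCA and classification.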


Cascade Support Vector Machines with Dimensionality Reduction

Applied Computational Intelligence and Soft Computing ◽

10.1155/2015/216132 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8 ◽

Cited By ~ 3

Author(s):

Oliver Kramer

Keyword(s):

Support Vector Machines ◽

Dimensionality Reduction ◽

Principal Component ◽

Large Data ◽

Locally Linear Embedding ◽

Benchmark Problems ◽

Support Vector ◽

Training Set ◽

Support Vectors ◽

Vector Machines

Cascade support vector machines have been introduced as an extension of classic support vector machines that allows fast training on large data sets. In this work, we combine cascade support vector machines with dimensionality-reduction-based preprocessing. The cascade principle allows fast learning based on the division of the training set into subsets and the union of cascade learning results based on support vectors in each cascade level. The combination with dimensionality reduction as preprocessing results in a significant speedup, often without loss of classifier accuracy, while considering the high-dimensional counterparts of the low-dimensional support vectors in each new cascade level. We analyze and compare various instantiations of dimensionality reduction preprocessing and cascade SVMs with principal component analysis, locally linear embedding, and isometric mapping. The experimental analysis on various artificial and real-world benchmark problems covers various cascade-specific parameters such as intermediate training set sizes and dimensionalities.
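The cascade principle (train on subsets, keep only the support vectors, merge and retrain) can be illustrated on a deliberately tiny case. This sketch assumes 1-D, linearly separable data with every -1 point below every +1 point, where the support vectors of a hard-margin linear SVM are simply the two closest opposing points; `support_vectors_1d` is a stand-in for a real SVM solver, and all names and data are hypothetical.

```python
def support_vectors_1d(points):
    """Support vectors of a hard-margin linear SVM on separable 1-D data:
    the largest negative-class point and the smallest positive-class point.
    `points` is a list of (x, label) with label in {-1, +1}."""
    neg = max((p for p in points if p[1] == -1), key=lambda p: p[0])
    pos = min((p for p in points if p[1] == +1), key=lambda p: p[0])
    return [neg, pos]


def cascade_support_vectors(points, n_subsets):
    """Train on each subset, then repeatedly merge pairs of support-vector
    sets and retrain until a single set remains."""
    subsets = [points[i::n_subsets] for i in range(n_subsets)]
    level = [support_vectors_1d(s) for s in subsets]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            union = level[i] + (level[i + 1] if i + 1 < len(level) else [])
            merged.append(support_vectors_1d(union))
        level = merged
    return level[0]


# Illustrative data; each strided subset contains both classes.
data = [(-4.0, -1), (-3.0, -1), (-2.5, -1), (-1.0, -1),
        (1.5, 1), (2.0, 1), (3.0, 1), (5.0, 1)]

cascade_svs = cascade_support_vectors(data, n_subsets=4)
full_svs = support_vectors_1d(data)
boundary = (cascade_svs[0][0] + cascade_svs[1][0]) / 2.0
```

In this toy setting the cascade recovers the same support vectors as training on the full set, while each individual training call sees only a fraction of the data.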


Physical-oriented and machine learning-based emission modeling in a diesel compression ignition engine: Dimensionality reduction and regression

International Journal of Engine Research ◽

10.1177/14680874211070736 ◽

2022 ◽

pp. 146808742110707

Author(s):

Aran Mohammad ◽

Reza Rezaei ◽

Christopher Hayduk ◽

Thaddaeus Delebinski ◽

Saeid Shahpouri ◽

...

Keyword(s):

Principal Component Analysis ◽

Support Vector Machine ◽

Factor Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Data Driven ◽

Support Vector ◽

Emission Models ◽

Emission Modeling

The development of internal combustion engines is affected by exhaust gas emissions legislation and the striving to increase performance. This calls for engine-out emission models that can be used for engine optimization under real driving emission controls. The prediction capability of physics-based and data-driven engine-out emission models is influenced by the system inputs, which are specified by the user, and accuracy can improve as the number of inputs increases. However, irrelevant inputs then become more probable; these have a weak functional relation to the emissions and can lead to overfitting. Alternatively, data-driven methods can be used to detect irrelevant and redundant inputs. In this work, thermodynamic states are modeled based on 772 stationary test bench measurements from a commercial vehicle diesel engine. Afterward, 37 measured and modeled variables are fed into a data-driven dimensionality reduction. For this purpose, supervised learning approaches, such as lasso regression and linear support vector machine, and unsupervised learning methods, such as principal component analysis and factor analysis, are applied to select and extract the relevant features. The selected and extracted features are used for regression by the support vector machine and the feedforward neural network to model the NOx, CO, HC, and soot emissions. This enables an evaluation of the modeling accuracy as a result of the dimensionality reduction. Using the methods in this work, the 37 variables are reduced to 25, 22, 11, and 16 inputs for NOx, CO, HC, and soot emission modeling, respectively, while maintaining the accuracy. The features selected using the lasso algorithm provide more accurate learning of the regression models than the features extracted through principal component analysis and factor analysis. This results in test errors (RMSE on the test set) of 19.22 ppm, 6.46 ppm, 1.29 ppm, and 0.06 FSN for modeling NOx, CO, HC, and soot emissions, respectively.
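The test errors quoted above are root-mean-square errors. A minimal sketch of that metric follows; the function name `rmse` and the measured and predicted values are illustrative and not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up emission readings in ppm for three test operating points.
measured_ppm = [100.0, 120.0, 90.0]
predicted_ppm = [103.0, 116.0, 90.0]

test_error = rmse(measured_ppm, predicted_ppm)
```

Because RMSE carries the unit of the target, the reported values are in ppm for the gaseous emissions and in FSN for soot, and they are not directly comparable across targets.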


Dimensionality Reduction using PCA and K-Means Clustering for Breast Cancer Prediction

Lontar Komputer Jurnal Ilmiah Teknologi Informasi ◽

10.24843/lkjiti.2018.v09.i03.p08 ◽

2018 ◽

pp. 192 ◽

Cited By ~ 2

Author(s):

Ade Jamal ◽

Annisa Handayani ◽

Ali Akbar Septiandri ◽

Endang Ripmiatin ◽

Yunus Effendi

Keyword(s):

Breast Cancer ◽

Principal Component Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Gradient Boosting ◽

Support Vector ◽

Breast Cancer Dataset ◽

Cancer Prediction ◽

Extreme Gradient Boosting

Breast cancer is a major cause of death among women. Predicting breast cancer at an early stage provides a greater possibility of a cure. This requires a breast cancer prediction tool that can classify a breast tumor as either a harmful malignant tumor or a harmless benign tumor. In this paper, two machine learning algorithms, namely Support Vector Machine and Extreme Gradient Boosting, are compared for classification. Prior to classification, the number of data attributes is reduced from the raw data by extracting features using Principal Component Analysis. A clustering method, namely K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. This paper presents a comparison among four models based on two dimensionality reduction methods combined with two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison is measured using accuracy, sensitivity, and specificity metrics evaluated from the confusion matrices. The experimental results indicate that the K-Means method, which is not usually used for dimensionality reduction, can perform well compared to the popular Principal Component Analysis.
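The three metrics named in the abstract follow directly from the four cells of a binary confusion matrix. The sketch below uses invented counts, with "malignant" assumed to be the positive class; `confusion_metrics` is a hypothetical helper name.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity (true-positive rate), and specificity
    (true-negative rate) from binary confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Invented counts for illustration only (not results from the paper).
metrics = confusion_metrics(tp=45, fn=5, fp=10, tn=90)
```

Sensitivity answers how many malignant tumors were caught, and specificity how many benign tumors were correctly left alone, which is why both are reported alongside plain accuracy.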


SVM Models for Diagnosing Balance Problems Using Statistical Features of the MTC Signal

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026808002314 ◽

2008 ◽

Vol 07 (03) ◽

pp. 317-331 ◽

Cited By ~ 5

Author(s):

Daniel T. H. Lai ◽

Rezaul Begg ◽

Marimuthu Palaniswami

Keyword(s):

Feature Selection Method ◽

Gaussian Kernel ◽

Polynomial Kernel ◽

Support Vector ◽

Model Parameters ◽

Statistical Features ◽

Gait Patterns ◽

Healthy Elderly ◽

SVM Model ◽

Balance Problems

Trip-related falls are a major problem in the elderly population and research in the area has received much attention recently. The focus has been on devising ways of identifying individuals at risk of sustaining such falls. The main aim of this work is to explore the effectiveness of models based on Support Vector Machines (SVMs) for the automated recognition of gait patterns that exhibit falling behavior. Minimum toe clearance (MTC) during continuous walking on a treadmill was recorded on 10 healthy elderly and 10 elderly with balance problems and with a history of tripping falls. Statistical features obtained from MTC histograms were used as inputs to the SVM model to classify between the healthy and balance-impaired subjects. The leave-one-out technique was utilized for training the SVM model in order to find the optimal model parameters. Tests were conducted with various kernels (linear, Gaussian and polynomial) and with a change in the regularization parameter, C, in an effort to identify the optimum model for this gait data. The receiver operating characteristic (ROC) plots of sensitivity and specificity were further used to evaluate the diagnostic performance of the model. The maximum accuracy was found to be 90% using a Gaussian kernel with σ² = 10 and the maximum ROC area 0.98 (80% sensitivity and 100% specificity), when all statistical features were used by the SVM models to diagnose gait patterns of healthy and balance-impaired individuals. This accuracy was further improved by using a feature selection method in order to reduce the effect of redundant features. It was found that two features (standard deviation and maximum value) were adequate to give an improved accuracy of 95% (90% sensitivity and 100% specificity) using a polynomial kernel of degree 2. These preliminary results are encouraging and could be useful not only for diagnostic applications but also for evaluating improvements in gait function in the clinical/rehabilitation contexts.
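The leave-one-out technique holds out one subject at a time and trains on the rest. The sketch below shows the splitting and scoring loop; the nearest-class-mean classifier is only a stand-in for the SVM used in the study, and the feature values, labels, and function names are hypothetical.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, held_out_index) for every sample."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


def nearest_mean_predict(train_x, train_y, x):
    """Stand-in classifier: predict the label whose class mean is closest."""
    means = {}
    for label in set(train_y):
        values = [v for v, l in zip(train_x, train_y) if l == label]
        means[label] = sum(values) / len(values)
    return min(means, key=lambda label: abs(x - means[label]))


def loo_accuracy(xs, ys):
    """Fraction of held-out samples that are classified correctly."""
    correct = 0
    for train, held_out in leave_one_out_splits(len(xs)):
        prediction = nearest_mean_predict(
            [xs[i] for i in train], [ys[i] for i in train], xs[held_out])
        correct += prediction == ys[held_out]
    return correct / len(xs)


# Made-up 1-D feature values for two groups (0 = healthy, 1 = impaired).
feature = [1.0, 1.2, 0.9, 3.0, 3.1, 2.8]
label = [0, 0, 0, 1, 1, 1]

accuracy = loo_accuracy(feature, label)
```

With 20 subjects, as in the study, this loop would produce 20 train/test splits; repeating it for each kernel and each value of C gives the comparison the abstract describes.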


A Forecast Model of the Number of Containers for Containership Voyage

Algorithms ◽

10.3390/a11120193 ◽

2018 ◽

Vol 11 (12) ◽

pp. 193

Author(s):

Yuchuang Wang ◽

Guoyou Shi ◽

Xiaotong Sun

Keyword(s):

Container Terminal ◽

Forecast Model ◽

Gray Relational Analysis ◽

Support Vector ◽

Model Parameters ◽

Container Ship ◽

Kernel Support Vector Machine ◽

Proposed Model ◽

SVM Model ◽

Pass Through

Container ships must pass through multiple ports of call during a voyage. Therefore, forecasting container volume information at the port of origin followed by sending such information to subsequent ports is crucial for container terminal management and container stowage personnel. Numerous factors influence container allocation to container ships for a voyage, and the degree of influence varies, engendering a complex nonlinearity. Therefore, this paper proposes a model based on gray relational analysis (GRA) and mixed kernel support vector machine (SVM) for predicting container allocation to a container ship for a voyage. First, in this model, the weights of influencing factors are determined through GRA. Then, the weighted factors serve as the input of the SVM model, and SVM model parameters are optimized through a genetic algorithm. Numerical simulations revealed that the proposed model could effectively predict the number of containers for container ship voyage and that it exhibited strong generalization ability and high accuracy. Accordingly, this model provides a new method for predicting container volume for a voyage.
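Gray relational analysis scores how closely each influencing factor tracks a reference sequence. The sketch below uses the common form of the gray relational coefficient with distinguishing coefficient ρ = 0.5 and assumes the sequences are already normalized; turning grades into weights by dividing by their sum is also an assumption, since the abstract does not give the paper's exact weighting rule. All names and data are hypothetical.

```python
def gray_relational_grades(reference, factors, rho=0.5):
    """Gray relational grade of each factor sequence against the reference.
    Coefficient: (d_min + rho * d_max) / (d + rho * d_max), averaged over k."""
    deltas = [[abs(r - v) for r, v in zip(reference, f)] for f in factors]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0.0:
        # Every factor coincides with the reference everywhere.
        return [1.0 for _ in factors]
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def grades_to_weights(grades):
    """Assumed weighting rule: normalize the grades so they sum to one."""
    total = sum(grades)
    return [g / total for g in grades]


# Illustrative, pre-normalized sequences (not from the paper).
reference = [1.0, 2.0, 3.0]
factor_a = [1.0, 2.0, 3.0]   # tracks the reference exactly
factor_b = [1.0, 1.0, 1.0]   # does not follow the trend

grades = gray_relational_grades(reference, [factor_a, factor_b])
weights = grades_to_weights(grades)
```

The factor that follows the reference receives the larger grade and therefore the larger weight before the weighted inputs are passed to the SVM.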


Study on the Quantitative Method of Oversaturated Intersection

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.587-589.2100 ◽

2014 ◽

Vol 587-589 ◽

pp. 2100-2104

Author(s):

Qin Liu ◽

Jian Min Xu ◽

Kai Lu

Keyword(s):

Genetic Algorithm ◽

Support Vector Machine ◽

Travel Speed ◽

Urban Traffic ◽

Support Vector ◽

Model Parameters ◽

Guangzhou City ◽

Traffic System ◽

Traffic Conditions ◽

SVM Model

Oversaturation often occurs in modern urban traffic. To describe the degree of oversaturation, four indexes of intersection oversaturation are put forward: dissipation time, stranded queue, overflow queue, and travel speed. On the basis of the selected indexes, a genetic algorithm support vector machine (GA-SVM) model is proposed to quantify the degree of oversaturation; in this method, the genetic algorithm is used to select the model parameters. Using traffic volume data from intersections in Guangzhou city, the method is computed and simulated through programming. The simulation results show that the GA-SVM method is effective and that its accuracy is higher than that of a plain support vector machine (SVM). This method provides a theoretical basis for the analysis of traffic systems under oversaturated traffic conditions.
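Using a genetic algorithm to select model parameters can be sketched as below. This is a generic GA with truncation selection, uniform crossover, Gaussian mutation, and elitism, not the paper's specific design; `toy_fitness` is a hypothetical stand-in for a validation score of an SVM, and the search runs over assumed ranges of log10(C) and log10(gamma).

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Maximize `fitness` over box-bounded real vectors with a simple GA.
    Returns the best individual and the best fitness seen per generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0]]              # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [rng.choice(pair) for pair in zip(a, b)]
            for i, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[i] = min(hi, max(lo, child[i] + step))
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, history


def toy_fitness(individual):
    # Hypothetical stand-in for a validation score; its maximum of 0 lies
    # at log10(C) = 1 and log10(gamma) = -1.
    log_c, log_gamma = individual
    return -((log_c - 1.0) ** 2 + (log_gamma + 1.0) ** 2)


search_bounds = [(-2.0, 3.0), (-4.0, 1.0)]  # assumed log10 ranges
best, history = genetic_search(toy_fitness, search_bounds)
```

Because the best individual is always carried into the next generation, the recorded best fitness never decreases; in a real GA-SVM the fitness call would train and validate an SVM for each candidate parameter pair.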


Predicting Freeway Travel Time Using Multiple-Source Heterogeneous Data Integration

Applied Sciences ◽

10.3390/app9010104 ◽

2018 ◽

Vol 9 (1) ◽

pp. 104 ◽

Cited By ~ 1

Author(s):

Kejun Long ◽

Wukai Yao ◽

Jian Gu ◽

Wei Wu ◽

Lee Han

Keyword(s):

Neural Network ◽

Travel Time ◽

BP Neural Network ◽

Historical Data ◽

Multiple Source ◽

Function Parameter ◽

Support Vector ◽

Model Parameters ◽

SVM Model ◽

Adverse Weather

Freeway travel time is influenced by many factors including traffic volume, adverse weather, accidents, traffic control, and so on. We employ the multiple source data-mining method to analyze freeway travel time. We collected toll data, weather data, traffic accident disposal logs, and other historical data from Freeway G5513 in Hunan Province, China. Using the Support Vector Machine (SVM), we propose a travel time prediction model based on these data sources. The new SVM model can simulate the nonlinear relationship between travel time and those factors. In order to improve the precision of the SVM model, we applied the Artificial Fish Swarm algorithm to optimize the SVM model parameters, which include the kernel parameter σ, the non-sensitive loss function parameter ε, and the penalty parameter C. We compared the new optimized SVM model with the Back Propagation (BP) neural network and a common SVM model, using the historical data collected from Freeway G5513. The results show that the accuracy of the optimized SVM model is 17.27% and 16.44% higher than those of the BP neural network model and the common SVM model, respectively.
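Two of the tuned parameters have a direct, compact meaning: σ sets the width of the kernel and ε sets the width of the error tube that is not penalized. The sketch below assumes a Gaussian (RBF) kernel written as exp(-||x - z||² / (2σ²)), a common parameterization that the abstract does not confirm; the function names and numbers are illustrative.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian kernel with width sigma (assumed parameterization)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Illustrative travel times in minutes (not from the paper).
inside_tube = epsilon_insensitive_loss(60.0, 60.3, epsilon=0.5)
outside_tube = epsilon_insensitive_loss(60.0, 61.0, epsilon=0.5)

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma=1.5)
k_unit = rbf_kernel([0.0], [1.0], sigma=1.0)
```

The penalty parameter C then scales how strongly errors outside the tube are traded off against model flatness, which is why the three parameters are tuned jointly.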
