Development and Application of a Data-Driven Reaction Classification Model: Comparison of an Electronic Lab Notebook and Medicinal Chemistry Literature

This paper provides a data-driven model of the vibration response of a railway crossing during vehicle passages. Many features of the trains passing through an instrumented crossing are extracted from measured data. Based on a feature selection process, speed, dynamic axle load and the number of wagons are found to be suitable inputs for the prediction model. The train-crossing interaction response to passing trains is modeled using a data-driven neuro-fuzzy soft computing approach. A Locally Linear Model Tree (LOLIMOT) is applied to predict the crossing nose acceleration. Comparison of the model against measurements shows satisfactory agreement, including for extrapolation cases at off-range speeds. The monitored trains are ranked based on the LOLIMOT input space partitions (dimension cuts) and on extrapolation of the model to higher train speeds. The influence of train factors (i.e., speed, dynamic axle load, number of wagons) on crossing response is demonstrated. The analysis also shows that, as train speeds increase steadily, some trains show greater amplification in vibration response than others. The results can be applied to data processing in crossing vibration monitoring and to the detection of trains whose crossing impact is sensitive to speed increases. This can inform operation policies that reduce damage and maintenance costs.
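
The abstract names LOLIMOT without describing it. Below is a minimal pure-Python sketch of the general LOLIMOT idea. It is not the authors' implementation. It assumes axis-orthogonal halving splits, normalized Gaussian validity functions with sigma = k_sigma times the box width (k_sigma = 1/3), and ridge-regularized weighted least squares for each local affine model. All function names are hypothetical. In the study's setting, the columns of X would be speed, dynamic axle load and number of wagons.

```python
import math


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def validities(x, boxes, k_sigma):
    """Normalized Gaussian validity functions centred on each hyper-rectangle."""
    s = [sum(((xi - 0.5 * (l + h)) / (k_sigma * (h - l))) ** 2
             for xi, l, h in zip(x, lo, hi)) for lo, hi in boxes]
    s_min = min(s)  # shift exponents so points far from all centres do not underflow
    mus = [math.exp(-0.5 * (si - s_min)) for si in s]
    total = sum(mus)
    return [mu / total for mu in mus]


def fit(X, y, boxes, k_sigma, ridge=1e-8):
    """Local estimation: one weighted least-squares affine model per box."""
    phis = [validities(x, boxes, k_sigma) for x in X]
    p = len(X[0]) + 1
    thetas = []
    for i in range(len(boxes)):
        A = [[ridge if r == c else 0.0 for c in range(p)] for r in range(p)]
        b = [0.0] * p
        for x, t, phi in zip(X, y, phis):
            z = [1.0] + list(x)
            for r in range(p):
                b[r] += phi[i] * z[r] * t
                for c in range(p):
                    A[r][c] += phi[i] * z[r] * z[c]
        thetas.append(solve(A, b))
    return boxes, thetas, k_sigma


def predict(x, model):
    """Validity-weighted sum of the local affine models."""
    boxes, thetas, k_sigma = model
    z = [1.0] + list(x)
    phi = validities(x, boxes, k_sigma)
    return sum(w * sum(t * zi for t, zi in zip(theta, z)) for w, theta in zip(phi, thetas))


def mse(X, y, model):
    return sum((t - predict(x, model)) ** 2 for x, t in zip(X, y)) / len(y)


def train_lolimot(X, y, n_models, k_sigma=1 / 3):
    """Greedy growth: split the worst box in half along the axis that helps most."""
    dim = len(X[0])
    lo = [min(x[d] for x in X) for d in range(dim)]
    hi = [max(x[d] for x in X) for d in range(dim)]
    model = fit(X, y, [(lo, hi)], k_sigma)
    while len(model[0]) < n_models:
        boxes = model[0]
        # Local loss: squared errors weighted by each local model's validity.
        loss = [0.0] * len(boxes)
        for x, t in zip(X, y):
            e2 = (t - predict(x, model)) ** 2
            for i, w in enumerate(validities(x, boxes, k_sigma)):
                loss[i] += w * e2
        worst = max(range(len(boxes)), key=loss.__getitem__)
        l, h = boxes[worst]
        candidates = []
        for d in range(dim):
            mid = 0.5 * (l[d] + h[d])
            left = (l, h[:d] + [mid] + h[d + 1:])
            right = (l[:d] + [mid] + l[d + 1:], h)
            cand = fit(X, y, boxes[:worst] + [left, right] + boxes[worst + 1:], k_sigma)
            candidates.append((mse(X, y, cand), cand))
        model = min(candidates, key=lambda c: c[0])[1]
    return model


if __name__ == "__main__":
    # Toy 1-D example: y = |x| is captured well once the input range is split at 0.
    X = [[-1 + 0.05 * k] for k in range(41)]
    y = [abs(x[0]) for x in X]
    for n in (1, 2, 3):
        print(n, mse(X, y, train_lolimot(X, y, n)))
```

The validity functions overlap, so each local model is blended smoothly with its neighbours. The box partition is what the abstract refers to as the input space "dimension cuts".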

Data-driven medicinal chemistry in the era of big data

Drug Discovery Today, 2014, Vol 19 (7), pp. 859-868
DOI: 10.1016/j.drudis.2013.12.004
Cited by ~78

Author(s): Scott J. Lusher, Ross McGuire, René C. van Schaik, C. David Nicholson, Jacob de Vlieg

Keyword(s): Big Data, Medicinal Chemistry, Data Driven

Adaptive data-driven selection of sequences of biological and cognitive markers in clinical diagnosis of dementia

2021
DOI: 10.1101/2021.10.26.21265515

Author(s): Patric Wyss, David Ginsbourger, Haochang Shou, Christos Davatzikos, Stefan Klöppel, ...

Keyword(s): Analytical Framework, Classification Model, Data Driven, Alternative Methods, Sequential Algorithm, Time Interval, Sequential Decision, Diagnosis Of Dementia, Cost Parameters, Selection Of

Combining the right (potentially invasive and expensive) markers at the appropriate time is critical to obtaining reliable yet economically sustainable decisions in the preclinical diagnosis of dementia. We propose a data-driven analytical framework that individualizes the selection of prognostic biomarkers. It balances accuracy, the opportunity costs of delaying the decision, and the cost of acquisition, depending on prescribed cost parameters. We compared sequential and non-sequential decision strategies based on a linear mixed-effects classification model that integrates irregular, multivariate longitudinal data. The framework was applied to separate participants who progress to Alzheimer's disease from those who do not within a time interval of three years. As expected, the highest accuracy was obtained by combining all available data: on average 20.9 measurements per subject, acquired over 4.8 years. The proposed sequential algorithm empirically outperformed alternative methods, having the lowest costs across a range of tested cost parameters. With the default cost parameters, the sequential algorithm reached an accuracy of 0.84, specificity of 0.86, and sensitivity of 0.82 (0.89, 0.91, and 0.88, respectively, with all available data). It required only 2.9 measurements on average (86 percent fewer observations than all available data) and a time interval of half a year on average (89 percent shorter than using all time points). Our sequential algorithms reached decisions based on individualized sequences of measurements, with lower process costs than non-sequential classification strategies, while maintaining competitive accuracy.
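
The trade-off this framework balances (misclassification risk against the costs of waiting and of acquiring another marker) can be illustrated with a generic myopic stopping rule. This is not the authors' algorithm, which builds on a linear mixed-effects classifier. The sketch assumes a hypothetical `Costs` container and assumes that the possible posterior probabilities after the next measurement are already known.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    """Hypothetical cost parameters; the paper's own parametrization may differ."""
    false_positive: float  # cost of predicting progression when it does not occur
    false_negative: float  # cost of missing a progressor
    acquisition: float     # cost of acquiring the next marker
    delay: float           # opportunity cost of postponing the decision


def decide_now_cost(p, costs):
    """Expected misclassification cost of the best immediate decision, given P(progress) = p."""
    return min(p * costs.false_negative,        # predict "stable": wrong with probability p
               (1 - p) * costs.false_positive)  # predict "progress": wrong with probability 1 - p


def should_stop(p, outcomes, costs):
    """One-step look-ahead stopping rule.

    outcomes: (probability, posterior) pairs describing what the next
    measurement could reveal. Stop if deciding now is no more expensive than
    paying for one more measurement and deciding afterwards.
    """
    continue_cost = costs.acquisition + costs.delay + sum(
        q * decide_now_cost(p_next, costs) for q, p_next in outcomes)
    return decide_now_cost(p, costs) <= continue_cost


costs = Costs(false_positive=1.0, false_negative=1.0, acquisition=0.05, delay=0.05)
print(should_stop(0.5, [(0.5, 0.9), (0.5, 0.1)], costs))    # uncertain: keep measuring
print(should_stop(0.95, [(0.5, 0.99), (0.5, 0.9)], costs))  # confident: decide now
```

Raising `acquisition` or `delay` makes the rule stop earlier. This mirrors how the prescribed cost parameters shift the balance between accuracy and the number of measurements.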

Estimating forest carbon fluxes using four different data-driven techniques based on long-term eddy covariance measurements: Model comparison and evaluation

The Science of The Total Environment, 2018, Vol 627, pp. 78-94
DOI: 10.1016/j.scitotenv.2018.01.202
Cited by ~15

Author(s): Xianming Dou, Yongguo Yang

Keyword(s): Eddy Covariance, Model Comparison, Carbon Fluxes, Forest Carbon, Data Driven, Eddy Covariance Measurements

A Deep Autoencoder-Based Convolution Neural Network Framework for Bearing Fault Classification in Induction Motors

Sensors, 2021, Vol 21 (24), pp. 8453
DOI: 10.3390/s21248453

Author(s): Rafia Nishat Toma, Farzin Piltan, Jong-Myon Kim

Keyword(s): Neural Network, Fault Diagnosis, Classification Accuracy, Classification Model, Data Driven, Fault Classification, Current Signal, Bearing Fault, Motor Current, Diagnosis And Classification

Fault diagnosis and classification for machines are integral to condition monitoring in the industrial sector. In recent years, as sensor technology and artificial intelligence have developed, data-driven fault diagnosis and classification have been investigated more widely. The data-driven approach requires good-quality features to attain high fault classification accuracy, and good features depend on domain expertise and a fair amount of labeled data. This paper proposes a bearing fault classification model for induction motors (IMs) based on a deep autoencoder (DAE) and a convolutional neural network (CNN), using motor current signals. Motor current signals can be collected easily and non-invasively from the motor. However, current signals collected in industrial settings are heavily contaminated with noise, which makes feature calculation very challenging. The DAE is used to estimate the nonlinear function of the system from normal-state data, and the residual signal is then obtained. The subsequent CNN model successfully classifies the fault types from the residual signals. The proposed semi-supervised approach achieved very high classification accuracy (more than 99%). Including the DAE was found not only to improve the accuracy significantly but also to be potentially useful when the amount of labeled data is small. The experimental outcomes are compared with existing works on the same dataset, and the performance of the proposed combined approach is found to be comparable to them. In terms of classification accuracy and other evaluation parameters, the overall method can be considered an effective approach for bearing fault classification using motor current signals.
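
The residual step above can be sketched as follows. A model of "normal" current is learned, and a new signal is reduced to what that model cannot explain. As a plainly named substitute for the deep autoencoder, this sketch uses a linear model: an orthonormal Gram-Schmidt basis of healthy training signals. The synthetic 50 Hz current and the 150 Hz "fault" harmonic are hypothetical. Real industrial signals are noisy and nonlinear, which is why the paper uses a DAE. The CNN classification stage is omitted.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normal_basis(normal_signals, tol=1e-9):
    """Orthonormal basis spanning the healthy signals (linear stand-in for the DAE)."""
    basis = []
    for s in normal_signals:
        r = list(s)
        for b in basis:  # modified Gram-Schmidt: remove components already explained
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        norm = math.sqrt(dot(r, r))
        if norm > tol * math.sqrt(dot(s, s)):  # skip signals already in the span
            basis.append([ri / norm for ri in r])
    return basis


def residual(signal, basis):
    """Signal minus its reconstruction from the normal-state model."""
    r = list(signal)
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return r


def current(phase, fault_amp=0.0, n=200, fs=1000.0, f0=50.0):
    """Hypothetical motor current: supply frequency f0 plus an optional 3*f0 fault harmonic."""
    return [math.sin(2 * math.pi * f0 * k / fs + phase)
            + fault_amp * math.sin(2 * math.pi * 3 * f0 * k / fs) for k in range(n)]


basis = normal_basis([current(p) for p in (0.0, 0.7, 1.4, 2.1)])
healthy = residual(current(2.5), basis)
faulty = residual(current(2.5, fault_amp=0.3), basis)
print(dot(healthy, healthy), dot(faulty, faulty))  # residual energies
```

Because the healthy signals differ only in phase, two basis vectors reconstruct them almost exactly. The fault component survives in the residual, and a downstream classifier would take that residual as its input.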

Data Driven Transformation of a Classification Model into Ranking

2021
DOI: 10.23919/icac50006.2021.9594062

Author(s): Salem Chakhar, Yu-Ling Lin, Rui Yang

Keyword(s): Classification Model, Data Driven

Automatic Identification of Rock Formation Type While Drilling Using Machine Learning Based Data-Driven Models

2021
DOI: 10.2118/201020-ms

Author(s): Enrique Z. Losoya, Narendra Vishnumolakala, Samuel F. Noynaert, Zenon Medina-Cetina, Satish Bukkapatnam, ...

Keyword(s): Machine Learning, Random Forest, Real Time, Prediction Accuracy, Classification Model, Data Driven, Classification Algorithms, Rock Formation, Mechanical Specific Energy, Formation Type

The objective of this study is to present a novel rock formation identification model using a data-driven modeling approach. This study explores the use of real-time drilling data to train and validate a classification model that improves the efficiency of the drilling process by reducing Mechanical Specific Energy (MSE). We demonstrate the feasibility of layer-based determination of the rock formation currently being drilled, and of detecting changes in its properties, as accurately and quickly as possible. Data for this study were collected from a custom-built lab-scale drilling rig equipped with multiple sensors. The experiment was conducted by drilling through an arrangement of different rock formations with varying rock strength properties. Data were recorded and stored at a frequency of 2 kHz, then filtered, processed, and downsampled to extract relevant features. This dataset was used to train an Artificial Neural Network and other machine learning classification algorithms. Two feature sets were used. The first consisted of the ten most important features identified by Random Forest. The second consisted of derived measurements and downsampled dynamic features from the sensors. The classification analysis was divided into two steps: extraction of the best predictors/features, and classification model building. The models were trained using multiple classification algorithms, namely logistic regression, linear discriminant analysis (LDA), Support Vector Machines (SVM), Random Forest (RF), and Artificial Neural Networks (ANN). Random forest and ANN performed best, with prediction accuracies of 99.48% and 99.58%, respectively, on the data set with the ten most prominent features. The high prediction accuracy obtained with the most prominent predictors suggests that, if the high-frequency data can be processed in real time, identifying the formation being drilled is achievable in near real time.
This can lead to significant savings for drilling companies, because optimal drilling parameters can be computed and, in turn, an optimized Mechanical Specific Energy can be obtained in real time. Since rock formation identification is time-consuming, we also describe an alternative approach using slightly less accurate but equally powerful dynamic predictors. In this case, our dynamic predictor models with RF and ANN yielded prediction accuracies of 96.30% and 95.61%, respectively. Both the prominent-feature and dynamic-predictor approaches are described in detail in this paper. Our results suggest that accurately predicting rock formation type in real time while drilling is feasible with lower computational cost and complexity. This study provides the building blocks for the development of a completely autonomous downhole device and Electronic Device Recorders (EDR) that reduce the need for highly sophisticated sensors or data transmission processes downhole.
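
The preprocessing chain described in this abstract (recording at 2 kHz, filtering, downsampling, feature extraction) could look roughly like the sketch below. Several details are assumptions for illustration and are not taken from the paper: the moving-average low-pass filter, the decimation factor, and the per-window mean, standard deviation and RMS features.

```python
import math


def moving_average(x, width):
    """Causal moving-average low-pass filter (a simple anti-aliasing stand-in)."""
    out, acc = [], 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= width:
            acc -= x[i - width]  # drop the sample that left the window
        out.append(acc / min(i + 1, width))
    return out


def downsample(x, factor, width=None):
    """Filter, then keep every `factor`-th sample (2 kHz becomes 2000 / factor Hz)."""
    return moving_average(x, width or factor)[::factor]


def window_features(x, size):
    """Per-window (mean, standard deviation, RMS) over non-overlapping windows."""
    feats = []
    for start in range(0, len(x) - size + 1, size):
        w = x[start:start + size]
        mean = sum(w) / size
        std = math.sqrt(sum((v - mean) ** 2 for v in w) / size)
        rms = math.sqrt(sum(v * v for v in w) / size)
        feats.append((mean, std, rms))
    return feats


one_second = [3.0] * 2000          # hypothetical constant sensor reading at 2 kHz
low_rate = downsample(one_second, 20)  # 100 Hz
print(len(low_rate), window_features(low_rate, 10)[0])
```

Feature vectors like these, one per window, would then be passed to the classifiers listed in the abstract.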

Classifying Business Types on Twitter Based on User Influential Analysis

Advanced Materials Research, 2011, Vol 403-408, pp. 3719-3723
DOI: 10.4028/www.scientific.net/amr.403-408.3719

Author(s): Chanattha Thongsuk, Choochart Haruechaiyasak, Somkid Saelee

Keyword(s): Recommender System, Model Comparison, Selection Method, Classification Model, User Selection, User Group, User Groups, Influential Users, Selection Parameters, F Measure

In this paper, we study the correlation between the incoming links of users on Twitter, an online micro-blogging social network. Identifying influential users can help recommend accounts to follow in the business domains users are interested in. We analyze the characteristics of influential users in order to improve the performance of recommender systems, and we classify users' Twitter posts into predefined business types. We propose a user selection approach that compares three parameters: (1) the number of relevant posts (NumRP), (2) the number of incoming links from business followers (NumUFI), and (3) the number of incoming links from all followers (NumTI). Users are ranked by each parameter and incrementally organized into three groups: (1) Top-100, (2) Top-200, and (3) Top-300. The posts of the selected users are then used to build a classification model, and the three user selection parameters and three user groups are compared. The experimental results show that NumRP yields the highest F-measure, followed by NumUFI and NumTI. In addition, the users in the Top-100 group of each user selection method are the influential users.
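
The selection-and-evaluation loop described above reduces to two small operations. The first ranks users by one parameter and slices nested Top-k groups. The second scores a classifier with the F-measure. The dictionary representation of scores and the per-class F1 below are illustrative assumptions, and the user names are hypothetical. The paper may aggregate F-measure across business types differently.

```python
def top_k_groups(scores, ks=(100, 200, 300)):
    """Rank users by one selection parameter (e.g. NumRP) and build nested Top-k groups."""
    ranked = sorted(scores, key=lambda user: scores[user], reverse=True)
    return {k: ranked[:k] for k in ks}


def f_measure(y_true, y_pred, positive):
    """F1 for one business type: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


num_rp = {"user_a": 5, "user_b": 9, "user_c": 1}  # hypothetical NumRP counts
print(top_k_groups(num_rp, ks=(1, 2)))
print(f_measure(["food", "food", "travel", "travel"],
                ["food", "travel", "food", "travel"], "food"))
```

Because each larger group is a prefix of the same ranking, Top-100 is contained in Top-200, which in turn is contained in Top-300. This matches the incremental grouping described in the abstract.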

Environmental factors prediction in preterm birth using comparison between logistic regression and decision tree methods: an exploratory analysis

2020
DOI: 10.22541/au.160691771.17181638/v1

Author(s): Rakesh Saroj, Madhu Anand, Neha Kumari

Keyword(s): Logistic Regression, Preterm Birth, Decision Tree, Model Comparison, Birth Outcome, Influential Factors, Classification Model, Term Birth, Machine Learning Classification, Tree Classifier

Objective: The main objective of this paper is to compare the performance of logistic regression and decision tree classification methods and to identify the significant environmental determinants of preterm birth.
Design, setting and population: Between 2017 and 2018, the birth outcomes of 90 pregnant women were followed by research staff at our institutions. Of these, 50 were full-term and 40 were preterm births.
Method: Logistic regression and decision tree classifiers were compared on this dataset before and after feature selection, and model accuracy was evaluated.
Main outcome measures: The accuracy of the machine learning classification models and the important factors for preterm birth.
Results: A chi-square test identified area of residence, GSH, MDA, α-HCH, total HCH and total DDT as factors related to preterm birth. In multiple logistic regression, preterm birth was associated with MDA and α-HCH (95% CI 0.04 to 0.48 and 95% CI 0.82 to 0.97, respectively). Comparing the two models, logistic regression performs better (precision = 0.92, F1-score = 0.96, AUROC = 0.97), while the decision tree performs worse (precision = 0.75, F1-score = 0.86, AUROC = 0.87).
Conclusions: Logistic regression predicts preterm birth more accurately than the decision tree method. Variables such as α-HCH, total HCH and MDA (malondialdehyde) are the most influential factors for preterm birth.
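
The comparison above relies on precision, F1-score and AUROC. A minimal sketch of their standard definitions follows, assuming labels encoded as 1 = preterm (positive) and 0 = full-term. This is not the authors' evaluation code. AUROC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting one half (the Mann-Whitney formulation).

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1 with 1 = preterm as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0]))
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Precision and F1 depend on a fixed decision threshold. AUROC uses the predicted probabilities directly, so it summarizes ranking quality across all thresholds.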
