Using Sensor Data to Detect Lameness and Mastitis Treatment Events in Dairy Cows: A Comparison of Classification Models

Sensors ◽  
2020 ◽  
Vol 20 (14) ◽  
pp. 3863 ◽  
Author(s):  
Christian Post ◽  
Christian Rietz ◽  
Wolfgang Büscher ◽  
Ute Müller

The aim of this study was to develop classification models for mastitis and lameness treatments in Holstein dairy cows, using continuous data from herd management software and modern machine learning methods. Data were collected over a period of 40 months from a total of 167 cows, comprising daily individual sensor information (milking parameters, pedometer activity, feed and water intake, and body weight, in differently aggregated forms) as well as the entered treatment data. To identify the most important predictors for mastitis and lameness treatments, respectively, Random Forest feature importance, Pearson’s correlation and sequential forward feature selection were applied. With the selected predictors, various machine learning models were trained: Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Gaussian Naïve Bayes (GNB), Extra Trees Classifier (ET) and ensemble methods such as Random Forest (RF). Their performance was compared using the area under the receiver operating characteristic curve (ROC AUC), as well as sensitivity, block sensitivity and specificity. In addition, sampling methods were compared: over- and undersampling, as compensation for the expected unbalanced training data, had a strong effect on the ratio of sensitivity to specificity on the test data, but with regard to AUC, random oversampling and SMOTE (Synthetic Minority Over-sampling Technique) even yielded significantly lower values than non-sampled data. The best model, ET, obtained a mean AUC of 0.79 for mastitis and 0.71 for lameness on test data from practical conditions and is recommended by us for this type of data, although GNB, LR and RF were only marginally worse.
We recommend the use of these models as a benchmark for similar self-learning classification tasks. The classification models presented here retain their interpretability with the ability to present feature importances to the farmer in contrast to the “black box” models of Deep Learning methods.
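The interaction the abstract describes between oversampling and AUC can be illustrated with a small sketch. This is not the authors' pipeline: the data are synthetic, and the `smote` function below is my own minimal interpolation-based stand-in for the SMOTE algorithm.

```python
# Minimal sketch (synthetic imbalanced data): a hand-rolled SMOTE-style
# oversampler and its effect on an Extra Trees classifier's ROC AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

def smote(X, y, minority_label=1, k=5, seed=0):
    """Create synthetic minority samples by interpolating between each
    minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority_label]
    n_new = (y != minority_label).sum() - len(X_min)  # balance the classes
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)                     # column 0 is the point itself
    base = rng.integers(0, len(X_min), n_new)
    neighbour = X_min[idx[base, rng.integers(1, k + 1, n_new)]]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (neighbour - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority_label)])

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

for name, (Xt, yt) in {"raw": (X_tr, y_tr), "smote": smote(X_tr, y_tr)}.items():
    clf = ExtraTreesClassifier(n_estimators=200, random_state=42).fit(Xt, yt)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC={auc:.3f}")
```

As the study reports, oversampling mainly shifts the sensitivity/specificity trade-off; it does not necessarily raise AUC.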

Animals ◽  
2020 ◽  
Vol 10 (5) ◽  
pp. 771
Author(s):  
Toshiya Arakawa

Mammalian behavior is typically monitored by observation. However, direct observation requires substantial effort and time if the number of animals to be observed is large or if the observation is conducted over a prolonged period. In this study, machine learning methods such as hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks were applied to detect and estimate whether a goat is in estrus based on its behavior, and the adequacy of each method was verified. Goat tracking data were obtained using a video tracking system and used to estimate whether goats, labeled as being in “estrus” or “non-estrus”, were in either of two states: “approaching the male” or “standing near the male”. Overall, the percentage concordance (PC) of the random forest appeared to be the highest. However, its PC for goats whose data were not used in the training sets was relatively low, suggesting that random forests tend to over-fit to the training data. Besides the random forest, the PCs of the HMMs and SVMs were high; considering the calculation time and the HMM’s advantage of being a time-series model, the HMM is the better method. The PC of the neural network was low overall, but if more goat data were acquired, a neural network could become an adequate estimation method.
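The over-fitting the abstract reports, high concordance on training animals but low concordance on unseen animals, is exactly what grouped cross-validation is designed to expose. The sketch below uses synthetic tracking features and hypothetical goat IDs, not the study's data, to contrast pooled cross-validation with leave-animals-out cross-validation.

```python
# Sketch (synthetic features, hypothetical goat IDs): pooled K-fold CV vs.
# GroupKFold, which withholds whole animals from the training folds.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n_goats, per_goat = 8, 100
goat_id = np.repeat(np.arange(n_goats), per_goat)
# per-goat offsets mimic individual behavioural differences
X = rng.normal(size=(n_goats * per_goat, 4)) + goat_id[:, None] * 0.5
y = (X[:, 0] + rng.normal(scale=2.0, size=len(X)) > goat_id * 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
pooled = cross_val_score(rf, X, y, cv=KFold(5, shuffle=True, random_state=0)).mean()
by_goat = cross_val_score(rf, X, y, cv=GroupKFold(5), groups=goat_id).mean()
print(f"pooled CV: {pooled:.2f}  leave-goats-out CV: {by_goat:.2f}")
```

A gap between the two scores indicates the model is partly memorising individual animals rather than the behaviour of interest.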


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Kerry E. Poppenberg ◽  
Vincent M. Tutino ◽  
Lu Li ◽  
Muhammad Waqas ◽  
Armond June ◽  
...  

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. 
In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing the sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and to further investigate the effect of covariates.
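The LASSO-then-classify pipeline in the abstract can be sketched as follows. The data here are a synthetic stand-in for the transcript matrix (matching only the cohort sizes: 134 samples, 94 training, 40 test), and L1-penalised logistic regression is used as the LASSO-style selector.

```python
# Sketch (synthetic "transcript" matrix): LASSO-style feature selection
# followed by a random-forest classifier evaluated by ROC AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=134, n_features=500, n_informative=20,
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=40, random_state=1)

# L1-penalised logistic regression as the LASSO selector: features with
# non-zero coefficients are kept
lasso = LogisticRegression(penalty="l1", solver="liblinear").fit(X_tr, y_tr)
selected = np.flatnonzero(lasso.coef_.ravel())
print(f"{len(selected)} transcripts selected")

rf = RandomForestClassifier(n_estimators=300, random_state=1)
rf.fit(X_tr[:, selected], y_tr)
auc = roc_auc_score(y_te, rf.predict_proba(X_te[:, selected])[:, 1])
print("test AUC:", round(auc, 3))
```

Selecting features on the training cohort only, as the study does, avoids leaking test information into the model.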


RSC Advances ◽  
2014 ◽  
Vol 4 (106) ◽  
pp. 61624-61630 ◽  
Author(s):  
N. S. Hari Narayana Moorthy ◽  
Silvia A. Martins ◽  
Sergio F. Sousa ◽  
Maria J. Ramos ◽  
Pedro A. Fernandes

Classification models to predict the solvation free energies of organic molecules were developed using decision tree, random forest and support vector machine approaches and with MACCS fingerprints, MOE and PaDEL descriptors.
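A minimal version of this fingerprint-based classification setup can be sketched with random bit-vectors standing in for MACCS keys (the real study computes fingerprints and MOE/PaDEL descriptors from molecular structures; the label rule below is a toy assumption).

```python
# Sketch (random 166-bit vectors as MACCS-like fingerprints): decision tree,
# random forest and SVM classifiers compared by cross-validated accuracy.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.integers(0, 2, size=(300, 166))            # 166-bit MACCS-like keys
y = (X[:, :10].sum(axis=1) > 5).astype(int)        # toy solvation-class label

for name, clf in [("tree", DecisionTreeClassifier(random_state=4)),
                  ("forest", RandomForestClassifier(random_state=4)),
                  ("svm", SVC())]:
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: CV accuracy={score:.2f}")
```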


2021 ◽  
Author(s):  
Qifei Zhao ◽  
Xiaojun Li ◽  
Yunning Cao ◽  
Zhikun Li ◽  
Jixin Fan

Abstract Collapsibility of loess is a significant factor affecting engineering construction in loess areas, and testing the collapsibility of loess is costly. In this study, a total of 4,256 loess samples were collected from the northern, eastern, western and central regions of Xining. 70% of the samples were used to generate the training data set and the rest to generate the validation data set, in order to construct and validate the machine learning models. The six most important factors were selected from thirteen by using grey relational analysis and multicollinearity analysis: burial depth, water content, specific gravity of soil particles, void ratio, geostatic stress and plasticity limit. To predict the collapsibility of loess, four machine learning methods, Support Vector Machine (SVM), Random Subspace Based Support Vector Machine (RSSVM), Random Forest (RF) and Naïve Bayes Tree (NBTree), were studied and compared. Receiver operating characteristic (ROC) curve indicators, the standard error (SD) and the 95% confidence interval (CI) were used to verify and compare the models in the different study areas. The results show that the RF model is the most efficient at predicting the collapsibility of loess in Xining, with an average AUC above 80%, and can be used in engineering practice.
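The AUC-with-uncertainty comparison described here can be approximated by bootstrapping the test set. The sketch below uses synthetic data, not the Xining samples, and reports the bootstrap standard deviation and percentile 95% CI of the AUC.

```python
# Sketch (synthetic data): bootstrap standard error and 95% CI for a
# random forest's test-set ROC AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)
p = RandomForestClassifier(random_state=7).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

rng = np.random.default_rng(7)
aucs = []
for _ in range(500):
    i = rng.integers(0, len(y_te), len(y_te))      # resample the test set
    if len(np.unique(y_te[i])) == 2:               # AUC needs both classes
        aucs.append(roc_auc_score(y_te[i], p[i]))
aucs = np.array(aucs)
lo, hi = np.percentile(aucs, [2.5, 97.5])
print(f"AUC={aucs.mean():.3f}  SD={aucs.std():.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```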


Forests ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 118 ◽  
Author(s):  
Viet-Hung Dang ◽  
Nhat-Duc Hoang ◽  
Le-Mai-Duyen Nguyen ◽  
Dieu Tien Bui ◽  
Pijush Samui

This study developed and verified a new hybrid machine learning model, named random forest machine (RFM), for the spatial prediction of shallow landslides. RFM is a hybridization of two state-of-the-art machine learning algorithms, random forest classifier (RFC) and support vector machine (SVM), in which RFC is used to generate subsets from training data and SVM is used to build decision functions for these subsets. To construct and verify the hybrid RFM model, a shallow landslide database of the Lang Son area (northern Vietnam) was prepared. The database consisted of 101 shallow landslide polygons and 14 conditioning factors. The relevance of these factors for shallow landslide susceptibility modeling was assessed using the ReliefF method. Experimental results pointed out that the proposed RFM can help to achieve the desired prediction with an F1 score of roughly 0.96. The performance of the RFM was better than those of benchmark approaches, including the SVM, RFC, and logistic regression. Thus, the newly developed RFM is a promising tool to help local authorities in shallow landslide hazard mitigation.
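One plausible reading of the RFM idea, subsets generated as in a random forest, decision functions built by SVMs, is a bagged SVM ensemble. The sketch below is my interpretation on toy data, not the authors' implementation: bootstrap subsets are drawn as a forest would draw them, an RBF SVM is fitted on each, and predictions are combined by majority vote.

```python
# Sketch (toy two-moons data): bootstrap subsets + per-subset SVMs combined
# by majority vote, a bagged-SVM reading of the RFC/SVM hybrid.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_moons
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=600, noise=0.3, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

rng = np.random.default_rng(3)
members = []
for _ in range(25):                        # 25 bootstrap "trees", each an SVM
    i = rng.integers(0, len(X_tr), len(X_tr))
    members.append(clone(SVC(kernel="rbf")).fit(X_tr[i], y_tr[i]))

votes = np.mean([m.predict(X_te) for m in members], axis=0)
pred = (votes >= 0.5).astype(int)
f1 = f1_score(y_te, pred)
print("ensemble F1:", round(f1, 3))
```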


Author(s):  
Furkan Bilek ◽  
Ferhat Balgetir ◽  
Caner Feyzi Demir ◽  
Gökhan Alkan ◽  
Seda Arslan-Tuncer

Abstract Background and Objective Multiple sclerosis (MS) is a chronic, progressive, autoimmune disease of the central nervous system (CNS) characterized by inflammation, demyelination, and axonal injury. In patients with newly diagnosed MS (ndMS), ataxia can present as either mild or severe and can be difficult to diagnose in the absence of clinical disability. Such difficulties can be eliminated by using decision support systems based on machine learning methods. The present study aimed to achieve early diagnosis of ataxia in ndMS patients by using machine learning methods with spatiotemporal parameters. Materials and Methods The prospective study included 32 ndMS patients with an Expanded Disability Status Scale (EDSS) score of ≤2.0 and 32 healthy volunteers. A total of 14 parameters were elicited using a Win-Track platform. The ndMS patients were differentiated from healthy individuals using multiple classifiers, including Artificial Neural Network (ANN), Support Vector Machine (SVM), the k-nearest neighbors (K-NN) algorithm, and Decision Tree Learning (DTL). To improve classification performance, a Relief-based feature selection algorithm was applied to select the subset that best represented the whole dataset. Performance was evaluated using several criteria such as Accuracy (ACC), Sensitivity (SN), Specificity (SP), and Precision (PREC). Results ANN had a higher classification performance than the other classifiers, providing an accuracy, sensitivity, and specificity of 89%, 87.8%, and 90.3% with all parameters, and of 93.7%, 96.6%, and 91.1% with the parameters selected by the Relief algorithm. Significance To our knowledge, this is the first study of its kind in the literature to investigate the diagnosis of ataxia in ndMS patients by using machine learning methods with spatiotemporal parameters. The proposed method, i.e., the
Relief-based ANN method, successfully diagnosed ataxia by using a lower number of parameters compared to the numbers of parameters reported in clinical studies, thereby reducing the costs and increasing the performance of the diagnosis. The method also provided higher rates of accuracy, sensitivity, and specificity in the diagnosis of ataxia in ndMS patients compared to other methods. Taken together, these findings indicate that the proposed method could be helpful in the diagnosis of ataxia in minimally impaired ndMS patients and could be a pathfinder for future studies.
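The Relief-then-ANN pipeline can be sketched on synthetic gait-like data. The `relief_weights` function below is my own simplified Relief (nearest hit/miss weighting), not the exact variant used in the study, and the dataset dimensions (64 subjects, 14 parameters) merely mirror those in the abstract.

```python
# Sketch (synthetic data): simplified Relief feature weighting, keep the
# strongest features, then classify with a small neural network (MLP).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def relief_weights(X, y, n_iter=200, seed=0):
    """For random samples, reward features that differ from the nearest
    miss (other class) and penalise those that differ from the nearest hit."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for i in rng.integers(0, len(X), n_iter):
        d = np.abs(X - X[i]).sum(axis=1)
        d[i] = np.inf                                # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))
        miss = np.argmin(np.where(y != y[i], d, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w

X, y = make_classification(n_samples=64, n_features=14, n_informative=4,
                           random_state=2)
w = relief_weights(X, y)
best = np.argsort(w)[::-1][:6]                       # keep the 6 strongest features
X_tr, X_te, y_tr, y_te = train_test_split(X[:, best], y, random_state=2)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=2).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```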


2020 ◽  
Vol 9 (1) ◽  
Author(s):  
E. Popoff ◽  
M. Besada ◽  
J. P. Jansen ◽  
S. Cope ◽  
S. Kanters

Abstract Background Despite existing research on text mining and machine learning for title and abstract screening, the role of machine learning within systematic literature reviews (SLRs) for health technology assessment (HTA) remains unclear given lack of extensive testing and of guidance from HTA agencies. We sought to address two knowledge gaps: to extend ML algorithms to provide a reason for exclusion—to align with current practices—and to determine optimal parameter settings for feature-set generation and ML algorithms. Methods We used abstract and full-text selection data from five large SLRs (n = 3089 to 12,769 abstracts) across a variety of disease areas. Each SLR was split into training and test sets. We developed a multi-step algorithm to categorize each citation into the following categories: included; excluded for each PICOS criterion; or unclassified. We used a bag-of-words approach for feature-set generation and compared machine learning algorithms using support vector machines (SVMs), naïve Bayes (NB), and bagged classification and regression trees (CART) for classification. We also compared alternative training set strategies: using full data versus downsampling (i.e., reducing excludes to balance includes/excludes because machine learning algorithms perform better with balanced data), and using inclusion/exclusion decisions from abstract versus full-text screening. Performance comparisons were in terms of specificity, sensitivity, accuracy, and matching the reason for exclusion. Results The best-fitting model (optimized sensitivity and specificity) was based on the SVM algorithm using training data based on full-text decisions, downsampling, and excluding words occurring fewer than five times. The sensitivity and specificity of this model ranged from 94 to 100%, and 54 to 89%, respectively, across the five SLRs. 
On average, 75% of excluded citations were excluded with a reason and 83% of these citations matched the reviewers’ original reason for exclusion. Sensitivity significantly improved when both downsampling and abstract decisions were used. Conclusions ML algorithms can improve the efficiency of the SLR process and the proposed algorithms could reduce the workload of a second reviewer by identifying exclusions with a relevant PICOS reason, thus aligning with HTA guidance. Downsampling can be used to improve study selection, and improvements using full-text exclusions have implications for a learn-as-you-go approach.
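The core ingredients of the best-fitting model, bag-of-words features with rare words dropped, a linear SVM, and downsampling of the majority "exclude" class, can be sketched on a tiny synthetic corpus (the texts and class sizes below are invented for illustration).

```python
# Sketch (tiny synthetic corpus): CountVectorizer with min_df=5 drops words
# occurring fewer than five times; excludes are downsampled to balance the
# classes before fitting a linear SVM.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

rng = np.random.default_rng(5)
include = ["randomized trial of drug efficacy"] * 30
exclude = ["animal study of cell culture"] * 170
texts = np.array(include + exclude)
y = np.array([1] * 30 + [0] * 170)

# downsample excludes to the size of the includes (30 vs. 30)
keep = np.concatenate([np.arange(30), 30 + rng.choice(170, 30, replace=False)])
X = CountVectorizer(min_df=5).fit_transform(texts[keep])
clf = LinearSVC().fit(X, y[keep])
print("training accuracy:", clf.score(X, y[keep]))
```

A multi-step version of this, as in the paper, would train one such classifier per PICOS exclusion reason.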


SPE Journal ◽  
2020 ◽  
Vol 25 (03) ◽  
pp. 1241-1258 ◽  
Author(s):  
Ruizhi Zhong ◽  
Raymond L. Johnson ◽  
Zhongwei Chen

Summary Accurate coal identification is critical in coal seam gas (CSG) (also known as coalbed methane or CBM) developments because it determines well completion design and directly affects gas production. Density logging using radioactive source tools is the primary tool for coal identification, adding well trips to condition the hole and additional well costs for logging runs. In this paper, machine learning methods are applied to identify coals from drilling and logging-while-drilling (LWD) data to reduce overall well costs. Machine learning algorithms include logistic regression (LR), support vector machine (SVM), artificial neural network (ANN), random forest (RF), and extreme gradient boosting (XGBoost). The precision, recall, and F1 score are used as evaluation metrics. Because coal identification is an imbalanced data problem, the performance on the minority class (i.e., coals) is limited. To enhance the performance on coal prediction, two data manipulation techniques [naive random oversampling (NROS) technique and synthetic minority oversampling technique (SMOTE)] are separately coupled with machine learning algorithms. Case studies are performed with data from six wells in the Surat Basin, Australia. For the first set of experiments (single-well experiments), both the training data and test data are in the same well. The machine learning methods can identify coal pay zones for sections with poor or missing logs. It is found that rate of penetration (ROP) is the most important feature. The second set of experiments (multiple-well experiments) uses the training data from multiple nearby wells, which can predict coal pay zones in a new well. The most important feature is gamma ray. After placing slotted casings, all wells have coal identification rates greater than 90%, and three wells have coal identification rates greater than 99%. 
This indicates that machine learning methods (either XGBoost or ANN/RF with NROS/SMOTE) can be an effective way to identify coal pay zones and reduce coring or logging costs in CSG developments.
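The naive random oversampling (NROS) step described above can be sketched as duplicating random minority rows before training. The features below are synthetic stand-ins for drilling/LWD logs, and gradient boosting substitutes for XGBoost so the sketch needs only scikit-learn.

```python
# Sketch (synthetic imbalanced data): NROS (duplicate random minority rows
# until the classes balance), then gradient boosting, scored by the paper's
# metrics: precision, recall and F1.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, weights=[0.9], random_state=11)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=11)

# NROS: resample minority ("coal") rows with replacement until balanced
rng = np.random.default_rng(11)
minority = np.flatnonzero(y_tr == 1)
extra = rng.choice(minority, (y_tr == 0).sum() - len(minority))
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, np.ones(len(extra), dtype=int)])

clf = GradientBoostingClassifier(random_state=11).fit(X_bal, y_bal)
p, r, f1, _ = precision_recall_fscore_support(y_te, clf.predict(X_te),
                                              average="binary")
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Oversampling is applied only to the training split; the test split keeps its natural class ratio, as it would in a new well.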


Author(s):  
Nelson Yego ◽  
Juma Kasozi ◽  
Joseph Nkrunziza

The role of insurance in financial inclusion, as well as in economic growth, is immense. However, low uptake seems to impede the growth of the sector, hence the need for a model that robustly predicts uptake of insurance among potential clients. In this research, we compared the performance of eight machine learning models in predicting the uptake of insurance. The classifiers considered were Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, K-Nearest Neighbors, Decision Tree, Random Forest, Gradient Boosting Machines and Extreme Gradient Boosting. The data used in the classification were from the 2016 Kenya FinAccess Household Survey. Performance was compared for both upsampled and downsampled data due to data imbalance. For upsampled data, the Random Forest classifier showed the highest accuracy and precision compared to the other classifiers, but for downsampled data, gradient boosting was optimal. It is noteworthy that for both upsampled and downsampled data, tree-based classifiers were more robust than the others in insurance uptake prediction. However, in spite of hyper-parameter optimization, the area under the receiver operating characteristic curve remained highest for Random Forest compared to the other tree-based models. Also, the confusion matrix for Random Forest showed the fewest false positives and the most true positives, so it could be construed as the most robust model for predicting insurance uptake. Finally, the most important feature in predicting uptake was having a bank product, hence bancassurance could be said to be a plausible channel for the distribution of insurance products.
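The hyper-parameter optimization and AUC comparison between tree-based models can be sketched with a grid search. The data, grids and models below are illustrative assumptions, not the survey data or the authors' search space.

```python
# Sketch (synthetic imbalanced data): grid-searched random forest vs.
# gradient boosting, compared by test-set ROC AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1500, weights=[0.85], random_state=21)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=21)

grids = {
    "RF": (RandomForestClassifier(random_state=21),
           {"n_estimators": [100, 300], "max_depth": [None, 8]}),
    "GBM": (GradientBoostingClassifier(random_state=21),
            {"n_estimators": [100, 300], "learning_rate": [0.05, 0.1]}),
}
for name, (est, grid) in grids.items():
    search = GridSearchCV(est, grid, scoring="roc_auc", cv=3).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, search.predict_proba(X_te)[:, 1])
    print(f"{name}: best={search.best_params_}  test AUC={auc:.3f}")
```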


2021 ◽  
Vol 12 ◽  
Author(s):  
Runbin Sun ◽  
Haokai Zhao ◽  
Shuzhen Huang ◽  
Ran Zhang ◽  
Zhenyao Lu ◽  
...  

The liver has the ability to regenerate itself in mammals, but the mechanism has not been fully explained. Here we used a GC/MS-based metabolomic method to profile the dynamic endogenous metabolic changes in the serum of C57BL/6J mice at different times after 2/3 partial hepatectomy (PHx), and nine machine learning methods, including Least Absolute Shrinkage and Selection Operator Regression (LASSO), Partial Least Squares Regression (PLS), Principal Components Regression (PCR), k-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forest (RF), eXtreme Gradient Boosting (xgbDART), Neural Network (NNET) and Bayesian Regularized Neural Network (BRNN), were used for regression between the liver index and metabolomic data at different stages of liver regeneration. We found that the tree-based random forest method had the minimum average Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), the maximum R square (R2), and was time-saving. Furthermore, variable importance in projection (VIP) analysis of the RF method was performed, and the 20 metabolites with the highest VIP scores were selected as the most critical metabolites contributing to the model. Ornithine, phenylalanine, 2-hydroxybutyric acid, lysine, etc. were chosen as the most important metabolites, which had strong correlations with the liver index. Further pathway analysis found that arginine biosynthesis, pantothenate and CoA biosynthesis, galactose metabolism, and valine, leucine and isoleucine degradation were the most influenced pathways. In summary, several amino acid metabolic pathways and the glucose metabolism pathway changed dynamically during liver regeneration. The RF method showed advantages for predicting the liver index after PHx over the other machine learning methods used, and a metabolic clock containing four metabolites was established to predict the liver index during liver regeneration.
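The regression evaluation described here, MAE, RMSE, R2, plus a variable-importance ranking, can be sketched with a random-forest regressor. The "metabolite" matrix and "liver index" below are synthetic, with the signal deliberately planted in the first two columns.

```python
# Sketch (synthetic metabolite matrix): random-forest regression of a
# liver-index-like target, scored by MAE, RMSE and R2, with the top
# features by impurity-based importance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(13)
X = rng.normal(size=(120, 30))                     # 30 hypothetical metabolites
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=120)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=13)

rf = RandomForestRegressor(n_estimators=300, random_state=13).fit(X_tr, y_tr)
pred = rf.predict(X_te)
rmse = mean_squared_error(y_te, pred) ** 0.5
print(f"MAE={mean_absolute_error(y_te, pred):.2f} RMSE={rmse:.2f} "
      f"R2={r2_score(y_te, pred):.2f}")
top = np.argsort(rf.feature_importances_)[::-1][:4]  # a 4-feature "clock"
print("top metabolite indices:", top)
```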

