Multiclass Classification of Hepatic Anomalies with Dielectric Properties: From Phantom Materials to Rat Hepatic Tissues

Sensors ◽  
2020 ◽  
Vol 20 (2) ◽  
pp. 530 ◽  
Author(s):  
Tuba Yilmaz

Open-ended coaxial probes can be used as tissue characterization devices. However, the technique suffers from a high error rate: the measurement error is reported to exceed 30% in an in vivo measurement setting, and it must be reduced for the technology to improve. This work investigates the ability of machine learning (ML) algorithms to decrease the measurement error of open-ended coaxial probe techniques and thereby enable tissue characterization devices. To explore this potential, the performance of multiclass ML algorithms was evaluated on in vivo rat hepatic tissue and phantom dielectric property data. Phantoms were used to investigate whether the data set could be enlarged, because in vivo data are difficult to collect from tissues. The dielectric property measurements were collected from 16 rats with hepatic anomalies, 8 rats with healthy hepatic tissues, and in-house phantoms. Three ML algorithms, k-nearest neighbors (kNN), logistic regression (LR), and random forests (RF), were used to classify the collected data. The best performance for the classification of hepatic tissues was 76% accuracy, obtained with the LR algorithm. The LR algorithm classified the phantom data with over 98% accuracy, and the model generalized to in vivo dielectric property data with 48% accuracy. These findings indicate, first, that linear models such as logistic regression perform better on dielectric property data sets. Second, ML models fitted to data collected from phantom materials generalize only partly to in vivo dielectric property data, owing to the discrepancy in dielectric property variability.

2013 ◽  
Vol 9 (2) ◽  
pp. 227-262 ◽  
Author(s):  
Daphne Theijssen ◽  
Louis ten Bosch ◽  
Lou Boves ◽  
Bert Cranen ◽  
Hans van Halteren

Abstract: In existing research on syntactic alternations such as the dative alternation (give her the apple vs. give the apple to her), the linguistic data is often analysed with the help of logistic regression models. In this article, we evaluate the use of logistic regression for this type of research, and present two different approaches: Bayesian Networks and Memory-based learning. For the Bayesian Network, we use the higher-level semantic features suggested in the literature, while we limit ourselves to lexical items in the memory-based approach. We evaluate the suitability of the three approaches by applying them to a large data set (>11,000 instances) extracted from the British National Corpus, and comparing their quality in terms of classification accuracy, their interpretability in the context of linguistic research, and their actual classification of individual cases. Our main finding is that the classifications are very similar across the three approaches, also when employing lexical items instead of the higher-level features, because most of the alternation is determined by the verb and the length of the two objects (here: her and the apple).


2013 ◽  
Vol 35 (1) ◽  
pp. 98
Author(s):  
Angela Radünz Lazzari

Air pollution is a risk factor for population health. Its harmful effects are observed even when atmospheric pollutant levels are within the limits set out in specific legislation, and they manifest mainly as respiratory diseases. The aim of this study was to analyze the relationship between the concentrations of air pollutants and the incidence of respiratory diseases in the city of Porto Alegre in 2005 and 2006. Multiple linear regression analysis, ordinal logistic regression, and generalized linear models were applied. The results show a good fit for all three techniques. The ordinal logistic regression detected only a positive influence of air temperature and relative humidity on hospitalizations for respiratory diseases. Multiple linear regression related hospitalizations negatively to meteorological variables and positively to particulate matter (PM10). The generalized linear model detected a negative influence of meteorological variables and a positive influence of the pollutants tropospheric ozone (O3) and PM10 on hospitalizations. Comparing the three statistical techniques on the same data set, it can be concluded that all of them produced a model with a good fit to the data, but the generalized linear model was more sensitive in capturing the influence of pollutants than ordinal logistic regression and multiple linear regression.


2017 ◽  
Vol 26 (06) ◽  
pp. 1750019 ◽  
Author(s):  
Jale Bektas ◽  
Turgay Ibrikci ◽  
Ismail Turkay Ozcan

Coronary Artery Disease (CAD) is among the most common of the major types of cardiovascular disease, and several studies have used different features, including data collected from patients, for the timely diagnosis of CAD. In this study, a dataset with 21 features was used, and a risk score prediction system is proposed. The patients were divided into four groups. To determine the effective features of the CAD dataset, the t-test and Relief-f methods were used with Logistic Regression Analysis (LRA), and Relief-f was used with a Neural Network (NN). Sampling methods were used to correct the imbalance of the four-class dataset, and their effects were evaluated. Using the NN with oversampling and Relief-f feature selection, the results were as follows: 72.3% accuracy before the preprocessing operations and 84.1% accuracy after them, with 0.84 sensitivity and 0.94 specificity. These are the best results obtained for the CAD data set in this study. Using the feature selection and sampling methods with the NN substantially improves the prediction accuracy as well as the other metrics. This suggests that these preprocessing methods and the NN may be used together for prediction on four-class imbalanced medical datasets.
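
The oversampling step described above can be illustrated with a minimal sketch. It assumes plain random oversampling with replacement, which may not be the sampling method the authors used; the helper name `random_oversample` and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    is as large as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Toy, made-up data: four imbalanced groups (not the CAD data set).
X = list(range(10))
y = [0, 0, 0, 0, 0, 1, 1, 2, 2, 3]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # every class now has 5 samples
```

Oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the data used for evaluation.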


2015 ◽  
Vol 27 (11) ◽  
pp. 2411-2422 ◽  
Author(s):  
Charles K. Fisher ◽  
Pankaj Mehta

Identifying small subsets of features that are relevant for prediction and classification tasks is a central problem in machine learning and statistics. The feature selection task is especially important, and computationally difficult, for modern data sets where the number of features can be comparable to or even exceed the number of samples. Here, we show that feature selection with Bayesian inference takes a universal form and reduces to calculating the magnetizations of an Ising model under some mild conditions. Our results exploit the observation that the evidence takes a universal form for strongly regularizing priors—priors that have a large effect on the posterior probability even in the infinite data limit. We derive explicit expressions for feature selection for generalized linear models, a large class of statistical techniques that includes linear and logistic regression. We illustrate the power of our approach by analyzing feature selection in a logistic regression-based classifier trained to distinguish between the letters B and D in the notMNIST data set.


2000 ◽  
Vol 90 (2) ◽  
pp. 108-113 ◽  
Author(s):  
E. D. De Wolf ◽  
L. J. Francl

Tan spot and Stagonospora blotch of hard red spring wheat served as a model system for evaluating disease forecasts by artificial neural networks. Pathogen infection periods on susceptible wheat plants were measured in the field from 1993 to 1998, and incidence data were merged with 24-h summaries of accumulated growing degree days, temperature, relative humidity, precipitation, and leaf wetness duration. The resulting data set of 202 discrete periods was randomly assigned to 10 model-development or -validation (n = 50) data sets. Backpropagation neural networks, general regression neural networks, logistic regression, and parametric and nonparametric methods of discriminant analysis were chosen for comparison. Mean validation classification of tan spot incidence was between 71% for logistic regression and 76% for backpropagation models. No significant difference was found between methods of modeling tan spot infection periods. Mean validation prediction accuracy of Stagonospora blotch incidence was 86 and 81% for backpropagation and logistic regression, respectively. Prediction accuracies of other modeling methods were ≤78% and were significantly different (P = 0.01) from backpropagation, but not logistic regression, results. The best backpropagation models of tan spot and Stagonospora blotch incidences correctly classified 82 and 84% of validation cases, respectively. High classification accuracy and consistently good performance demonstrate the applicability of neural network technology to plant disease forecasting.


2017 ◽  
Vol 12 (2) ◽  
pp. 243-264 ◽  
Author(s):  
Rahul Kumar ◽  
Pradip Kumar Bala

Purpose: Collaborative filtering (CF), one of the most popular recommendation techniques, is based on the principle of word-of-mouth communication between other like-minded users. The process of identifying these like-minded or similar users remains crucial for a CF framework. Conventionally, a neighbor is the one among the similar users who has rated the item under consideration. When neighbors are selected by the existing practices, their similarity deteriorates, as many similar users might not have rated the item under consideration. This paper aims to address the drawback in the existing CF method where “not-so-similar” or “weak” neighbors are selected. Design/methodology/approach: The new approach proposed here selects neighbors only on the basis of highest similarity coefficient, irrespective of rating the item under consideration. Further, to predict missing ratings by some neighbors for the item under consideration, ordinal logistic regression based on item–item similarity is used here. Findings: Experiments using the MovieLens (ml-100) data set prove the efficacy of the proposed approach on different performance evaluation metrics such as accuracy and classification metrics. Apart from higher prediction quality, coverage values are also at par with the literature. Originality/value: This new approach gets its motivation from the principle of the CF method to rely on the opinion of the closest neighbors, which seems more meaningful than trusting “not-so-similar” or “weak” neighbors. The static nature of the neighborhood addresses the scalability issue of CF. Use of ordinal logistic regression as a prediction technique addresses the statistical inappropriateness of other linear models to make predictions for ordinal scale ratings data.


2018 ◽  
Vol 3 (1) ◽  
pp. 18 ◽  
Author(s):  
Alfensi Faruk ◽  
Endro Setyo Cahyono

Machine learning (ML) is a subject that focuses on data analysis using various statistical tools and learning processes in order to gain more knowledge from the data. The objective of this research was to apply ML techniques to low birth weight (LBW) data in Indonesia. This research conducts two ML tasks: prediction and classification. The binary logistic regression model was first applied to the training and test data. Then the random forest approach was also applied to the data set. The results showed that binary logistic regression performed well for prediction but was a poor approach for classification. In contrast, the random forest approach performed very well for both prediction and classification of the LBW data set.


Author(s):  
H. Ahmed ◽  
M. B. Mohammed ◽  
I. A. Baba

Logistic regression (LR) and the Multi-Layer Perceptron (MLP) are used to handle regression analysis when the dependent response variable is categorical. This study therefore assesses the performance of LR and MLP in classifying objects/observations into identified components/groups. A data set of 553 cases of diabetes was collected at Federal Medical Center, . The variables measured were age (years), mass of a patient (kg/meters), glucose level (plasma glucose concentration at 2 hours in an oral glucose tolerance test), pressure (diastolic blood pressure), insulin (2-hour serum insulin, mu U/ml), and a class variable (0 or 1), where 0 is a false or negative and 1 a true or positive test for diabetes. The methods used in the study are logistic regression analysis and the Multi-Layer Perceptron, a type of artificial neural network, together with a confusion matrix, classification, a network algorithm, and SPSS version 21 for Windows 10.1. The results showed that LR classifies diabetic patients correctly with 91.8% accuracy and non-diabetic patients with 89.1% accuracy. MLP classifies diabetic patients with 88.6% accuracy and non-diabetic patients with 93.2% accuracy. Overall, MLP classifies better, with 91% accuracy, while LR classifies with 90.6% accuracy. This study complements others in which MLP, a type of artificial neural network, classifies and predicts better than non-neural-network classifiers.
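
The per-class and overall accuracies reported above are the kind of figures read off a confusion matrix. The following is a minimal sketch of that calculation; the function name and the counts are hypothetical and are not the study's data.

```python
def per_class_and_overall_accuracy(confusion):
    """confusion[i][j] = number of cases of true class i predicted as class j.

    Returns (list of per-class accuracies, overall accuracy).
    """
    # Per-class accuracy: correct predictions in a row over the row total.
    per_class = [row[i] / sum(row) for i, row in enumerate(confusion)]
    # Overall accuracy: all diagonal entries over all cases.
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return per_class, correct / total


# Hypothetical counts, NOT taken from the study: rows are true classes
# (0 = non-diabetic, 1 = diabetic), columns are predicted classes.
confusion = [[90, 10],
             [20, 80]]
per_class, overall = per_class_and_overall_accuracy(confusion)
print(per_class, overall)  # [0.9, 0.8] 0.85
```

Because the overall figure weights each class by its number of cases, a classifier can have the higher overall accuracy while being the weaker one on a particular class, as with MLP on diabetic patients here.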


Author(s):  
Supun Nakandala ◽  
Marta M. Jankowska ◽  
Fatima Tuz-Zahra ◽  
John Bellettiere ◽  
Jordan A. Carlson ◽  
...  

Background: Machine learning has been used for classification of physical behavior bouts from hip-worn accelerometers; however, this research has been limited due to the challenges of directly observing and coding human behavior “in the wild.” Deep learning algorithms, such as convolutional neural networks (CNNs), may offer better representation of data than other machine learning algorithms without the need for engineered features and may be better suited to dealing with free-living data. The purpose of this study was to develop a modeling pipeline for evaluation of a CNN model on a free-living data set and compare CNN inputs and results with the commonly used machine learning random forest and logistic regression algorithms. Method: Twenty-eight free-living women wore an ActiGraph GT3X+ accelerometer on their right hip for 7 days. A concurrently worn thigh-mounted activPAL device captured ground truth activity labels. The authors evaluated logistic regression, random forest, and CNN models for classifying sitting, standing, and stepping bouts. The authors also assessed the benefit of performing feature engineering for this task. Results: The CNN classifier performed best (average balanced accuracy for bout classification of sitting, standing, and stepping was 84%) compared with the other methods (56% for logistic regression and 76% for random forest), even without performing any feature engineering. Conclusion: Using the recent advancements in deep neural networks, the authors showed that a CNN model can outperform other methods even without feature engineering. This has important implications for both the model’s ability to deal with the complexity of free-living data and its potential transferability to new populations.
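
Balanced accuracy, the metric reported above, can be sketched as follows. The sketch assumes the common multiclass definition (the mean of per-class recall); the authors may have computed it differently, and the labels below are made-up toy data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally
    regardless of how many samples it has."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Toy labels (hypothetical): sitting dominates, as it often does in free-living data.
y_true = ["sit"] * 6 + ["stand"] * 2 + ["step"] * 2
y_pred = ["sit"] * 6 + ["sit", "stand"] + ["step", "step"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain)                              # 0.9
print(balanced_accuracy(y_true, y_pred))  # (1.0 + 0.5 + 1.0) / 3
```

In this toy case the plain accuracy is 0.9, while the balanced accuracy is lower because the single missed "stand" bout costs half of that class's recall.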


Author(s):  
I Ketut Putu Suniantara ◽  
I Gede Eka Wiantara Putra ◽  
Gede Suwardika

A baby's birth weight is influenced by characteristics of the pregnant woman such as age, parity, education level, pregnancy visits, and gestational age. Birth weight is classified into several groups, namely low birth weight, normal birth weight, and excess birth weight. Classification with ordinal logistic regression gives unstable parameter estimates, meaning that a change in the data set causes a significant change in the model. To obtain stable parameter estimates in the ordinal logistic regression model, a bootstrap aggregating (bagging) approach is used. This study aims to improve the classification of ordinal logistic regression by using bagging on babies' birth weight. The classification results with bagging ordinal logistic regression reduced classification errors by 20.237%, with 76.67% classification accuracy.
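
The bagging idea described above (fit one model per bootstrap resample, then aggregate by vote) can be sketched as follows. To stay self-contained, the sketch swaps in a simple nearest-class-mean learner for ordinal logistic regression; that substitution, the function names, and the toy birth weights are all assumptions for illustration only.

```python
import random
from collections import Counter


def fit_nearest_mean(xs, ys):
    """Stand-in base learner (NOT ordinal logistic regression):
    stores the mean feature value of each class."""
    sums, counts = Counter(), Counter()
    for x, y in zip(xs, ys):
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in counts}


def predict_nearest_mean(model, x):
    """Predict the class whose stored mean is closest to x."""
    return min(model, key=lambda c: abs(x - model[c]))


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Fit one base model per bootstrap resample of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_nearest_mean([xs[i] for i in idx],
                                       [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the base models by majority vote."""
    votes = Counter(predict_nearest_mean(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy, made-up birth weights in grams with three ordered classes.
xs = [1800, 2100, 2300, 2900, 3200, 3500, 4100, 4300, 4600]
ys = ["low"] * 3 + ["normal"] * 3 + ["excess"] * 3
models = bagging_fit(xs, ys)
print([bagging_predict(models, x) for x in (2000, 3300, 4500)])
```

Each resample yields slightly different class means, and the vote averages out that variation, which is the stabilizing effect the abstract attributes to bagging.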

