Assessing Electrocardiogram and Respiratory Signal Quality of a Wearable Device (SensEcho): Semisupervised Machine Learning-Based Validation Study

Haoran Xu; Wei Yan; Ke Lan; Chenbin Ma; Di Wu; Anshuo Wu; Zhicheng Yang; Jiachen Wang; Yaning Zang; Muyang Yan; Zhengbo Zhang

doi:10.2196/25415


JMIR mHealth and uHealth ◽ 10.2196/25415 ◽ 2021 ◽ Vol 9 (8) ◽ pp. e25415

Author(s): Haoran Xu ◽ Wei Yan ◽ Ke Lan ◽ Chenbin Ma ◽ Di Wu ◽ ...

Keyword(s): False Alarm ◽ False Alarm Rate ◽ Wearable Device ◽ Gradient Boosting ◽ Support Vector ◽ False Alarms ◽ Signal Quality ◽ Ventricular Premature Beat ◽ Extreme Gradient Boosting ◽ Test Sets

Background: With the development and promotion of wearable devices and their mobile health (mHealth) apps, physiological signals have become a research hotspot. However, signals obtained in daily life contain complex noise, which makes them difficult to analyze automatically and results in a high false alarm rate. At present, screening out the high-quality segments from high-volume data with few labels remains a problem. Signal quality assessment (SQA) is essential for mining valuable information from such signals.

Objective: The aims of this study were to design an SQA algorithm based on the unsupervised isolation forest model to classify the signal quality into 3 grades (good, acceptable, and unacceptable); validate the algorithm on labeled data sets; and apply the algorithm to real-world data to evaluate its efficacy.

Methods: Data used in this study were collected by a wearable device (SensEcho) from healthy individuals and patients. The observation windows for electrocardiogram (ECG) and respiratory signals were 10 and 30 seconds, respectively. In the experimental procedure, the unlabeled training set was used to train the models. The validation and test sets were labeled according to preset criteria and used to evaluate the classification performance quantitatively. The validation set consisted of 3460 and 2086 windows of ECG and respiratory signals, respectively, whereas the test set was made up of 4686 and 3341 windows of signals, respectively. The algorithm was also compared with self-organizing maps (SOMs) and 4 classic supervised models (logistic regression, random forest, support vector machine, and extreme gradient boosting). One case validation was presented to illustrate the application effect. The algorithm was then applied to 1144 cases of ECG signals collected from patients, and the detected arrhythmia false alarms were counted.

Results: The quantitative results showed that the ECG SQA model achieved 94.97% and 95.58% accuracy on the validation and test sets, respectively, whereas the respiratory SQA model achieved 81.06% and 86.20% accuracy on the validation and test sets, respectively. The algorithm was superior to SOM and achieved moderate performance when compared with the supervised models. The example case showed that the algorithm was able to correctly classify the signal quality even when there were complex pathological changes in the signals. The application results indicated that some specific types of arrhythmia false alarms, such as tachycardia, atrial premature beat, and ventricular premature beat, could be significantly reduced with the help of the algorithm.

Conclusions: This study verified the feasibility of applying an unsupervised anomaly detection model to SQA. The application scenarios include reducing the false alarm rate of the device and selecting signal segments that can be used for further research.
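The fixed observation windows described in the Methods (10 seconds for ECG, 30 seconds for respiration) can be sketched as below. This is only an illustration: the sampling rates, the function name `split_into_windows`, and the choice of non-overlapping windows are assumptions, since the abstract states none of them.

```python
from typing import List, Sequence


def split_into_windows(signal: Sequence[float], fs_hz: int, window_s: int) -> List[Sequence[float]]:
    """Cut a signal into consecutive, non-overlapping windows of window_s seconds.

    A trailing remainder shorter than one full window is dropped.
    """
    if fs_hz <= 0 or window_s <= 0:
        raise ValueError("fs_hz and window_s must be positive")
    size = fs_hz * window_s  # samples per window
    n_windows = len(signal) // size
    return [signal[i * size:(i + 1) * size] for i in range(n_windows)]


# Hypothetical sampling rates; the abstract does not state them.
ECG_FS_HZ = 200
RESP_FS_HZ = 25

ecg = [0.0] * (ECG_FS_HZ * 65)    # 65 s of placeholder ECG samples
resp = [0.0] * (RESP_FS_HZ * 65)  # 65 s of placeholder respiratory samples

ecg_windows = split_into_windows(ecg, ECG_FS_HZ, window_s=10)
resp_windows = split_into_windows(resp, RESP_FS_HZ, window_s=30)
print(len(ecg_windows), len(resp_windows))  # prints: 6 2
```

Each window would then be the unit that a quality model grades as good, acceptable, or unacceptable.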


Assessing Electrocardiogram and Respiratory Signal Quality of a Wearable Device (SensEcho): Semisupervised Machine Learning-Based Validation Study (Preprint)

10.2196/preprints.25415 ◽ 2020

Author(s): Haoran Xu ◽ Wei Yan ◽ Ke Lan ◽ Chenbin Ma ◽ Di Wu ◽ ...

Keyword(s): False Alarm ◽ False Alarm Rate ◽ Wearable Device ◽ Gradient Boosting ◽ Support Vector ◽ False Alarms ◽ Signal Quality ◽ Ventricular Premature Beat ◽ Extreme Gradient Boosting ◽ Test Sets



Effective Smoke Detection Using Spatial-Temporal Energy and Weber Local Descriptors in Three Orthogonal Planes (WLD-TOP)

Journal of Computer Science and Technology ◽ 10.24215/16666038.18.e05 ◽ 2018 ◽ Vol 18 (01) ◽ pp. e05 ◽ Cited By ~ 1

Author(s): John Adedapo Ojo ◽ Jamiu Alabi Oladosu

Keyword(s): False Alarm ◽ False Alarm Rate ◽ Detection Rate ◽ Robot Vision ◽ Fire Detection ◽ Support Vector ◽ False Alarms ◽ High Detection Rate ◽ Local Descriptor ◽ Video Frames

Video-based fire detection (VFD) technologies have recently received significant attention from both academic and industrial communities. However, existing VFD approaches are still susceptible to false alarms caused by changes in illumination, camera noise, variability of shape, motion, and colour, irregular patterns of smoke and flames, and modelling and training inaccuracies. Hence, this work aimed to develop a video-based smoke detection (VSD) system with a high detection rate, a low false alarm rate, and a short response time. Moving blocks in video frames were segmented and analysed in HSI colour space, and wavelet energy analysis of the smoke candidate blocks was performed. In addition, dynamic texture descriptors were obtained using the Weber Local Descriptor in Three Orthogonal Planes (WLD-TOP). These features were combined and used as inputs to a support vector classifier with a radial basis function kernel, while a post-processing stage employed temporal image filtering to reduce false alarms. The algorithm was implemented in MATLAB 8.1.0.604 (R2013a). An accuracy of 99.30%, a detection rate of 99.28%, and a false alarm rate of 0.65% were obtained when the system was tested with some online videos. The output of this work would find applications in early fire detection systems and in other areas such as robot vision and automated inspection.


Realizing Target Detection in SAR Images Based on Multiscale Superpixel Fusion

Sensors ◽ 10.3390/s21051643 ◽ 2021 ◽ Vol 21 (5) ◽ pp. 1643

Author(s): Ming Liu ◽ Shichao Chen ◽ Fugang Lu ◽ Mengdao Xing ◽ Jingbiao Wei

Keyword(s): Synthetic Aperture Radar ◽ False Alarm ◽ False Alarm Rate ◽ Target Detection ◽ Synthetic Aperture ◽ False Alarms ◽ Sar Image ◽ Constant False Alarm Rate ◽ Sar Images ◽ Complex Scenes

For target detection in complex scenes of synthetic aperture radar (SAR) images, the false alarms in the land areas are hard to eliminate, especially for the ones near the coastline. Focusing on the problem, an algorithm based on the fusion of multiscale superpixel segmentations is proposed in this paper. Firstly, the SAR images are partitioned by using different scales of superpixel segmentation. For the superpixels in each scale, the land-sea segmentation is achieved by judging their statistical properties. Then, the land-sea segmentation results obtained in each scale are combined with the result of the constant false alarm rate (CFAR) detector to eliminate the false alarms located on the land areas of the SAR image. In the end, to enhance the robustness of the proposed algorithm, the detection results obtained in different scales are fused together to realize the final target detection. Experimental results on real SAR images have verified the effectiveness of the proposed algorithm.
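The idea of combining a CFAR detector with a land-sea mask can be illustrated on a one-dimensional toy signal. This is a sketch under stated assumptions, not the paper's algorithm: it uses a basic cell-averaging CFAR, and the function names, the guard/training sizes, the threshold scale, and the hand-written land mask (standing in for the superpixel-based segmentation) are all hypothetical.

```python
def ca_cfar(power, guard=1, train=4, scale=3.0):
    """Cell-averaging CFAR: flag cells whose power exceeds
    scale * (mean power of the surrounding training cells)."""
    detections = []
    n = len(power)
    for i in range(n):
        # Training cells lie on both sides of cell i, outside the guard band.
        cells = [
            power[j]
            for j in range(i - guard - train, i + guard + train + 1)
            if 0 <= j < n and abs(j - i) > guard
        ]
        if cells and power[i] > scale * (sum(cells) / len(cells)):
            detections.append(i)
    return detections


def mask_land(detections, is_land):
    """Discard detections that fall on cells marked as land."""
    return [i for i in detections if not is_land[i]]


# Toy scene: flat background, one bright cell at sea and one on land.
power = [1.0] * 20
power[5] = 10.0    # target at sea
power[15] = 12.0   # bright clutter on land
is_land = [False] * 12 + [True] * 8  # hypothetical land-sea segmentation

raw = ca_cfar(power)
kept = mask_land(raw, is_land)
print(raw, kept)  # prints: [5, 15] [5]
```

The CFAR step alone flags both bright cells; the mask removes the one on land, which is the kind of false alarm the abstract targets.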


Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽ 10.1186/s12872-021-01925-7 ◽ 2021 ◽ Vol 21 (1)

Author(s): Moojung Kim ◽ Young Jae Kim ◽ Sung Jin Park ◽ Kwang Gi Kim ◽ Pyung Chun Oh ◽ ...

Keyword(s): Machine Learning ◽ Cardiovascular Disease ◽ Influenza Vaccination ◽ Machine Learning Techniques ◽ Gradient Boosting ◽ Support Vector ◽ Age Group ◽ Learning Models ◽ Extreme Gradient Boosting ◽ Machine Learning Models

Background: Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination.

Methods: Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups.

Results: The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared. For the ≥ 65 age group, XGB (84.7%) and RF (84.7%) had the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM had the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%).

Conclusions: The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.


Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival

Scientific Reports ◽ 10.1038/s41598-021-86327-7 ◽ 2021 ◽ Vol 11 (1)

Author(s): Arturo Moncada-Torres ◽ Marissa C. van Maaren ◽ Mathijs P. Hendriks ◽ Sabine Siesling ◽ Gijs Geleijnse

Keyword(s): Breast Cancer ◽ Machine Learning ◽ Explicit Knowledge ◽ Cox Regression ◽ Metastatic Breast ◽ Gradient Boosting ◽ Support Vector ◽ Netherlands Cancer Registry ◽ Extreme Gradient Boosting ◽ The Impact

Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the $c$-index. We demonstrated that in our dataset, ML-based models can perform at least as well as the classical CPH regression ($c$-index $\sim 0.63$), and in the case of XGB even better ($c$-index $\sim 0.73$). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial in increasing the trust and adoption of innovative ML techniques in oncology and healthcare overall.
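For readers unfamiliar with the metric, the following is a minimal sketch of Harrell's concordance index on right-censored data. The abstract does not say which c-index variant or implementation the authors used, so the function name, the tie handling (tied risks count as half), and the toy data are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored survival data.

    times[i]  : follow-up time of subject i
    events[i] : True/1 if the event was observed, False/0 if censored
    risks[i]  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four subjects, the third one censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]

c = concordance_index(times, events, risks)
print(round(c, 2))  # prints: 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of risks, which puts the reported values of about 0.63 and 0.73 in context.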


Establishing a Credit Risk Evaluation System for SMEs Using the Soft Voting Fusion Model

Risks ◽ 10.3390/risks9110202 ◽ 2021 ◽ Vol 9 (11) ◽ pp. 202

Author(s): Ge Gao ◽ Hongxin Wang ◽ Pengbin Gao

Keyword(s): Credit Risk ◽ Evaluation System ◽ Predictive Accuracy ◽ Assessment System ◽ Gradient Boosting ◽ Support Vector ◽ Fusion Model ◽ Light Gradient ◽ Extreme Gradient Boosting ◽ The Government

In China, SMEs are facing financing difficulties, and commercial banks and financial institutions are the main financing channels for SMEs. Thus, a reasonable and efficient credit risk assessment system is important for credit markets. Based on traditional statistical methods and AI technology, a soft voting fusion model, which incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), is constructed to improve the predictive accuracy of SMEs’ credit risk. To verify the feasibility and effectiveness of the proposed model, we use data from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020, including financial information and default records. The results show that the accuracy of the soft voting fusion model is higher than that of a single machine learning (ML) algorithm, which provides a theoretical basis for the government to control credit risk in the future and offers important references for banks to make credit decisions.
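Soft voting, as named in the abstract above, averages the class probabilities produced by the individual models and picks the class with the highest mean. The sketch below shows only that fusion step; the function name, the optional weights, and the probability values are hypothetical, and the paper's actual base models and weighting are not reproduced.

```python
def soft_vote(prob_lists, weights=None):
    """Fuse per-model class probabilities by (weighted) averaging.

    prob_lists[m][k] is the probability of class k from model m.
    Returns (averaged probabilities, index of the winning class).
    """
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    if weights is None:
        weights = [1.0] * n_models
    total = sum(weights)
    avg = [
        sum(w * probs[k] for w, probs in zip(weights, prob_lists)) / total
        for k in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda k: avg[k])
    return avg, winner


# Hypothetical outputs of three models for one borrower:
# class 0 = no default, class 1 = default.
model_probs = [
    [0.55, 0.45],
    [0.55, 0.45],
    [0.10, 0.90],
]

avg, label = soft_vote(model_probs)
print([round(p, 2) for p in avg], label)  # prints: [0.4, 0.6] 1
```

In this example a majority (hard) vote would pick class 0, because two of three models lean that way, while soft voting picks class 1 because the third model is far more confident.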


Classification of Hot Spots using XGBoost and LightGBM Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.e9459.069520 ◽ 2020 ◽ Vol 9 (5) ◽ pp. 722-724

Keyword(s): Computational Methods ◽ Protein Interactions ◽ Hot Spots ◽ Cell Metabolism ◽ Pearson Correlation ◽ Classification Performance ◽ Gradient Boosting ◽ Support Vector ◽ Extreme Gradient Boosting ◽ Hub Proteins

Protein-protein interactions (PPIs) play a significant role in biological functions such as cell metabolism, immune response, and signal transduction. Hot spots are small fractions of interface residues that contribute substantial binding energy in PPIs. Identifying hot spots is therefore important for discovering and analyzing molecular medicines and diseases. The current experimental strategy, alanine scanning, is not suited to large-scale applications because it is costly and time-consuming. Existing computational methods perform poorly in both classification and prediction accuracy, and they are concerned with the topological structure and gene expression of hub proteins. The proposed system focuses on hot spots of hub proteins and eliminates redundant and highly correlated features using the Pearson correlation coefficient and support vector machine-based feature elimination. Extreme Gradient Boosting and LightGBM algorithms are used to combine a set of weak classifiers into a strong classifier. The proposed system shows better accuracy than the existing computational methods. The model can also be used to predict accurate molecular inhibitors for specific PPIs.
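Removing highly correlated features with the Pearson correlation coefficient can be sketched as a greedy filter: a feature is kept only if its absolute correlation with every feature already kept stays below a threshold. The threshold of 0.9, the feature names, and the values are hypothetical, and the support vector machine-based elimination step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical residue-level features.
features = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10.5],  # almost a multiple of f1, so redundant
    "f3": [5, 1, 4, 2, 3],     # weakly correlated with f1 (r = -0.3)
}

print(drop_correlated(features))  # prints: ['f1', 'f3']
```

The result depends on the order in which features are visited; here the first feature of a correlated pair is the one retained.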


HyP-ABC: A Novel Automated Hyper-Parameter Tuning Algorithm Using Evolutionary Optimization

10.36227/techrxiv.14714508.v2 ◽ 2021

Author(s): Leila Zahedi ◽ Farid Ghareh Mohammadi ◽ M. Hadi Amini

Keyword(s): Parameter Optimization ◽ Real World ◽ Optimization Problems ◽ State Of The Art ◽ Parameter Tuning ◽ Gradient Boosting ◽ Support Vector ◽ Wide Range ◽ Extreme Gradient Boosting ◽ Art Techniques

Machine learning techniques lend themselves as promising decision-making and analytic tools in a wide range of applications. Different ML algorithms have various hyper-parameters. In order to tailor an ML model towards a specific application, a large number of hyper-parameters should be tuned. Tuning the hyper-parameters directly affects the performance (accuracy and run-time). However, for large-scale search spaces, efficiently exploring the ample number of combinations of hyper-parameters is computationally challenging. Existing automated hyper-parameter tuning techniques suffer from high time complexity. In this paper, we propose HyP-ABC, an automatic innovative hybrid hyper-parameter optimization algorithm using the modified artificial bee colony approach, to measure the classification accuracy of three ML algorithms, namely random forest, extreme gradient boosting, and support vector machine. Compared to the state-of-the-art techniques, HyP-ABC is more efficient and has a limited number of parameters to be tuned, making it worthwhile for real-world hyper-parameter optimization problems. We further compare our proposed HyP-ABC algorithm with state-of-the-art techniques. In order to ensure the robustness of the proposed method, the algorithm takes a wide range of feasible hyper-parameter values, and is tested using a real-world educational dataset.


Exploring the Mechanism of Crashes with Autonomous Vehicles Using Machine Learning

Mathematical Problems in Engineering ◽ 10.1155/2021/5524356 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-10

Author(s): Hengrui Chen ◽ Hong Chen ◽ Ruiyu Zhou ◽ Zhizhen Liu ◽ Xiaoke Sun

Keyword(s): Machine Learning ◽ Autonomous Vehicles ◽ Classification And Regression Tree ◽ Gradient Boosting ◽ Support Vector ◽ Crash Severity ◽ Apriori Algorithm ◽ Driving Mode ◽ Extreme Gradient Boosting ◽ The Impact

The safety issue has become a critical obstacle that cannot be ignored in the marketization of autonomous vehicles (AVs). The objective of this study is to explore the mechanism of AV-involved crashes and analyze the impact of each feature on crash severity. We use the Apriori algorithm to explore the causal relationship between multiple factors to explore the mechanism of crashes. We use various machine learning models, including support vector machine (SVM), classification and regression tree (CART), and eXtreme Gradient Boosting (XGBoost), to analyze the crash severity. Besides, we apply the Shapley Additive Explanations (SHAP) to interpret the importance of each factor. The results indicate that XGBoost obtains the best result (recall = 75%; G-mean = 67.82%). Both XGBoost and Apriori algorithm effectively provided meaningful insights about AV-involved crash characteristics and their relationship. Among all these features, vehicle damage, weather conditions, accident location, and driving mode are the most critical features. We found that most rear-end crashes are conventional vehicles bumping into the rear of AVs. Drivers should be extremely cautious when driving in fog, snow, and insufficient light. Besides, drivers should be careful when driving near intersections, especially in the autonomous driving mode.
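The abstract reports recall and G-mean without defining them. A common binary-classification definition, assumed here, takes G-mean as the geometric mean of recall (sensitivity) and specificity. The confusion-matrix counts below are hypothetical and are not taken from the paper.

```python
import math


def recall_and_gmean(tp, fn, tn, fp):
    """Return (recall, G-mean) from binary confusion-matrix counts.

    recall      = TP / (TP + FN)
    specificity = TN / (TN + FP)
    G-mean      = sqrt(recall * specificity)
    """
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return recall, math.sqrt(recall * specificity)


# Hypothetical counts for a severe / non-severe crash classifier.
recall, gmean = recall_and_gmean(tp=30, fn=10, tn=32, fp=18)
print(f"recall = {recall:.2%}, G-mean = {gmean:.2%}")
```

Because G-mean multiplies the two rates, a model that ignores the minority class scores near zero even when its overall accuracy is high, which is why the metric is often reported for imbalanced data.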


Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation ◽ 10.32800/abc.2021.44.0289 ◽ 2021 ◽ pp. 289-301

Author(s): B. Martín ◽ J. González–Arias ◽ J. A. Vicente–Vírseda

Keyword(s): Machine Learning ◽ Random Forest ◽ Animal Species ◽ Temporal Patterns ◽ Additive Models ◽ Gradient Boosting ◽ Support Vector ◽ Stochastic Gradient Boosting ◽ Extreme Gradient Boosting ◽ Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural networks, and random forest, an enhanced form of bootstrap). We also performed extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in abundance of migrating Balearic shearwaters based on data gathered within eBird. Derived from open–source datasets, proxies of frontal systems and ocean productivity domains that have been previously used to characterize the oceanographic habitats of seabirds were quantified and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE and R²). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.
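The two evaluation measures named above can be computed as follows. This sketch assumes the coefficient-of-determination form of R² (one minus the ratio of residual to total sum of squares); the abstract does not say whether the authors used this form or the squared correlation, and the abundance values are made up.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted abundances.
observed = [2, 4, 6, 8]
predicted = [3, 4, 5, 8]

print(round(rmse(observed, predicted), 4), round(r_squared(observed, predicted), 2))
# prints: 0.7071 0.9
```

Lower RMSE and higher R² both indicate a closer match between predicted and observed abundance.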
