Comparison of machine learning techniques to handle imbalanced COVID-19 CBC datasets

PeerJ Computer Science ◽

10.7717/peerj-cs.670 ◽

2021 ◽

Vol 7 ◽

pp. e670

Author(s):

Marcio Dorn ◽

Bruno Iochins Grisci ◽

Pedro Henrique Narloch ◽

Bruno César Feltes ◽

Eduardo Avila ◽

...

Keyword(s):

Machine Learning ◽

Sampling Methods ◽

Complete Blood Count ◽

Low Cost ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Learning Techniques ◽

Comparison Of The Results ◽

Aid Decision

The coronavirus pandemic caused by the novel SARS-CoV-2 has significantly impacted human health and the economy, especially in countries with limited financial resources for medical testing and treatment, such as Brazil, the third most affected country by the pandemic. In this scenario, machine learning techniques have been heavily employed to analyze different types of medical data and to aid decision making, offering a low-cost alternative. Given the urgency of fighting the pandemic, a large number of studies apply machine learning approaches to clinical data, including complete blood count (CBC) tests, which are among the most widely available medical tests. In this work, we review the most commonly employed machine learning classifiers for CBC data, together with popular sampling methods for dealing with class imbalance. Additionally, we describe and critically analyze three publicly available Brazilian COVID-19 CBC datasets and evaluate the performance of eight classifiers and five sampling techniques on the selected datasets. Our work provides a panorama of which classifiers and sampling methods give the best results for different relevant metrics and discusses their impact on future analyses. The metrics and algorithms are introduced in a way that aids newcomers to the field. Finally, the panorama discussed here can significantly benefit the comparison of the results of new ML algorithms.
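To make the idea of a sampling method for class imbalance concrete, here is a minimal sketch of random oversampling, one common baseline. The abstract does not say which five sampling techniques were evaluated, so this is an illustration only: the function name `random_oversample` and the toy data are hypothetical, and nothing here is taken from the paper.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Draw (with replacement) the extra copies this class needs.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Made-up toy data: 8 negative and 2 positive records, one feature each.
X = [[4.1], [4.3], [4.0], [4.4], [4.2], [4.5], [3.9], [4.6], [2.1], [2.3]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

After resampling, both classes contain 8 records. In practice, resampling of this kind is usually applied to the training split only, so that duplicated records do not leak into the evaluation data.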

Prediction of Clinical Risk Factors of Diabetes Using Multiple Machine Learning Techniques Resolving Class Imbalance

2020 23rd International Conference on Computer and Information Technology (ICCIT) ◽

10.1109/iccit51783.2020.9392694 ◽

2020 ◽

Author(s):

Kazi Amit Hasan ◽

Md. Al Mehedi Hasan

Keyword(s):

Machine Learning ◽

Risk Factors ◽

Class Imbalance ◽

Clinical Risk Factors ◽

Machine Learning Techniques ◽

Clinical Risk ◽

Learning Techniques

Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

10.1109/icscc51209.2021.9528170 ◽

2021 ◽

Author(s):

Sushant Kumar Pandey ◽

Anil Kumar Tripathi

Keyword(s):

Machine Learning ◽

Empirical Study ◽

Prediction Models ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Learning Techniques ◽

Defect Prediction Models

Implementation of machine learning techniques to control the speed of the vehicles

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.5.21179 ◽

2018 ◽

Vol 7 (4.5) ◽

pp. 654

Author(s):

M. S. Satyanarayana ◽

Aruna T.M ◽

Divyaraj G.N

Keyword(s):

Machine Learning ◽

Low Cost ◽

Machine Learning Techniques ◽

New Technique ◽

Vehicle Speed ◽

Learning Techniques ◽

Traffic Awareness ◽

Alarm Sound ◽

The Government ◽

A New Technique

Accidents have become a major issue in developing countries like India. According to surveys, 60% of accidents are caused by overspeeding. Although the government has taken many initiatives, such as Traffic Awareness and Driving Awareness Week, the percentage of accidents has not decreased. This paper introduces a new technique to reduce the percentage of accidents. The new technique is implemented using the concept of machine learning [1]. Machine learning based systems can be installed in all vehicles to avoid accidents at low cost [1]. The main objective of this system is to calculate the speed of the vehicle at three different locations, chosen according to where the vehicle's speed must be controlled. If the speed is greater than the designated speed for that road, the vehicle automatically detects the problem and notifies the driver to control the speed. If the speed is less than or equal to the designated speed for that road, the vehicle passes without any disturbance. The system gives a beep sound along with a color indication to the driver in every scenario. The system also covers night driving: if the driver feels drowsy, the system detects it immediately and sounds an alarm to wake the driver. Although this system will not prevent 100% of accidents, it will at least reduce their percentage. Besides avoiding accidents, the system also intelligently controls the speed of vehicles and creates awareness among drivers.

Machine Learning Frameworks in Cancer Detection

E3S Web of Conferences ◽

10.1051/e3sconf/202129701073 ◽

2021 ◽

Vol 297 ◽

pp. 01073

Author(s):

Sabyasachi Pramanik ◽

K. Martin Sagayam ◽

Om Prakash Jena

Keyword(s):

Machine Learning ◽

Prediction Models ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Cancer Development ◽

Support Vector ◽

Learning Approaches ◽

Learning Techniques ◽

Fact Finding ◽

Risk Of Cancer

Cancer has been described as a diverse illness with several distinct subtypes that may occur simultaneously. As a result, early detection and forecast of cancer types have become essential in cancer fact-finding methods, since they may help to improve the clinical treatment of cancer survivors. The significance of categorizing cancer sufferers into higher- or lower-threat categories has prompted numerous fact-finding teams from the bioscience and genomics fields to investigate the use of machine learning (ML) algorithms in cancer diagnosis and treatment. Because of this, these methods have been used with the goal of simulating the development and treatment of malignant diseases in humans. Furthermore, the capacity of machine learning techniques to identify important characteristics from complicated datasets demonstrates the significance of these technologies. These technologies include Bayesian networks and artificial neural networks, along with a number of other approaches. Decision trees and support vector machines, which have already been extensively used in cancer research for the creation of predictive models, also lead to accurate decision making. The application of machine learning techniques may undoubtedly enhance our knowledge of cancer development; nevertheless, a sufficient degree of validation is required before these approaches can be considered for use in daily clinical practice. An overview of current machine learning approaches used in the simulation of cancer development is presented in this paper. All of the supervised machine learning approaches described here, along with a variety of input characteristics and data samples, are used to build the prediction models. In light of the increasing trend towards the use of machine learning methods in biomedical research, we present the most recent papers that have used these approaches to predict risk of cancer or patient outcomes in order to better understand cancer.

Machine learning for metagenomics: methods and tools

Metagenomics ◽

10.1515/metgen-2016-0001 ◽

2017 ◽

Vol 1 (1) ◽

Cited By ~ 15

Author(s):

Hayssam Soueidan ◽

Macha Nikolski

Keyword(s):

Machine Learning ◽

Gene Prediction ◽

Full Range ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Unified Framework ◽

Comparative Metagenomics ◽

Learning Techniques ◽

Ngs Data ◽

Modern Machine

Owing to the complexity and variability of metagenomic studies, modern machine learning approaches have seen increased usage to answer a variety of questions encompassing the full range of metagenomic NGS data analysis. We review here the contribution of machine learning techniques to the field of metagenomics, by presenting known successful approaches in a unified framework. This review focuses on five important metagenomic problems: OTU-clustering, binning, taxonomic profiling and assignment, comparative metagenomics and gene prediction. For each of these problems, we identify the most prominent methods, summarize the machine learning approaches used and put them into perspective of similar methods. We conclude our review looking further ahead at the challenge posed by the analysis of interactions within microbial communities and different environments, in a field one could call “integrative metagenomics”.

Explore the Use of Handwriting Information and Machine Learning Techniques in Evaluating Mental Workload

Cognitive Analytics ◽

10.4018/978-1-7998-2460-2.ch072 ◽

2020 ◽

pp. 1423-1439

Author(s):

Zhiming Wu ◽

Tao Lin ◽

Ningjiu Tang

Keyword(s):

Machine Learning ◽

Interaction Design ◽

Mental Stress ◽

Mental Workload ◽

Low Cost ◽

Research Question ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Techniques ◽

Psychological Evidence

Mental workload is considered one of the most important factors in interaction design, and how to detect a user's mental workload during tasks is still an open research question. Psychological evidence has already attributed a certain amount of variability and “drift” in an individual's handwriting pattern to mental stress, but this phenomenon has not been explored adequately. The intention of this paper is to explore the possibility of evaluating mental workload with handwriting information by machine learning techniques. Machine learning techniques such as decision trees, support vector machines (SVM), and artificial neural networks were used to predict mental workload levels in the authors' research. Results showed that it was possible to predict mental workload levels automatically based on handwriting patterns with relatively high accuracy, especially for the handwriting patterns of children. In addition, the proposed approach is attractive because it requires no additional hardware, is unobtrusive, is adaptable to individual users, and is of very low cost.

Analysis of Kinase Inhibitors and Druggability of Kinase-Targets Using Machine Learning Techniques

Pattern Discovery Using Sequence Data Mining ◽

10.4018/978-1-61350-056-9.ch009 ◽

2012 ◽

pp. 155-165

Author(s):

S. Prasanthi ◽

S.Durga Bhavani ◽

T. Sobha Rani ◽

Raju S. Bapi

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Kinase Inhibitors ◽

Kinase Inhibitor ◽

Classification Problem ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Decision Tree Classifier ◽

Data Set ◽

Learning Techniques

The vast majority of successful drugs or inhibitors achieve their activity by binding to, and modifying the activity of, a protein, leading to the concept of druggability. A target protein is druggable if it has the potential to bind drug-like molecules. Hence kinase inhibitors need to be studied to understand the specificity of a kinase inhibitor in choosing a particular kinase target. In this paper we focus on human kinase drug target sequences, since kinases are known to be potential drug targets. We also do a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand space in the future. The identification of druggable kinases is treated as a classification problem in which druggable kinases are taken as the positive data set and non-druggable kinases are chosen as the negative data set. The classification problem is addressed using machine learning techniques such as support vector machine (SVM) and decision tree (DT), together with sequence-specific features. One of the challenges of this classification problem is the unbalanced data, with only 48 druggable kinases available against 509 non-druggable kinases present at UniProt. The accuracy obtained by the decision tree classifier is 57.65, which is not satisfactory. A two-tier architecture of decision trees is carefully designed such that recognition on the non-druggable dataset also improves. The overall model is thus shown to achieve a final performance accuracy of 88.37. To the best of our knowledge, kinase druggability prediction using machine learning approaches has not been reported in the literature.
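The class counts quoted above (48 druggable against 509 non-druggable kinases) give a feel for why imbalance matters when reading an accuracy figure. The sketch below is a worked arithmetic example, not a reproduction of the paper's evaluation: the abstract does not state how its accuracy values were computed, and the "always predict non-druggable" baseline is a hypothetical reference point introduced here for illustration.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """Mean of the per-class recalls (sensitivity and specificity)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Class sizes quoted in the abstract.
N_POS, N_NEG = 48, 509

# Hypothetical baseline: label every kinase as non-druggable.
tp, fn = 0, N_POS   # every druggable kinase is missed
tn, fp = N_NEG, 0   # every non-druggable kinase is "correct"

imbalance_ratio = N_NEG / N_POS
baseline_acc = accuracy(tp, tn, fp, fn)
baseline_bal = balanced_accuracy(tp, tn, fp, fn)

print(f"imbalance ratio:   {imbalance_ratio:.1f} to 1")
print(f"plain accuracy:    {baseline_acc:.3f}")
print(f"balanced accuracy: {baseline_bal:.3f}")
```

On these counts, a model that never predicts the positive class still gets most samples right under plain accuracy, while its balanced accuracy is exactly one half. This is one reason studies of imbalanced data often report per-class or balanced metrics alongside overall accuracy.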

Overview of Machine Learning Approaches for Wireless Communication

Research Anthology on Artificial Intelligence Applications in Security ◽

10.4018/978-1-7998-7705-9.ch069 ◽

2021 ◽

pp. 1579-1597

Author(s):

Tolga Ensari ◽

Melike Günay ◽

Yağız Nalçakan ◽

Eyyüp Yildiz

Keyword(s):

Machine Learning ◽

Wireless Communications ◽

Smart Phones ◽

Machine Learning Techniques ◽

Learning Approaches ◽

Research Areas ◽

Learning Techniques ◽

Recent Developments ◽

Security And Reliability ◽

Day By Day

Machine learning is one of the most popular research areas, and it is commonly used in wireless communications and networks. Security and fast communication are among the key requirements for next-generation wireless networks. Machine learning techniques are becoming more important day by day, since the types, amount, and structure of data are continuously changing. Recent developments in smart phones and other devices, such as drones, wearable devices, and machines with sensors, require reliable communication within internet of things (IoT) systems. For this purpose, artificial intelligence can increase security and reliability and manage the data generated by wireless systems. In this chapter, the authors investigate several machine learning techniques for wireless communications, including deep learning, which represents a branch of artificial neural networks.

A Machine Learning View on Momentum and Reversal Trading

Algorithms ◽

10.3390/a11110170 ◽

2018 ◽

Vol 11 (11) ◽

pp. 170 ◽

Cited By ~ 2

Author(s):

Zhixi Li ◽

Vincent Tam

Keyword(s):

Neural Network ◽

Machine Learning ◽

Stock Market ◽

Short Term Memory ◽

Predictive Ability ◽

Trading Strategies ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Approaches ◽

Learning Techniques

Momentum and reversal effects are important phenomena in stock markets. In academia, relevant studies have been conducted for years. Researchers have attempted to analyze these phenomena using statistical methods and to give some plausible explanations. However, those explanations are sometimes unconvincing. Furthermore, it is very difficult to transfer the findings of these studies to real-world investment trading strategies due to the lack of predictive ability. This paper represents the first attempt to adopt machine learning techniques for investigating the momentum and reversal effects occurring in any stock market. In the study, various machine learning techniques, including the Decision Tree (DT), Support Vector Machine (SVM), Multilayer Perceptron Neural Network (MLP), and Long Short-Term Memory Neural Network (LSTM) were explored and compared carefully. Several models built on these machine learning approaches were used to predict the momentum or reversal effect on the stock market of mainland China, thus allowing investors to build corresponding trading strategies. The experimental results demonstrated that these machine learning approaches, especially the SVM, are beneficial for capturing the relevant momentum and reversal effects, and possibly building profitable trading strategies. Moreover, we propose the corresponding trading strategies in terms of market states to acquire the best investment returns.

Malicious web domain identification using online credibility and performance data by considering the class imbalance issue

Industrial Management & Data Systems ◽

10.1108/imds-02-2018-0072 ◽

2019 ◽

Vol 119 (3) ◽

pp. 676-696 ◽

Cited By ~ 5

Author(s):

Zhongyi Hu ◽

Raymond Chiong ◽

Ilung Pranata ◽

Yukun Bao ◽

Yuqing Lin

Keyword(s):

Machine Learning ◽

Class Imbalance ◽

Performance Data ◽

Machine Learning Techniques ◽

Data Sets ◽

Real World Data ◽

Content Type ◽

Domain Identification ◽

Learning Techniques ◽

And Performance

Purpose: Malicious web domain identification is of significant importance to the security protection of internet users. With online credibility and performance data, the purpose of this paper is to investigate the use of machine learning techniques for malicious web domain identification by considering the class imbalance issue (i.e. there are more benign web domains than malicious ones). Design/methodology/approach: The authors propose an integrated resampling approach to handle class imbalance by combining the synthetic minority oversampling technique (SMOTE) and particle swarm optimisation (PSO), a population-based meta-heuristic algorithm. The authors use SMOTE for oversampling and PSO for undersampling. Findings: By applying eight well-known machine learning classifiers, the proposed integrated resampling approach is comprehensively examined using several imbalanced web domain data sets with different imbalance ratios. Compared to five other well-known resampling approaches, experimental results confirm that the proposed approach is highly effective. Practical implications: This study not only inspires the practical use of online credibility and performance data for identifying malicious web domains but also provides an effective resampling approach for handling the class imbalance issue in the area of malicious web domain identification. Originality/value: Online credibility and performance data are applied to build malicious web domain identification models using machine learning techniques. An integrated resampling approach is proposed to address the class imbalance issue. The performance of the proposed approach is confirmed based on real-world data sets with different imbalance ratios.
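The oversampling half of the approach relies on SMOTE, whose core step is to create a synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step in simplified form. It is not the authors' implementation, it leaves out the PSO-based undersampling entirely, and the function name `smote_like`, its parameters, and the toy points are assumptions made for this example.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    randomly chosen minority point and one of its k nearest minority
    neighbours. Assumes the minority set has at least two points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up minority-class points with two features each.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_like(minority, n_new=6)
for point in new_points:
    print([round(v, 3) for v in point])
```

Because every synthetic point is a convex combination of two existing minority points, it stays inside the region already occupied by the minority class, which is what distinguishes this step from simply duplicating records.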
