Effect of the Sampling of a Dataset in the Hyperparameter Optimization Phase over the Efficiency of a Machine Learning Algorithm

Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-16 ◽  
Author(s):  
Noemí DeCastro-García ◽  
Ángel Luis Muñoz Castañeda ◽  
David Escudero García ◽  
Miguel V. Carriegos

Selecting the best configuration of hyperparameter values for a Machine Learning model directly affects the performance of the model on the dataset. It is a laborious task that usually requires deep knowledge of hyperparameter optimization methods and Machine Learning algorithms. Although several automatic optimization techniques exist, they usually consume significant resources, increasing the dynamic complexity in order to obtain high accuracy. Since the available dataset is one of the most critical factors in this computational cost, in this paper we study the effect of using different partitions of a dataset in the hyperparameter optimization phase on the efficiency of a Machine Learning algorithm. Nonparametric inference has been used to measure how often the accuracy, time, and spatial complexity obtained with the partitions differ from those obtained with the whole dataset. In addition, a level of gain is assigned to each partition, allowing us to study patterns and identify which samples are more profitable. Since Cybersecurity is a discipline in which the efficiency of Artificial Intelligence techniques is key to extracting actionable knowledge, the statistical analyses have been carried out over five Cybersecurity datasets.
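
The idea of tuning on a partition instead of the whole dataset can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the synthetic one-dimensional data, the k-nearest-neighbour classifier, the candidate values of k, and the 25% partition size. It only compares the accuracy reached after tuning on the full data and on a sample, and does not reproduce the paper's nonparametric analysis of time or memory.

```python
import random


def make_data(n, seed=0):
    """Synthetic 1-D binary data: label is 1 when x > 0.5, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y  # flip the label to simulate noise
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (k is odd, so no ties)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return int(2 * votes > k)


def accuracy(train, valid, k):
    """Fraction of validation points classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def tune_k(data, candidates, fraction=1.0, seed=0):
    """Choose k by hold-out validation using only a random fraction of the data."""
    rng = random.Random(seed)
    sample = rng.sample(data, max(4, int(len(data) * fraction)))
    split = len(sample) // 2
    train, valid = sample[:split], sample[split:]
    return max(candidates, key=lambda k: accuracy(train, valid, k))


data = make_data(400)
test = make_data(200, seed=1)
candidates = [1, 3, 5, 9, 15]

k_full = tune_k(data, candidates, fraction=1.0)   # tuned on the whole dataset
k_part = tune_k(data, candidates, fraction=0.25)  # tuned on a 25% partition

# Both configurations are then trained on all data and scored on the same test set.
acc_full = accuracy(data, test, k_full)
acc_part = accuracy(data, test, k_part)
print(k_full, acc_full, k_part, acc_part)
```

If the two accuracies are close, the partition delivered a comparable configuration at a fraction of the tuning cost, which is the kind of trade-off the abstract describes.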

2018 ◽  
Vol 7 (4.15) ◽  
pp. 400 ◽  
Author(s):  
Thuy Nguyen Thi Thu ◽  
Vuong Dang Xuan

The exchange rate of each currency pair can be predicted by using a machine learning algorithm in a classification process. With the help of a supervised machine learning model, the predicted uptrend or downtrend of the FoRex rate might help traders make the right decisions on FoRex transactions. Installing machine learning algorithms in the online FoRex trading market makes it possible to execute buy and sell transactions automatically. All the transactions in the experiment are performed by scripts added to the transaction application. The capital and profit results obtained with support vector machine (SVM) models are higher than those obtained without SVM.


Author(s):  
A. Khanwalkar ◽  
R. Soni

Purpose: Diabetes is a chronic disease that accounts for a large proportion of the nation's healthcare expenses, because people with diabetes need continuous medical care. Several complications occur if the polygenic disorder goes unrecognized and untreated. Diagnosis currently depends on a visit to a diagnostic center and on a doctor's judgment. One essential real-world problem is to detect the polygenic disorder at an early stage. This work is a survey that analyzes several parameters used in polygenic disorder diagnosis. It indicates that classification algorithms, as well as other machine learning algorithms, play an important role in automating polygenic disorder analysis. Design/methodology/approach: This paper provides an extensive survey of the different approaches that have been used to analyze medical data for the early detection of polygenic disorder. It considers methods such as J48, CART, SVMs and KNN square, conducts a formal survey of all the studies, and provides a conclusion at the end. Findings: The survey has been analyzed on several parameters within polygenic disorder diagnosis. It indicates that classification algorithms, as well as other machine learning algorithms, play an important role in automating polygenic disorder analysis. Practical implications: This paper will help future researchers in the field of healthcare, specifically in the domain of diabetes, to understand the differences between classification algorithms. Originality/value: This paper will help in comparing machine learning algorithms by going through their results and selecting the appropriate approach based on requirements.


2021 ◽  
Author(s):  
Aria Abubakar ◽  
Mandar Kulkarni ◽  
Anisha Kaul

Abstract: In the process of deriving the reservoir petrophysical properties of a basin, identifying the pay capability of wells by interpreting various geological formations is key. Currently, this process is facilitated and preceded by well log correlation, in which petrophysicists and geologists examine multiple raw log measurements for the well in question, mark the geological markers of formation changes, and correlate them with those of neighboring wells. This marker picking is performed manually, and the examination may be highly subjective and thus prone to inconsistencies. In our work, we propose to automate the well correlation workflow by using a Soft-Attention Convolutional Neural Network to predict well markers. The machine learning algorithm is supervised by examples of manual marker picks and their corresponding occurrence in logs such as gamma-ray, resistivity and density. Our experiments have shown that the attention mechanism in particular allows the Convolutional Neural Network to look at relevant features or patterns in the log measurements that suggest a change in formation, making the machine learning model highly precise.


The aim of this research is to perform risk modelling based on sentiment analysis of Twitter posts. We analyze the posts of several users, or of a particular user, to check whether they may be a cause of concern to society. Each sentiment, such as happiness, sadness, anger and other emotions, contributes a severity score to the final table on which the machine learning algorithm is applied. The data fed to the machine learning algorithms are monitored over a period of time and relate to a particular topic in an area.


2020 ◽  
Vol 7 (10) ◽  
pp. 380-389
Author(s):  
Asogwa D.C ◽  
Anigbogu S.O ◽  
Anigbogu G.N ◽  
Efozia F.N

Author's age prediction is the task of determining an author's age by studying the texts they have written. Predicting an author's age can be enlightening about the different trends, opinions, and social and political views of an age group. Marketers often use this to promote a product or a service to an age group according to its expressed interests and opinions. Methodologies in natural language processing have made it possible to predict an author's age from text by examining the variation of linguistic characteristics. Many machine learning algorithms have also been used in author's age prediction. However, in social networks, computational linguists are challenged with numerous issues, just as machine learning techniques are performance driven and face their own challenges in realistic scenarios. This work developed a model that can predict an author's age from text with a machine learning algorithm (Naïve Bayes) using three types of features, namely content based, style based and topic based. The trained model gave a prediction accuracy of 80%.


Author(s):  
Virendra Tiwari ◽  
Balendra Garg ◽  
Uday Prakash Sharma

Machine learning algorithms are capable of managing multi-dimensional data in a dynamic environment. Despite their many vital features, there are some challenges to overcome. Machine learning algorithms still require additional mechanisms or procedures for predicting a large number of new classes while managing privacy. These deficiencies show that the reliable use of a machine learning algorithm relies on human experts, because raw data may complicate the learning process and generate inaccurate results. The interpretation of outcomes with expertise in machine learning mechanisms is therefore a significant challenge. Machine learning techniques suffer from issues of high dimensionality, adaptability, distributed computing, scalability, streaming data, and duplicity. The main issue of the machine learning algorithm is its vulnerability in managing errors. Furthermore, machine learning techniques are also found to lack variability. This paper studies how the computational complexity of machine learning algorithms can be reduced by finding how to make predictions using an improved algorithm.


Author(s):  
Ladly Patel ◽  
Kumar Abhishek Gaurav

In today's world, a huge amount of data is available. The available data are analyzed to extract information, and this data is later used to train machine learning algorithms. Machine learning is a subfield of artificial intelligence in which machines are trained with data and then predict results. Machine learning is being used in healthcare, image processing, marketing, etc. The aim of machine learning is to reduce the programmer's work on complex coding and to decrease human interaction with systems. The machine learns from past data and then predicts the desired output. This chapter briefly describes machine learning, presents different machine learning algorithms with examples, and introduces machine learning frameworks such as TensorFlow and Keras. The limitations of machine learning and various applications of machine learning are discussed. This chapter also describes how to identify features in machine learning data.


2020 ◽  
Vol 48 (7) ◽  
pp. 030006052093688
Author(s):  
Daehyuk Yim ◽  
Tae Young Yeo ◽  
Moon Ho Park

Objective: To develop a machine learning algorithm to identify cognitive dysfunction based on neuropsychological screening test results. Methods: This retrospective study included 955 participants: 341 participants with dementia (dementia), 333 participants with mild cognitive impairment (MCI), and 341 participants who were cognitively healthy. All participants underwent evaluations including the Mini-Mental State Examination and the Montreal Cognitive Assessment. Each participant’s caregiver or informant was surveyed using the Korean Dementia Screening Questionnaire at the same visit. Different machine learning algorithms were applied, and their overall accuracies, Cohen’s kappa, receiver operating characteristic curves, and areas under the curve (AUCs) were calculated. Results: The overall screening accuracies for MCI, dementia, and cognitive dysfunction (MCI or dementia) using a machine learning algorithm were approximately 67.8% to 93.5%, 96.8% to 99.9%, and 75.8% to 99.9%, respectively. Their kappa statistics ranged from 0.351 to 1.000. The AUCs of the machine learning models were statistically superior to those of the competing screening model. Conclusion: This study suggests that a machine learning algorithm can be used as a supportive tool in the screening of MCI, dementia, and cognitive dysfunction.
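
For readers unfamiliar with the kappa statistic reported above, the following is a minimal sketch of Cohen's kappa, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The labels and the tiny example are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa between reference labels and predicted labels."""
    n = len(y_true)
    # Observed agreement: fraction of cases where both labelings coincide.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (p_o - p_e) / (1 - p_e)


# Hypothetical screening outcome for four participants.
reference = ["MCI", "MCI", "healthy", "healthy"]
predicted = ["MCI", "healthy", "healthy", "healthy"]
kappa = cohens_kappa(reference, predicted)
print(kappa)
```

In this example three of four predictions agree (p_o = 0.75) and the chance agreement is 0.5, so kappa is 0.5. A kappa of 1.000, the upper end of the range in the abstract, corresponds to perfect agreement.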


Author(s):  
Petr Berka ◽  
Ivan Bruha

Genuine symbolic machine learning (ML) algorithms are capable of processing symbolic, categorical data only. However, real-world problems, e.g. in medicine or finance, involve both symbolic and numerical attributes. Discretizing (categorizing) numerical attributes is therefore an important issue in ML, and quite a few discretization procedures exist in the field. This paper describes two newer algorithms for the categorization (discretization) of numerical attributes. The first one is implemented in KEX (Knowledge EXplorer) as its preprocessing procedure. Its idea is to discretize the numerical attributes in such a way that the resulting categorization corresponds to the KEX knowledge acquisition algorithm. Since the categorization for KEX is done "off-line" before using the KEX machine learning algorithm, it can be used as a preprocessing step for other machine learning algorithms, too. The other discretization procedure is implemented in CN4, a large extension of the well-known CN2 machine learning algorithm. The range of numerical attributes is divided into intervals that may form a complex generated by the algorithm as a part of the class description. Experimental results compare the performance of KEX and CN4 on some well-known ML databases. To make the comparison more informative, we also used the discretization procedure of the MLC++ library. Other ML algorithms such as ID3 and C4.5 were also run in our experiments. The results are then compared and discussed.
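
As a point of reference for what discretization means in practice, here is a minimal sketch of generic equal-width binning. It is explicitly not the KEX or CN4 procedure described in the abstract, whose details are not given here; the function name and the sample values are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant attribute falls into a single bin
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so it is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical numeric attribute split into 5 categories.
bins = equal_width_bins([0, 1, 2, 3, 4, 10], 5)
print(bins)  # [0, 0, 1, 1, 2, 4]
```

Equal-width binning ignores the class labels entirely. Procedures tied to a learning algorithm, as the abstract describes for KEX and CN4, choose the interval boundaries with the learner or the class distribution in mind.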


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Peter Appiahene ◽  
Yaw Marfo Missah ◽  
Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 has raised various issues with respect to the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers’ confidence, efficiency and performance analysis in the banking industry has become a pressing issue, because stakeholders have to detect the underlying causes of inefficiencies within the industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks’ efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggested that the decision tree (DT) with its C5.0 algorithm provided the best predictive model. It had 100% accuracy in predicting the holdout sample of 134 branches (30% of banks) and a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally by the neural network (86.6% accuracy), with a P value of 0.66. The study concluded that banks in Ghana can use the results of this study to predict their respective efficiencies. All experiments were performed within a simulation environment and conducted in RStudio using R code.

