A Skyline-Based Decision Boundary Estimation Method for Binominal Classification in Big Data

Computation ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 80
Author(s):  
Christos Kalyvas ◽  
Manolis Maragoudakis

One of the most common tasks in big data environments today is classifying large amounts of data. There are numerous classification models designed to perform best in different environments and datasets, each with its advantages and disadvantages. When dealing with big data, however, their performance degrades significantly because they are not designed for, or even capable of, handling very large datasets. The approach presented here is based on a novel proposal: exploiting the dynamics of skyline queries to efficiently identify the decision boundary and classify big data. A comparison against the popular k-nearest neighbor (k-NN), support vector machine (SVM), and naïve Bayes classification algorithms shows that the proposed method is faster than k-NN and SVM. The novelty of this method lies in the fact that only a small number of computations are needed to make a prediction, while its full potential is revealed on very large datasets.
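The paper's own boundary-estimation algorithm is not reproduced in the abstract; as a minimal illustration of the skyline primitive it builds on, the sketch below computes the skyline (Pareto front) of a 2-D point set, i.e., the points not dominated by any other point, taking smaller coordinates as better. The point values are made up for illustration.

```python
def dominates(p, q):
    """True if p dominates q: p is <= q in every dimension
    and strictly < in at least one (smaller is better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Return the skyline: points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

points = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
print(skyline(points))  # [(1, 4), (2, 2), (4, 1)]; (3, 3) and (5, 5) are dominated
```

The naive check here is quadratic in the number of points; the appeal of skyline queries in a big data setting is that the skyline itself is typically a small subset of the dataset, so a decision rule built on it needs few computations per prediction.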

Author(s):  
Hoda Ahmed Abdelhafez

Mining big data is currently getting a lot of attention because businesses need more complex information in order to increase their revenue and gain competitive advantage. Therefore, mining huge amounts of data, as well as mining real-time data, needs to be done with new data mining techniques and approaches. This chapter will discuss big data volume, variety, and velocity; data mining techniques; and open source tools for handling very large datasets. Moreover, the chapter will focus on two industrial areas, telecommunications and healthcare, and the lessons learned from them.



2020 ◽  
Vol 29 (03n04) ◽  
pp. 2060011
Author(s):  
Emna Hachicha Belghith ◽  
François Rioult ◽  
Medjber Bouzidi

In recent years, big data has become the new emerging trend that is increasingly attracting the attention of the R&D community in several fields (e.g., image processing, database engineering, data mining, artificial intelligence). Marine data is one of the fields sharing this growth, hence the emergence of the marine big data paradigm, whose monitoring supports the assessment of human impact on the marine environment. Nonetheless, such environments lack support for classifying acoustic sounds while taking their diversity into account (i.e., sounds of living undersea species, sounds of human activities, and sounds of environmental effects). To overcome this issue, we propose in this paper an approach that efficiently classifies this acoustic diversity using machine learning techniques. The aim is to reach automated support for marine big data analysis. We conducted a set of experiments on a real marine dataset in order to validate our approach and show its effectiveness and efficiency. To do so, three machine learning techniques are employed: (i) classic machine learning models (i.e., k-nearest neighbor and support vector machine), (ii) deep learning based on convolutional neural networks, and (iii) transfer learning based on the reuse of pretrained models.
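The paper's pipelines (CNNs, transfer learning) are not detailed in the abstract; as a minimal sketch of the classic-model baseline it mentions, the following implements k-nearest-neighbor classification over feature vectors. The two-dimensional features and the class labels here are made-up stand-ins for real extracted audio descriptors (e.g., spectral features), which the paper does not specify.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training examples under Euclidean distance.
    `train` is a list of (feature_vector, label) pairs."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors for two of the three sound categories:
train = [((0.1, 0.2), "species"), ((0.2, 0.1), "species"),
         ((0.9, 0.8), "human"),   ((0.8, 0.9), "human")]
print(knn_predict(train, (0.15, 0.15)))  # species
```

In practice the same interface generalizes to the three-way classification the paper targets by adding a third labeled cluster of feature vectors.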


Author(s):  
Khyati Ahlawat ◽  
Anuradha Chug ◽  
Amit Prakash Singh

The uneven distribution of classes in a dataset biases any standard classifier toward the majority class. The instances of the significant class, being deficient in number, are generally ignored, and their correct classification, which is of paramount interest, is often overlooked when calculating overall accuracy. Conventional machine learning approaches are therefore rigorously refined to address this class imbalance problem. The challenge of imbalanced classes is even more prevalent in big data scenarios due to their high volume. This study presents a sampling solution based on cluster computing for handling class imbalance in big data. The newly proposed hybrid sampling algorithm (HSA) is assessed using three popular classification algorithms, namely support vector machine, decision tree, and k-nearest neighbor, based on balanced accuracy and elapsed time. The results obtained from the experiment are promising, with an efficiency gain of 42% in comparison to the traditional sampling solution, the synthetic minority oversampling technique (SMOTE). This work proves the effectiveness of the distribution and clustering principle in imbalanced big data scenarios.
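The abstract does not specify HSA's internals, so the sketch below is not the authors' algorithm; it only illustrates the generic hybrid-sampling idea that such methods combine: undersample the majority class while oversampling the minority class with SMOTE-style interpolation between minority examples, meeting at a balanced size. All sizes and data are made up.

```python
import random

def smote_like(minority, n_new, rng):
    """SMOTE-style oversampling: each synthetic point is interpolated
    between two randomly chosen minority examples."""
    new = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        new.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return new

def hybrid_sample(majority, minority, rng):
    """Undersample the majority and oversample the minority so both
    classes meet at the midpoint of their original sizes."""
    target = (len(majority) + len(minority)) // 2
    maj = rng.sample(majority, target)
    mino = minority + smote_like(minority, target - len(minority), rng)
    return maj, mino

rng = random.Random(0)
majority = [(rng.random(), rng.random()) for _ in range(100)]
minority = [(rng.random() + 2, rng.random()) for _ in range(10)]
maj, mino = hybrid_sample(majority, minority, rng)
print(len(maj), len(mino))  # 55 55
```

A balanced training set like this is what allows balanced accuracy, the metric the study reports, to reflect minority-class performance instead of being dominated by the majority class.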


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Ali Labriji ◽  
Abdelkrim Bennar ◽  
Mostafa Rachik

The use of conditional probabilities has gained popularity in various fields such as medicine, finance, and image processing, especially with the availability of large datasets that allow us to extract the full potential of the available estimation algorithms. Nevertheless, such a large volume of data is often accompanied by a significant need for computational capacity and a correspondingly long computation time. In this article, we propose a low-cost estimation method: we first demonstrate analytically the convergence of our method to the desired probability, and then we perform a simulation to support our point.
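The article's own estimator is not given in the abstract; as a baseline illustration of what "convergence to the desired probability" means, the sketch below estimates a conditional probability empirically as P(A | B) ≈ #(A and B) / #B, which converges by the law of large numbers as the sample grows. The dice example is made up for illustration.

```python
import random

def estimate_conditional(samples, event_a, event_b):
    """Empirical estimate of P(A | B) = #(A and B) / #B."""
    in_b = [s for s in samples if event_b(s)]
    if not in_b:
        return None  # B never occurred; the estimate is undefined
    return sum(1 for s in in_b if event_a(s)) / len(in_b)

# Example: two dice; P(sum == 7 | first die == 3) has true value 1/6.
rng = random.Random(42)
samples = [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(100_000)]
p = estimate_conditional(samples,
                         event_a=lambda s: s[0] + s[1] == 7,
                         event_b=lambda s: s[0] == 3)
print(p)  # close to the true value 1/6 ≈ 0.1667
```

The cost concern the abstract raises is visible even here: only the samples where B occurs contribute to the estimate, so rare conditioning events require very large datasets, which is what motivates cheaper estimation schemes.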


2021 ◽  
Author(s):  
Li Guochao ◽  
Zhigang Liu ◽  
Jie Lu ◽  
Honggen Zhou ◽  
Li Sun

Abstract The groove is a key structure of high-performance integral cutting tools. It has to be manufactured on a 5-axis grinding machine due to its complex spatial geometry and hard materials. The crucial manufacturing parameters (CMP) are the grinding wheel positions and geometries. However, solving the CMP for a designed groove is a challenging problem. Traditional trial-and-error or analytical methods suffer from defects such as being time-consuming, having limited applicability, and low accuracy. In this study, the problem is translated into a multiple-output regression model of groove manufacture (MORGM) based on big data technology and AI algorithms. The inputs are 34 groove geometry features and the outputs are the 5 CMP. First, two groove machining big data sets with different ranges are established, each of which includes 46,656 records; they serve as the data resource for the MORGM. Second, 7 AI algorithms, including linear regression, k-nearest-neighbor regression, decision trees, random forest regression, support vector regression, and ANN algorithms, are discussed to build the model. Then, 28 experiments are carried out to test the big data sets and algorithms. Finally, the best MORGM is built using an ANN algorithm and the big data set with the larger range. The results show that the CMP can be calculated accurately and conveniently by the built MORGM.
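The study's best model is an ANN, which is not sketched here; to illustrate only the multiple-output regression framing (one feature vector in, several parameters out), the following uses the simplest of the candidate algorithms the abstract lists, k-nearest-neighbor regression, averaging the output vectors of the nearest training records. The tiny 2-feature, 2-output records below are hypothetical stand-ins for the real 34-feature, 5-CMP records.

```python
import math

def knn_multioutput(train, query, k=3):
    """Predict a vector of outputs as the per-component mean of the
    k nearest training examples' output vectors.
    `train` is a list of (features, outputs) pairs."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    dim = len(nearest[0][1])
    return tuple(sum(out[i] for _, out in nearest) / k for i in range(dim))

# Hypothetical records: (groove geometry features) -> (manufacturing params)
train = [((1.0, 2.0), (10.0, 1.0)),
         ((1.1, 2.1), (11.0, 2.0)),
         ((5.0, 5.0), (50.0, 9.0)),
         ((1.2, 1.9), (12.0, 3.0))]
print(knn_multioutput(train, (1.05, 2.0), k=3))  # (11.0, 2.0)
```

Any of the listed algorithms can fill the same role as long as it maps the 34 geometry features to all 5 CMP jointly; the study's experiments compare them on exactly that interface.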

