Cost-Sensitive Distributed Machine Learning for NetFlow-Based Botnet Activity Detection

Security and Communication Networks ◽

10.1155/2018/8753870 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Rafał Kozik ◽

Marek Pawlicki ◽

Michał Choraś

Keyword(s):

Machine Learning ◽

Big Data ◽

Machine Learning Techniques ◽

Learning Machines ◽

Imbalance Problem ◽

Learning Techniques ◽

Machine Learning Approach ◽

Big Data Technologies ◽

The Cost ◽

Distributed Machine Learning

Recent advances in malicious techniques have rendered the traditional signature-based approach to cyberattack detection ineffective. New, improved solutions are needed that incorporate Big Data technologies, effective distributed machine learning, and algorithms that counter the data imbalance problem. The major contribution of this paper is therefore a cost-sensitive distributed machine learning approach for cybersecurity. In particular, we propose and implement cost-sensitive distributed machine learning by means of distributed Extreme Learning Machines (ELM), distributed Random Forest, and Distributed Random Boosted-Trees to detect botnets. The system's concept and architecture are based on a Big Data processing framework with data mining and machine learning techniques. As a practical use case, we consider the problem of botnet detection by analysing data in the form of NetFlows. The reported results are promising and show that the proposed system can be considered a useful tool for improving cybersecurity.
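The abstract does not say how misclassification costs are assigned, so the following is only a minimal sketch of one common cost-sensitive scheme: weighting each class inversely to its frequency and scoring predictions with those weights. The function names, the toy labels, and the weighting formula are assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Misclassification rate where each sample counts by its class weight."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical toy labels: 1 = botnet flow (rare), 0 = normal flow.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

# A classifier that always answers "normal" has only 5% plain error,
# but the weighted error exposes that it misses every botnet flow.
always_normal = [0] * len(labels)
plain_error = sum(t != p for t, p in zip(labels, always_normal)) / len(labels)
cost_error = weighted_error(labels, always_normal, weights)
```

Under this weighting the rare class carries as much total weight as the common one, so a model cannot score well by ignoring botnet traffic.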

A Brief Survey on Text Classification Using Various Machine Learning Techniques

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v8i1.521 ◽

2018 ◽

Vol 8 (1) ◽

pp. 14

Author(s):

Padmavathi S. ◽

M. Chidambaram

Keyword(s):

Machine Learning ◽

Text Classification ◽

Fixed Number ◽

Machine Learning Techniques ◽

Online Information ◽

Rule Based ◽

Learning Techniques ◽

Machine Learning Approach ◽

Rule Based Approach

Text classification has become more significant in managing and organizing text data because of the tremendous growth of online information. It assigns documents to a fixed number of predefined categories. There are two approaches to text classification: the rule-based approach and the machine learning approach. In the rule-based approach, documents are classified according to manually defined rules. In the machine learning approach, the classification rules, or the classifier, are derived automatically from example documents. This approach offers higher recall and faster processing. This paper surveys text classification using different machine learning techniques.
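The contrast between the two approaches can be sketched as follows. The keyword rules, the training examples, and the choice of a multinomial Naive Bayes classifier with add-one smoothing are all illustrative assumptions; the survey itself does not prescribe any of them.

```python
import math
from collections import Counter, defaultdict

# Rule-based approach: hand-written keyword rules (hypothetical).
RULES = {"sports": {"match", "goal"}, "finance": {"stock", "bank"}}


def classify_by_rules(text):
    """Return the first category whose keywords appear, else None."""
    words = set(text.lower().split())
    for label, keywords in RULES.items():
        if words & keywords:
            return label
    return None


def train_naive_bayes(examples):
    """Learn per-class word counts from (text, label) example documents."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify_naive_bayes(model, text):
    """Pick the class with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    n_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for w in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


examples = [
    ("the team scored a late goal", "sports"),
    ("striker wins the match", "sports"),
    ("bank raises interest rates", "finance"),
    ("stock prices fell today", "finance"),
]
model = train_naive_bayes(examples)
rule_result = classify_by_rules("the striker scored")
learned_result = classify_naive_bayes(model, "the striker scored")
```

In this toy case the rules return nothing because no listed keyword occurs, while the learned classifier still assigns a category from word statistics, which is one way the recall difference can arise.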

Adaptive internet of things and machine learning techniques for managing the complexity of intelligent systems big data

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189844 ◽

2021 ◽

pp. 1-1

Author(s):

Ahmed A. Elngar

Keyword(s):

Machine Learning ◽

Big Data ◽

Internet Of Things ◽

Intelligent Systems ◽

Machine Learning Techniques ◽

Learning Techniques

Selecting optimal SpMV realizations for GPUs via machine learning

The International Journal of High Performance Computing Applications ◽

10.1177/1094342021990738 ◽

2021 ◽

pp. 109434202199073

Author(s):

Ernesto Dufrechou ◽

Pablo Ezzatti ◽

Enrique S Quintana-Ortí

Keyword(s):

Machine Learning ◽

Sparse Matrix ◽

Machine Learning Techniques ◽

Optimal Method ◽

Learning Techniques ◽

General Rules ◽

Machine Learning Approach ◽

The Matrix ◽

Time And Energy ◽

Matrix Vector

More than 10 years of research on efficient GPU routines for the sparse matrix-vector product (SpMV) has led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent, publicly available routines using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods behave so differently depending on the matrix structure that identifying general rules to select the optimal method for a given matrix becomes extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
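For readers unfamiliar with the operation, here is a minimal CPU reference sketch of SpMV over the compressed sparse row (CSR) format, together with a few structural features of the kind a classifier could take as input. The abstract does not list the features actually used, so `row_structure_features` and its outputs are assumptions for illustration only, and this is not one of the GPU realizations the paper evaluates.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def row_structure_features(row_ptr):
    """Hypothetical structural features derived from row lengths."""
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    return {
        "rows": len(lengths),
        "nnz": sum(lengths),
        "mean_nnz_per_row": sum(lengths) / len(lengths),
        "max_nnz_per_row": max(lengths),
    }


# Toy matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in CSR form.
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
features = row_structure_features(row_ptr)
```

How evenly the nonzeros are spread across rows is one plausible reason why different realizations win on different matrices, since irregular row lengths tend to unbalance parallel work.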

Advanced big-data/machine-learning techniques for optimization and performance enhancement of the heat pipe technology – A review and prospective study

Applied Energy ◽

10.1016/j.apenergy.2021.116969 ◽

2021 ◽

Vol 294 ◽

pp. 116969

Author(s):

Zhangyuan Wang ◽

Xudong Zhao ◽

Zhonghe Han ◽

Liang Luo ◽

Jinwei Xiang ◽

...

Keyword(s):

Machine Learning ◽

Big Data ◽

Prospective Study ◽

Heat Pipe ◽

Performance Enhancement ◽

Machine Learning Techniques ◽

Learning Techniques ◽

And Performance ◽

Optimization And Performance

Estimating the CAP greening effect by machine learning techniques: A big data ex post analysis

Environmental Science & Policy ◽

10.1016/j.envsci.2021.01.008 ◽

2021 ◽

Vol 119 ◽

pp. 44-53

Author(s):

Danilo Bertoni ◽

Giacomo Aletti ◽

Daniele Cavicchioli ◽

Alessandra Micheletti ◽

Roberto Pretolani

Keyword(s):

Machine Learning ◽

Big Data ◽

Machine Learning Techniques ◽

Ex Post ◽

Learning Techniques ◽

Ex Post Analysis

Leveraging Artificial Intelligence and Big Data to Optimize COVID-19 Clinical Public Health and Vaccination Roll-Out Strategies in Africa

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18157890 ◽

2021 ◽

Vol 18 (15) ◽

pp. 7890

Author(s):

Bruce Mellado ◽

Jianhong Wu ◽

Jude Dzevela Kong ◽

Nicola Luigi Bragazzi ◽

Ali Asgary ◽

...

Keyword(s):

Public Health ◽

Artificial Intelligence ◽

Machine Learning ◽

Big Data ◽

Developed Countries ◽

Machine Learning Techniques ◽

Multiple Data ◽

Learning Techniques ◽

Unequal Access ◽

Roll Out

COVID-19 is imposing massive health, social and economic costs. While many developed countries have started vaccinating, most African nations are waiting for vaccine stocks to be allocated and are using clinical public health (CPH) strategies to control the pandemic. The emergence of variants of concern (VOC), unequal access to the vaccine supply and locally specific logistical and vaccine delivery parameters, add complexity to national CPH strategies and amplify the urgent need for effective CPH policies. Big data and artificial intelligence machine learning techniques and collaborations can be instrumental in an accurate, timely, locally nuanced analysis of multiple data sources to inform CPH decision-making, vaccination strategies and their staged roll-out. The Africa-Canada Artificial Intelligence and Data Innovation Consortium (ACADIC) has been established to develop and employ machine learning techniques to design CPH strategies in Africa, which requires ongoing collaboration, testing and development to maximize the equity and effectiveness of COVID-19-related CPH interventions.

Predictors of outpatients’ no-show: big data analytics using apache spark

Journal Of Big Data ◽

10.1186/s40537-020-00384-9 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Tahani Daghistani ◽

Huda AlGhamdi ◽

Riyad Alshammari ◽

Raed H. AlHazme

Keyword(s):

Machine Learning ◽

Big Data ◽

Negative Impact ◽

Big Data Analytics ◽

Quality Of Healthcare ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Healthcare Organizations ◽

Data Framework ◽

Learning Techniques

Outpatients who fail to attend their appointments have a negative impact on healthcare outcomes. Healthcare organizations thus face new opportunities, one of which is to improve the quality of healthcare. The main challenge is predictive analysis using techniques capable of handling the large volume of data generated. We propose a big data framework for identifying outpatients' no-shows via feature engineering and machine learning (MLlib) on the Spark platform. This study evaluates the performance of five machine learning techniques using data from 2,011,813 outpatient visits. Across several experiments and different validation methods, Gradient Boosting (GB) performed best, resulting in an increase of accuracy and ROC to 79% and 81%, respectively. In addition, we show that exploring and evaluating the performance of the machine learning models using various evaluation methods is critical, as the accuracy of prediction can differ significantly. The aim of this paper is to explore factors that affect the no-show rate and can be used to formulate predictions using big data machine learning techniques.

Improving Reliability Estimation for Individual Numeric Predictions: A Machine Learning Approach

INFORMS Journal on Computing ◽

10.1287/ijoc.2020.1019 ◽

2021 ◽

Author(s):

Gediminas Adomavicius ◽

Yaqiong Wang

Keyword(s):

Machine Learning ◽

General Purpose ◽

Reliability Estimation ◽

Machine Learning Techniques ◽

Data Sets ◽

Real World Data ◽

Learning Techniques ◽

Reliability Indicator ◽

Machine Learning Approach ◽

Prediction Reliability

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
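The reframing described in the abstract, treating reliability estimation as a second numeric prediction problem whose target is the absolute prediction error, can be sketched roughly as below. Using ordinary least squares for both models, computing the errors in-sample, and the toy data are simplifying assumptions made here; the abstract does not specify these details.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical toy data whose noise grows with x.
xs = list(range(10))
ys = [2 * x + (0.1 * x if x % 2 == 0 else -0.1 * x) for x in xs]

# Step 1: fit the outcome prediction model.
outcome_model = fit_line(xs, ys)

# Step 2: compute the absolute error of each individual prediction
# (in-sample here for brevity; held-out errors would be less optimistic).
abs_errors = [abs(y - predict(outcome_model, x)) for x, y in zip(xs, ys)]

# Step 3: fit a second model that predicts the absolute error itself.
reliability_model = fit_line(xs, abs_errors)
```

Calling `predict(reliability_model, x)` then gives an estimated absolute error for a new input, and because the second step is an ordinary regression task, any regression learner could take the place of `fit_line`.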

Statistical and Machine Learning-Driven Optimization of Mechanical Properties in Designing Durable HDPE Nanobiocomposites

Polymers ◽

10.3390/polym13183100 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3100

Author(s):

Anusha Mairpady ◽

Abdel-Hamid I. Mourad ◽

Mohammad Sayem Mozumder

Keyword(s):

Machine Learning ◽

Mechanical Properties ◽

Genetic Algorithm ◽

Machine Learning Techniques ◽

Hybrid Technique ◽

Learning Techniques ◽

Artificial Neural Network (ANN) ◽

Major Factors ◽

The Cost ◽

Design Factors

The selection of nanofillers and compatibilizing agents, and their size and concentration, are always considered to be crucial in the design of durable nanobiocomposites with maximized mechanical properties (i.e., fracture strength (FS), yield strength (YS), Young's modulus (YM), etc.). Therefore, the statistical optimization of the key design factors has become extremely important to minimize the experimental runs and the cost involved. In this study, both statistical techniques (i.e., analysis of variance (ANOVA) and response surface methodology (RSM)) and machine learning techniques (i.e., artificial intelligence-based techniques such as artificial neural network (ANN) and genetic algorithm (GA)) were used to optimize the concentrations of nanofillers and compatibilizing agents of the injection-molded HDPE nanocomposites. Initially, through ANOVA, the concentrations of TiO2 and cellulose nanocrystals (CNCs) and their combinations were found to be the major factors in improving the durability of the HDPE nanocomposites. Further, the data were modeled and predicted using RSM, ANN, and their combination with a genetic algorithm (i.e., RSM-GA and ANN-GA). Later, to minimize the risk of local optimization, an ANN-GA hybrid technique was implemented to optimize multiple responses and to develop the nonlinear relationship between the factors (i.e., the concentrations of TiO2 and CNCs) and the responses (i.e., FS, YS, and YM), with minimum error and with regression values above 95%.

Big Data Analytics Processes in Industrial Internet of Things Systems: Sensing and Computing Technologies, Machine Learning Techniques, and Autonomous Decision-Making Algorithms

Journal of Self-Governance and Management Economics ◽

10.22381/jsme7420194 ◽

2019 ◽

Vol 7 (4) ◽

pp. 28 ◽

Cited By ~ 4

Keyword(s):

Machine Learning ◽

Decision Making ◽

Big Data ◽

Data Analytics ◽

Big Data Analytics ◽

Machine Learning Techniques ◽

Industrial Internet Of Things ◽

Autonomous Decision ◽

Learning Techniques ◽

Industrial Internet
