Cost-Sensitive Distributed Machine Learning for NetFlow-Based Botnet Activity Detection

2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Rafał Kozik ◽  
Marek Pawlicki ◽  
Michał Choraś

Recent advances in malicious techniques have rendered the traditional signature-based approach to cyberattack detection ineffective. New, more potent solutions are needed that incorporate Big Data technologies, effective distributed machine learning, and algorithms that counter the data imbalance problem. The major contribution of this paper is therefore the proposal of a cost-sensitive distributed machine learning approach for cybersecurity. In particular, we propose and implement cost-sensitive distributed machine learning by means of distributed Extreme Learning Machines (ELM), distributed Random Forest, and Distributed Random Boosted-Trees to detect botnets. The system’s concept and architecture are based on a Big Data processing framework with data mining and machine learning techniques. As a practical use case, we consider the problem of botnet detection by analysing data in the form of NetFlows. The reported results are promising and show that the proposed system can be considered a useful tool for the improvement of cybersecurity.
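The abstract does not say how misclassification costs are assigned, so the following is only a minimal sketch of one common cost-sensitive heuristic: weighting each class inversely to its frequency. The function names, the "benign"/"botnet" labels, and the 95/5 class split are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive large weights, so errors on them cost more.
    This is a common heuristic, assumed here for illustration only.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Share of the total class weight that was misclassified."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical imbalanced NetFlow labels: 95 benign flows, 5 botnet flows.
labels = ["benign"] * 95 + ["botnet"] * 5
weights = balanced_class_weights(labels)

# A trivial detector that never raises an alarm.
always_benign = ["benign"] * len(labels)

plain_error = sum(t != p for t, p in zip(labels, always_benign)) / len(labels)
cost_sensitive_error = weighted_error(labels, always_benign, weights)

print(f"plain error: {plain_error:.2f}")
print(f"cost-sensitive error: {cost_sensitive_error:.2f}")
```

Under plain error the do-nothing detector looks nearly perfect, while the weighted error exposes that it misses every botnet flow. The same per-class weights could in principle be passed to a learner that accepts sample weights.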

Author(s):  
Padmavathi .S ◽  
M. Chidambaram

Text classification has become increasingly important for managing and organizing text data, owing to the tremendous growth of online information. It assigns documents to a fixed number of predefined categories. There are two approaches to text classification: rule-based and machine learning-based. In the rule-based approach, documents are classified according to manually defined rules. In the machine learning-based approach, the classification rules, or the classifier, are learned automatically from example documents; this approach offers higher recall and faster processing. This paper presents an investigation of text classification using different machine learning techniques.


Author(s):  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S Quintana-Ortí

More than 10 years of research related to the development of efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent routines that are publicly available using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods behave so differently depending on the matrix structure that identifying general rules to select the optimal method for a given matrix becomes extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
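For readers unfamiliar with the operation, here is a minimal sequential sketch of SpMV. It assumes the matrix is stored in the widely used compressed sparse row (CSR) layout; the abstract does not name the storage formats or routines it evaluates, and this reference loop is not one of the GPU realizations the paper reviews. The function name and the example matrix are illustrative.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


# Example matrix (illustrative):
# A = [[4, 0, 1],
#      [0, 3, 0],
#      [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_spmv(values, col_idx, row_ptr, x)
print(y)  # [7.0, 6.0, 17.0]
```

The inner loop length varies with the number of nonzeros per row, which is one reason the best parallel realization depends on matrix structure.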


2021 ◽  
Vol 119 ◽  
pp. 44-53
Author(s):  
Danilo Bertoni ◽  
Giacomo Aletti ◽  
Daniele Cavicchioli ◽  
Alessandra Micheletti ◽  
Roberto Pretolani

Author(s):  
Bruce Mellado ◽  
Jianhong Wu ◽  
Jude Dzevela Kong ◽  
Nicola Luigi Bragazzi ◽  
Ali Asgary ◽  
...  

COVID-19 is imposing massive health, social and economic costs. While many developed countries have started vaccinating, most African nations are waiting for vaccine stocks to be allocated and are using clinical public health (CPH) strategies to control the pandemic. The emergence of variants of concern (VOC), unequal access to the vaccine supply and locally specific logistical and vaccine delivery parameters, add complexity to national CPH strategies and amplify the urgent need for effective CPH policies. Big data and artificial intelligence machine learning techniques and collaborations can be instrumental in an accurate, timely, locally nuanced analysis of multiple data sources to inform CPH decision-making, vaccination strategies and their staged roll-out. The Africa-Canada Artificial Intelligence and Data Innovation Consortium (ACADIC) has been established to develop and employ machine learning techniques to design CPH strategies in Africa, which requires ongoing collaboration, testing and development to maximize the equity and effectiveness of COVID-19-related CPH interventions.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Tahani Daghistani ◽  
Huda AlGhamdi ◽  
Riyad Alshammari ◽  
Raed H. AlHazme

Outpatients who fail to attend their appointments have a negative impact on healthcare outcomes. Healthcare organizations therefore face new opportunities, one of which is to improve the quality of healthcare. The main challenge is predictive analysis using techniques capable of handling the huge volume of data generated. We propose a big data framework for identifying outpatients’ no-shows via feature engineering and machine learning (MLlib) on the Spark platform. This study evaluates the performance of five machine learning techniques using data on 2,011,813 outpatient visits. Across several experiments and different validation methods, Gradient Boosting (GB) performed best, resulting in an increase of accuracy and ROC to 79% and 81%, respectively. In addition, we show that exploring and evaluating the performance of machine learning models using various evaluation methods is critical, as the accuracy of prediction can differ significantly. The aim of this paper is to explore factors that affect the no-show rate and can be used to formulate predictions using big data machine learning techniques.


Author(s):  
Gediminas Adomavicius ◽  
Yaqiong Wang

Numerical predictive modeling is widely used in different application domains. Although many modeling techniques have been proposed, and a number of different aggregate accuracy metrics exist for evaluating the overall performance of predictive models, other important aspects, such as the reliability (or confidence and uncertainty) of individual predictions, have been underexplored. We propose to use estimated absolute prediction error as the indicator of individual prediction reliability, which has the benefits of being intuitive and providing highly interpretable information to decision makers, as well as allowing for more precise evaluation of reliability estimation quality. As importantly, the proposed reliability indicator allows the reframing of reliability estimation itself as a canonical numeric prediction problem, which makes the proposed approach general-purpose (i.e., it can work in conjunction with any outcome prediction model), alleviates the need for distributional assumptions, and enables the use of advanced, state-of-the-art machine learning techniques to learn individual prediction reliability patterns directly from data. Extensive experimental results on multiple real-world data sets show that the proposed machine learning-based approach can significantly improve individual prediction reliability estimation as compared with a number of baselines from prior work, especially in more complex predictive scenarios.
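The reframing described above, in which a second model learns to predict the absolute error of the first, can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the synthetic data, the least-squares outcome model, the nearest-neighbour error model, and the use of in-sample residuals (a real study would estimate errors on held-out data). None of it reproduces the authors' experimental setup.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def knn_predict(train_x, train_t, x, k=2):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(zip(train_x, train_t), key=lambda p: abs(p[0] - x))[:k]
    return sum(t for _, t in nearest) / k


# Synthetic data whose noise grows with x (illustrative only).
xs = list(range(8))
ys = [2 * x + 0.2 * x * (-1) ** x for x in xs]

# Step 1: fit the outcome model.
a, b = fit_line(xs, ys)

# Step 2: compute absolute errors of the outcome model.
# Simplification: in-sample residuals are used here for brevity.
abs_errors = [abs(y - (a + b * x)) for x, y in zip(xs, ys)]

# Step 3: treat the absolute error as a numeric target for a second model.
low = knn_predict(xs, abs_errors, 0.5)
high = knn_predict(xs, abs_errors, 6.5)

print(f"estimated |error| near x=0.5: {low:.3f}")
print(f"estimated |error| near x=6.5: {high:.3f}")
```

Because the second step is an ordinary regression problem, any numeric predictor could replace the nearest-neighbour model, which is the generality the abstract points to.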


Polymers ◽  
2021 ◽  
Vol 13 (18) ◽  
pp. 3100
Author(s):  
Anusha Mairpady ◽  
Abdel-Hamid I. Mourad ◽  
Mohammad Sayem Mozumder

The selection of nanofillers and compatibilizing agents, and their size and concentration, are always considered to be crucial in the design of durable nanobiocomposites with maximized mechanical properties (e.g., fracture strength (FS), yield strength (YS), and Young’s modulus (YM)). Therefore, the statistical optimization of the key design factors has become extremely important to minimize the experimental runs and the cost involved. In this study, both statistical techniques (i.e., analysis of variance (ANOVA) and response surface methodology (RSM)) and machine learning techniques (i.e., artificial intelligence-based techniques such as artificial neural network (ANN) and genetic algorithm (GA)) were used to optimize the concentrations of nanofillers and compatibilizing agents of the injection-molded HDPE nanocomposites. Initially, through ANOVA, the concentrations of TiO2 and cellulose nanocrystals (CNCs) and their combinations were found to be the major factors in improving the durability of the HDPE nanocomposites. Further, the data were modeled and predicted using RSM, ANN, and their combination with a genetic algorithm (i.e., RSM-GA and ANN-GA). Later, to minimize the risk of local optimization, an ANN-GA hybrid technique was implemented to optimize multiple responses and to develop the nonlinear relationship between the factors (i.e., the concentrations of TiO2 and CNCs) and the responses (i.e., FS, YS, and YM), with minimum error and with regression values above 95%.

