Data mining for security applications: Mining concept-drifting data streams to detect peer to peer botnet traffic

Author(s):  
Bhavani Thuraisingham
Author(s):  
Man Tianxing ◽  
Nataly Zhukova ◽  
Alexander Vodyaho ◽  
Tin Tun Aung

Many domains require extracting knowledge, through data mining, from data streams received from observed objects. However, there is little guidance on which techniques can or should be used in which contexts. Meta mining technology can help build data processing workflows based on knowledge models that take into account the specific features of the objects. This paper proposes a meta mining ontology framework that supports selecting algorithms for specific data mining tasks and building suitable processes. The proposed ontology is constructed from existing ontologies and is extended with an ontology of data characteristics and task requirements. Unlike existing ontologies, the proposed ontology describes the overall data mining process, can be used to build data processing workflows in various domains, and has low computational complexity compared with the others. The authors developed an ontology merging method and a sub-ontology extraction method, both implemented on the OWL API by extracting and integrating the relevant axioms.


Author(s):  
Florent Masseglia ◽  
Pascal Poncelet ◽  
Maguelonne Teisseire

With the huge number of information sources available on the Internet and the high dynamics of their data, peer-to-peer (P2P) systems offer a communication model in which each party has the same capabilities and can initiate a communication session. These networks allow a group of computer users running the same networking program to connect with each other and directly access one another's resources. P2P architectures also provide a good infrastructure for data- and compute-intensive operations such as data mining. In this article we consider a new data mining approach for improving resource searching in a dynamic and distributed database such as an unstructured P2P system; in Masseglia, Poncelet, and Teisseire (2006) we call this problem P2P usage analysis. More precisely, we aim at discovering frequent behaviors among users of such a system. We focus on the sequential order of the actions performed on each node (requests or downloads) and show how this order has to be taken into account to extract useful knowledge. For instance, it may be discovered in a P2P file sharing network that, for 77% of the nodes from which a request is sent for “Mandriva Linux,” the file “Mandriva Linux 2005 CD1 i585-Limited-Edition-Mini.iso” is chosen and downloaded; then a new request is performed with the possible name of the remaining iso images (i.e., “Mandriva Linux 2005 Limited Edition”), and among the large number of returned results the image corresponding to “Mandriva Linux 2005 CD2 i585-Limited-Edition-Mini.iso” is chosen and downloaded. Such knowledge is very useful for suggesting frequently downloaded or requested files to the user, based on the behavior of the majority. It could also help avoid extra bandwidth consumption, which is the main cost of P2P queries (Ng, Chu, Rao, Sripanidkulchai, & Zhang, 2003).
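The idea of a frequent, ordered behavior can be illustrated with a minimal sketch. It is not the authors' mining algorithm: it only computes the fraction of nodes whose action log contains a given pattern in the same order. The function names, the toy logs, and the representation of actions as (kind, name) tuples are all assumptions made for illustration.

```python
def contains_sequence(actions, pattern):
    """Return True if `pattern` occurs in `actions` in order.

    The pattern items need not be adjacent; only their relative
    order matters (an ordered subsequence test).
    """
    remaining = iter(actions)
    # `step in remaining` consumes the iterator up to the first match,
    # so each later step is searched only after the previous one.
    return all(step in remaining for step in pattern)


def support(node_logs, pattern):
    """Fraction of nodes whose log contains `pattern` in order."""
    if not node_logs:
        return 0.0
    hits = sum(contains_sequence(log, pattern) for log in node_logs.values())
    return hits / len(node_logs)


# Hypothetical per-node logs of (kind, name) actions.
node_logs = {
    "node1": [("request", "Mandriva Linux"), ("download", "CD1.iso"),
              ("request", "Mandriva Linux 2005"), ("download", "CD2.iso")],
    "node2": [("request", "Mandriva Linux"), ("download", "CD1.iso"),
              ("download", "CD2.iso")],
    "node3": [("request", "Mandriva Linux"), ("request", "other"),
              ("download", "CD1.iso"), ("download", "CD2.iso")],
    # Same actions as node2 but in a different order: does not match.
    "node4": [("download", "CD2.iso"), ("download", "CD1.iso"),
              ("request", "Mandriva Linux")],
}

pattern = [("request", "Mandriva Linux"),
           ("download", "CD1.iso"),
           ("download", "CD2.iso")]

print(support(node_logs, pattern))  # 0.75
```

Here node4 performs the same actions as node2 but in a different order, so it does not count toward the support, which is the sense in which the sequential order has to be taken into account.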


2008 ◽  
pp. 693-704
Author(s):  
Bhavani Thuraisingham

This article first describes the privacy concerns that arise due to data mining, especially for national security applications. Then we discuss privacy-preserving data mining. In particular, we view the privacy problem as a form of inference problem and introduce the notion of privacy constraints. We also describe an approach for privacy constraint processing and discuss its relationship to privacy-preserving data mining. Then we give an overview of the developments on privacy-preserving data mining that attempt to maintain privacy and at the same time extract useful information from data mining. Finally, some directions for future research on privacy as related to data mining are given.


2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Grigore Stamatescu ◽  
Iulia Stamatescu ◽  
Nicoleta Arghira ◽  
Ioana Fagarasan

Considering the advances in building monitoring and control through networks of interconnected devices, effective handling of the associated rich data streams is becoming an important challenge. In many situations, the application of conventional system identification or approximate grey-box models, partly theoretical and partly data driven, is either infeasible or unsuitable. The paper discusses and illustrates an application of black-box modelling, achieved using data mining techniques, to the control of a smart building ventilation subsystem. We present the implementation and evaluation of a data mining methodology on data collected over more than one year of operation. The case study is carried out on four air handling units of a modern campus building and is intended as preliminary decision support for facility managers. The data processing and learning framework has two steps: raw data streams are compressed using the Symbolic Aggregate Approximation method, and the resulting segments are then input into a Support Vector Machine algorithm. The results are useful for deriving the behaviour of each piece of equipment in its various modes of operation and can be built upon for fault detection or energy efficiency applications. Challenges related to online operation within a commercial Building Management System are also discussed, as the approach shows promise for deployment.
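The first step of this pipeline, Symbolic Aggregate Approximation (SAX), can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `sax`, the default four-letter alphabet, and the toy series are hypothetical, and the Support Vector Machine step is left out (the resulting words would first have to be encoded as feature vectors for a classifier).

```python
import bisect
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word (illustrative helper)."""
    if not 1 <= n_segments <= len(series):
        raise ValueError("n_segments must be between 1 and len(series)")

    # 1. z-normalise; a constant series is mapped to all zeros.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each segment.
    n = len(z)
    paa = [
        mean(z[i * n // n_segments:(i + 1) * n // n_segments])
        for i in range(n_segments)
    ]

    # 3. Breakpoints cut the standard normal into equiprobable regions,
    #    one region per alphabet symbol.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # 4. Map each segment mean to the symbol of the region it falls in.
    return "".join(alphabet[bisect.bisect_left(breakpoints, v)] for v in paa)


# Toy series: a low plateau followed by a high plateau.
print(sax([1, 1, 1, 1, 10, 10, 10, 10], n_segments=2))  # ad
```

Each symbol summarises one segment of the stream, so a year of raw readings reduces to a short string per unit and time window, which is the compression the abstract refers to.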

