Data mining for security applications: Mining concept-drifting data streams to detect peer to peer botnet traffic

Extracting knowledge from data streams received from observed objects through data mining is required in various domains. However, there is little guidance on which techniques can or should be used in which contexts. Meta mining technology can help build data processing workflows from knowledge models that take into account the specific features of the objects. This paper proposes a meta mining ontology framework that allows selecting algorithms for specific data mining tasks and building suitable processes. The proposed ontology is constructed from existing ontologies and is extended with an ontology of data characteristics and task requirements. Unlike the existing ontologies, the proposed ontology describes the overall data mining process, can be used to build data processing workflows in various domains, and has lower computational complexity than the others. The authors developed an ontology merging method and a sub-ontology extraction method, both implemented on the OWL API by extracting and integrating the relevant axioms.


Peer-to-Peer Usage Analysis

Encyclopedia of Multimedia Technology and Networking, Second Edition ◽ 10.4018/978-1-60566-014-1.ch154 ◽ 2009 ◽ pp. 1136-1141

Author(s): Florent Masseglia ◽ Pascal Poncelet ◽ Maguelonne Teisseire

Keyword(s): Data Mining ◽ Distributed Database ◽ Peer To Peer ◽ Communication Model ◽ P2p Systems ◽ P2p File Sharing ◽ Limited Edition ◽ Usage Analysis ◽ High Dynamics ◽ P2p System

With the huge number of information sources available on the Internet and the high dynamics of their data, peer-to-peer (P2P) systems propose a communication model in which each party has the same capabilities and can initiate a communication session. These networks allow a group of computer users with the same networking program to connect with each other and directly access one another's resources. P2P architectures also provide a good infrastructure for data- and computation-intensive operations such as data mining. In this article we consider a new data mining approach for improving resource searching in a dynamic and distributed database such as an unstructured P2P system; in Masseglia, Poncelet, and Teisseire (2006) we call this problem P2P usage analysis. More precisely, we aim at discovering frequent behaviors among users of such a system. We focus on the sequential order between actions performed on each node (requests or downloads) and show how this order has to be taken into account for extracting useful knowledge. For instance, it may be discovered in a P2P file sharing network that for 77% of nodes from which a request is sent for “Mandriva Linux,” the file “Mandriva Linux 2005 CD1 i585-Limited-Edition-Mini.iso” is chosen and downloaded; then a new request is performed with the possible name of the remaining iso images (i.e., “Mandriva Linux 2005 Limited Edition”), and from the large number of returned results the image corresponding to “Mandriva Linux 2005 CD2 i585-Limited-Edition-Mini.iso” is chosen and downloaded. Such knowledge is very useful for suggesting frequently downloaded or requested files to the user, based on the behavior of the majority. It could also help avoid extra bandwidth consumption, which is the main cost of P2P queries (Ng, Chu, Rao, Sripanidkulchai, & Zhang, 2003).
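The idea of a frequent ordered behavior can be illustrated with a minimal sketch. It is not the authors' algorithm: it only measures the support of one candidate sequence, that is, the fraction of nodes whose action log contains the pattern in order, with gaps allowed. The node names, action labels, and logs below are hypothetical toy data.

```python
from typing import Dict, List, Sequence


def contains_subsequence(actions: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `actions` in order (gaps allowed)."""
    remaining = iter(actions)
    # `step in remaining` advances the iterator, so order is enforced.
    return all(step in remaining for step in pattern)


def support(node_logs: Dict[str, List[str]], pattern: Sequence[str]) -> float:
    """Fraction of nodes whose log contains `pattern` as an ordered subsequence."""
    if not node_logs:
        return 0.0
    hits = sum(contains_subsequence(log, pattern) for log in node_logs.values())
    return hits / len(node_logs)


# Hypothetical per-node action logs (requests and downloads, in time order).
node_logs = {
    "node_a": ["request:mandriva", "download:cd1",
               "request:limited edition", "download:cd2"],
    "node_b": ["request:mandriva", "download:cd1", "download:cd2"],
    "node_c": ["download:cd2", "request:mandriva", "download:cd1"],
    "node_d": ["request:debian", "download:netinst"],
}
pattern = ["request:mandriva", "download:cd1", "download:cd2"]

print(support(node_logs, pattern))  # 0.5: node_a and node_b match, node_c has the wrong order
```

Note that node_c performs the same three actions as node_b but in a different order, so it does not count toward the support. This is the sense in which the sequential order between actions matters.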


Privacy-Preserving Data Mining

Data Warehousing and Mining ◽ 10.4018/978-1-59904-951-9.ch044 ◽ 2008 ◽ pp. 693-704

Author(s): Bhavani Thuraisingham

Keyword(s): Data Mining ◽ Privacy Preserving ◽ Future Research ◽ Inference Problem ◽ Privacy Preserving Data Mining ◽ Privacy Concerns ◽ Security Applications ◽ Constraint Processing ◽ Privacy Constraints ◽ Privacy Problem

This article first describes the privacy concerns that arise due to data mining, especially for national security applications. Then we discuss privacy-preserving data mining. In particular, we view the privacy problem as a form of inference problem and introduce the notion of privacy constraints. We also describe an approach for privacy constraint processing and discuss its relationship to privacy-preserving data mining. Then we give an overview of the developments in privacy-preserving data mining that attempt to maintain privacy while still extracting useful information through data mining. Finally, some directions for future research on privacy as related to data mining are given.


Data-Driven Modelling of Smart Building Ventilation Subsystem

Journal of Sensors ◽ 10.1155/2019/3572019 ◽ 2019 ◽ Vol 2019 ◽ pp. 1-14 ◽ Cited By ~ 5

Author(s): Grigore Stamatescu ◽ Iulia Stamatescu ◽ Nicoleta Arghira ◽ Ioana Fagarasan

Keyword(s): Data Mining ◽ Data Streams ◽ Data Driven ◽ Support Vector ◽ Commercial Building ◽ Monitoring And Control ◽ Smart Building ◽ Building Ventilation ◽ Using Data ◽ Rich Data

Considering the advances in building monitoring and control through networks of interconnected devices, effective handling of the associated rich data streams is becoming an important challenge. In many situations, the application of conventional system identification or approximate grey-box models, partly theoretical and partly data driven, is either infeasible or unsuitable. The paper discusses and illustrates an application of black-box modelling achieved using data mining techniques for smart building ventilation subsystem control. We present the implementation and evaluation of a data mining methodology on data collected from over one year of operation. The case study is carried out on four air handling units of a modern campus building to provide preliminary decision support for facility managers. The data processing and learning framework is based on two steps: raw data streams are compressed using the Symbolic Aggregate Approximation method, and the resulting segments are then input into a Support Vector Machine algorithm. The results are useful for deriving the behaviour of each piece of equipment in various modes of operation and can be built upon for fault detection or energy efficiency applications. Challenges related to online operation within a commercial Building Management System are also discussed, as the approach shows promise for deployment.
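The compression step can be illustrated with a minimal sketch of Symbolic Aggregate Approximation in its textbook form: z-normalise the series, average it over equal-width segments, and map each segment mean to a letter using breakpoints that split a standard normal distribution into equiprobable regions. The segment count, the three-letter alphabet, and the readings below are assumptions made for illustration, not values from the paper, and the subsequent Support Vector Machine step is not shown.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev
from typing import Sequence


def sax(series: Sequence[float], n_segments: int, alphabet: str = "abc") -> str:
    """Convert a numeric series into a SAX word of length `n_segments`."""
    if len(series) % n_segments != 0:
        raise ValueError("this sketch needs len(series) divisible by n_segments")

    # 1. z-normalise (a constant series maps to all zeros).
    mean, std = fmean(series), pstdev(series)
    normed = [0.0 if std == 0 else (x - mean) / std for x in series]

    # 2. Piecewise aggregate approximation: mean of each equal-width segment.
    width = len(series) // n_segments
    paa = [fmean(normed[i:i + width]) for i in range(0, len(normed), width)]

    # 3. Equiprobable breakpoints under a standard normal distribution.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    # 4. Map each segment mean to the letter of the region it falls in.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


# Hypothetical sensor readings: low, medium, high, medium.
readings = [1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9, 5, 5, 5, 5]
print(sax(readings, n_segments=4))  # abcb
```

Sixteen raw readings are reduced to a four-letter word, which is the kind of compact representation that could then be encoded as features for a classifier.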
