Similarity Digest Search: A Survey and Comparative Analysis of Strategies to Perform Known File Filtering Using Approximate Matching

2017, Vol. 2017, pp. 1-17
Author(s):  
Vitor Hugo Galhardo Moia ◽  
Marco Aurélio Amaral Henriques

Digital forensics is a branch of computer science aimed at investigating and analyzing electronic devices in the search for crime evidence. There are several ways to perform this search. Known File Filtering (KFF) is one of them: a list of objects of interest is used to reduce or separate the data for analysis. Using a database of hashes of such objects, the examiner performs lookups for matches against the target device. However, due to a limitation of cryptographic hash functions (their inability to detect similar objects), new methods, called approximate matching, have been designed. This sort of function has interesting characteristics for KFF investigations but suffers mainly from high costs when dealing with huge data sets, as the search is usually done by brute force. To mitigate this problem, strategies have been developed to perform lookups more efficiently. In this paper, we present the state of the art of similarity digest search strategies, along with a detailed comparison involving several aspects, such as time complexity, memory requirements, and search precision. Our results show that none of the approaches addresses all of these main aspects. Finally, we discuss future directions and present requirements for a new strategy aiming to overcome the current limitations.
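
The exact-hash KFF lookup that the survey contrasts with approximate matching can be sketched in a few lines. The digests and sample data below are hypothetical, and a plain SHA-256 set stands in for the tool-specific hash databases an examiner would actually query:

```python
import hashlib

# Hypothetical known-file database: SHA-256 digests of objects of interest.
KNOWN_HASHES = {
    hashlib.sha256(b"leaked document").hexdigest(),
    hashlib.sha256(b"contraband sample").hexdigest(),
}

def kff_lookup(data: bytes) -> bool:
    """Exact-match KFF: look up the object's cryptographic digest.

    A single changed byte produces an entirely different digest, which is
    precisely the limitation that motivates approximate matching
    (similarity digests).
    """
    return hashlib.sha256(data).hexdigest() in KNOWN_HASHES

print(kff_lookup(b"leaked document"))   # exact copy: True
print(kff_lookup(b"leaked document!"))  # near-duplicate: False
```

The second call illustrates the gap similarity digests fill: a near-identical object is invisible to exact hashing.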

2019, Vol. 26 (8), pp. 1311-1327
Author(s):  
Pala Rajasekharreddy ◽  
Chao Huang ◽  
Siddhardha Busi ◽  
Jobina Rajkumari ◽  
Ming-Hong Tai ◽  
...  

With the emergence of nanotechnology, new methods have been developed for engineering various nanoparticles for biomedical applications. Nanotheranostics is a burgeoning research field with tremendous prospects for improving the diagnosis and treatment of various cancers. However, the development of biocompatible and efficient drug/gene delivery theranostic systems still remains a challenge. Green synthesis of nanoparticles, with its low capital and operating expenses, reduced environmental pollution, and better biocompatibility and stability, is a novel field that is advantageous over chemical or physical nanoparticle synthesis methods. In this article, we summarize recent research progress on green-synthesized nanoparticles for cancer theranostic applications, and we conclude with a look at the current challenges and insight into future directions based on recent developments in these areas.


2015, Vol. 8 (2), pp. 1787-1832
Author(s):  
J. Heymann ◽  
M. Reuter ◽  
M. Hilker ◽  
M. Buchwitz ◽  
O. Schneising ◽  
...  

Abstract. Consistent and accurate long-term data sets of global atmospheric concentrations of carbon dioxide (CO2) are required for carbon cycle and climate related research. However, global data sets based on satellite observations may suffer from inconsistencies originating from the use of products derived from different satellites, as needed to cover a long enough time period. One reason for inconsistencies can be the use of different retrieval algorithms. We address this potential issue by applying the same algorithm, the Bremen Optimal Estimation DOAS (BESD) algorithm, to different satellite instruments, SCIAMACHY onboard ENVISAT (March 2002–April 2012) and TANSO-FTS onboard GOSAT (launched in January 2009), to retrieve XCO2, the column-averaged dry-air mole fraction of CO2. BESD was initially developed for SCIAMACHY XCO2 retrievals. Here, we present the first detailed assessment of the new GOSAT BESD XCO2 product, which is generated and delivered to the MACC project for assimilation into ECMWF's Integrated Forecasting System (IFS). We describe the modifications of the BESD algorithm needed to retrieve XCO2 from GOSAT and present detailed comparisons with ground-based observations of XCO2 from the Total Carbon Column Observing Network (TCCON). We then discuss comparison results between all three XCO2 data sets (SCIAMACHY, GOSAT and TCCON), which demonstrate good consistency between the SCIAMACHY and GOSAT XCO2 products. For example, we found a mean difference for daily averages of −0.60 ± 1.56 ppm (mean difference ± standard deviation) for GOSAT-SCIAMACHY (linear correlation coefficient r = 0.82), −0.34 ± 1.37 ppm (r = 0.86) for GOSAT-TCCON and 0.10 ± 1.79 ppm (r = 0.75) for SCIAMACHY-TCCON.
The remaining differences between GOSAT and SCIAMACHY are likely due to non-perfect collocation (±2 h, 10° × 10° around TCCON sites), i.e., the observed air masses are not exactly identical, but likely also due to a still non-perfect BESD retrieval algorithm, which will be continuously improved in the future. Our overarching goal is to generate a satellite-derived XCO2 data set appropriate for climate and carbon cycle research covering the longest possible time period. We therefore also plan to extend the existing SCIAMACHY and GOSAT data set discussed here by also using data from other missions (e.g., OCO-2, GOSAT-2, CarbonSat) in the future.


2007, Vol. 22 (1), pp. 37-65
Author(s):  
Fadi Thabtah

Abstract. Associative classification mining is a promising approach in data mining that utilizes association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, e.g. CPAR, CMAR, MCAR and MMAC. These algorithms employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation methods. This paper surveys and compares the state-of-the-art associative classification techniques with regard to these criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are also highlighted.
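
A minimal sketch of how an associative classifier orders and fires class association rules, assuming the common CBA-style ranking (confidence, then support, then rule generality). The rules and attribute values are invented for illustration and do not reproduce CPAR, CMAR or MCAR exactly:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    items: frozenset   # antecedent: a set of attribute=value conditions
    label: str         # consequent: the predicted class
    support: float
    confidence: float

def rank(rules):
    """CBA-style ordering: higher confidence first, ties broken by higher
    support, then by shorter (more general) antecedent."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.items)))

def predict(rules, instance, default="unknown"):
    """Fire the first ranked rule whose antecedent the instance satisfies."""
    for r in rank(rules):
        if r.items <= instance:
            return r.label
    return default

demo = [
    Rule(frozenset({"outlook=sunny"}), "play", 0.30, 0.90),
    Rule(frozenset({"outlook=sunny", "windy=yes"}), "stay", 0.20, 0.95),
]
print(predict(demo, {"outlook=sunny", "windy=yes"}))  # -> stay
```

The algorithms surveyed differ mainly in how `rank`, pruning and prediction are implemented (e.g. single-rule firing as here versus the multiple-rule voting of CMAR).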


Author(s):  
Ondrej Habala ◽  
Martin Šeleng ◽  
Viet Tran ◽  
Branislav Šimo ◽  
Ladislav Hluchý

The project Advanced Data Mining and Integration Research for Europe (ADMIRE) is designing new methods and tools for convenient mining and integration of large, distributed data sets. One prospective application domain for such methods and tools is environmental science, which often combines data sets from different providers and in which data mining is becoming increasingly popular as more computing power becomes available. The authors present a set of experimental environmental scenarios and the application of ADMIRE technology in these scenarios. The scenarios attempt to predict meteorological and hydrological phenomena that currently cannot be, or are not, predicted, by mining distributed data sets from several providers in Slovakia. The scenarios were designed by environmental experts and, apart from serving as testing grounds for the ADMIRE technology, their results are of particular interest to the experts who designed them.


Author(s):  
Divya Dasagrandhi ◽  
Arul Salomee Kamalabai Ravindran ◽  
Anusuyadevi Muthuswamy ◽  
Jayachandran K. S.

Understanding the mechanisms of a disease is highly complicated due to the complex pathways involved in its progression. Despite several decades of research, the occurrence and prognosis of diseases are not completely understood, even with high-throughput experiments like DNA microarrays and next-generation sequencing. This is due to the challenges of analyzing huge data sets. Systems biology, one of the major divisions of bioinformatics, has provided cutting-edge techniques for better understanding these pathways. Construction of a protein-protein interaction network (PPIN) guides scientists in identifying vital proteins, which in turn facilitates the identification of new drug targets and associated proteins. The chapter focuses on PPI databases, the construction of PPINs, and their analysis.
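
A PPIN reduces to a graph whose nodes are proteins and whose edges are reported interactions. A minimal sketch of building such a network and ranking hub proteins by degree follows; the interaction pairs are illustrative examples, not entries taken from any real PPI database:

```python
from collections import defaultdict

# Hypothetical interaction pairs, as might be parsed from a PPI database export.
interactions = [
    ("TP53", "MDM2"), ("TP53", "BRCA1"), ("TP53", "ATM"),
    ("BRCA1", "ATM"), ("MDM2", "UBE2D1"),
]

def build_ppin(edges):
    """Undirected PPIN as an adjacency map: protein -> set of partners."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def hub_proteins(adj, top=3):
    """Rank proteins by degree; high-degree hubs are candidate drug targets."""
    return sorted(adj, key=lambda p: len(adj[p]), reverse=True)[:top]

net = build_ppin(interactions)
print(hub_proteins(net, top=1))  # -> ['TP53'] (3 interaction partners)
```

Degree is only the simplest centrality; PPIN analyses typically also consider betweenness, clustering and module detection.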


Author(s):  
Andrew Stranieri ◽  
Venki Balasubramanian

Remote patient monitoring involves the collection of data from wearable sensors that typically requires analysis in real time. The real-time analysis of data streaming continuously to a server challenges data mining algorithms that have mostly been developed for static data residing in central repositories. Remote patient monitoring also generates huge data sets that present storage and management problems. Although virtual records of every health event throughout an individual's lifespan, known as electronic health records, are rapidly emerging, few electronic records accommodate data from continuous remote patient monitoring. These factors combine to make data analytics with continuous patient data very challenging. In this chapter, the benefits for data analytics inherent in the use of standards for clinical concepts in remote patient monitoring are presented. The openEHR standard, which describes the way concepts are used in clinical practice, is well suited to recording metadata about remote monitoring. The claim is advanced that this is likely to facilitate meaningful real-time analyses with big remote patient monitoring data. The point is made by drawing on a case study involving the transmission of patient vital-sign data collected from wearable sensors in an Indian hospital.


Author(s):  
Guangming Xing

Classification and clustering of XML documents based on their structural information is important for many document management tasks. In this chapter, we present a suite of algorithms to compute the cost of approximate matching between XML documents and schemas. A framework for classifying/clustering XML documents by structure is then presented, based on the computation of distances between XML documents and schemas. The backbone of the framework is a feature representation using a vector of these distances. Experimental studies conducted on various XML data sets suggest the efficiency and effectiveness of our approach as a solution for structural classification/clustering of XML documents.
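
The feature representation described above, a document encoded as a vector of its distances to a set of reference structures, can be sketched as follows. As a stand-in for the chapter's document-to-schema edit-distance computation, a simple Jaccard distance over element paths is used, and the sample documents are invented:

```python
import xml.etree.ElementTree as ET

def paths(xml_text):
    """Collect root-to-node tag paths as a crude structural summary."""
    def walk(node, prefix):
        p = prefix + "/" + node.tag
        yield p
        for child in node:
            yield from walk(child, p)
    return set(walk(ET.fromstring(xml_text), ""))

def distance(doc, ref):
    """Jaccard distance between path sets -- a simplified stand-in for the
    chapter's approximate-matching cost between a document and a schema."""
    a, b = paths(doc), paths(ref)
    return 1.0 - len(a & b) / len(a | b)

def feature_vector(doc, refs):
    """Encode a document as its vector of distances to reference structures."""
    return [distance(doc, r) for r in refs]

refs = ["<book><title/><author/></book>", "<person><name/><age/></person>"]
fv = feature_vector("<book><title/><author/><year/></book>", refs)
print(fv)  # smaller distance to the structurally similar reference
```

Once each document is a numeric vector, any standard classifier or clustering algorithm can operate on it, which is the point of the framework.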


Author(s):  
Roberto Marmo

As a consequence of the expansion of modern technology, the number and variety of frauds are increasing dramatically. The resulting reputational damage and financial losses are primary motivations for fraud detection technologies and methodologies, which have been applied successfully in some economic activities. Detection involves monitoring the behavior of users based on huge data sets, such as logged data and user behavior. The aim of this contribution is to present data mining techniques for fraud detection and prevention, with applications in credit cards and telecommunications, within a business context of mining data to achieve higher cost savings, and also in the interest of determining potential legal evidence. The problem is very difficult because fraud takes many different forms and fraudsters are adaptive, usually looking for ways to evade security measures.
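
One simple instance of the behavior monitoring mentioned above is flagging a transaction that deviates strongly from a user's historical baseline. The z-score rule below is a hedged illustration with invented amounts, not one of the chapter's actual techniques:

```python
import statistics

def flag_anomaly(history, amount, z_threshold=3.0):
    """Flag a transaction amount far from the user's historical baseline.

    A toy behavior-monitoring rule, not a production fraud model; real
    systems combine many features and adapt as fraudsters do.
    """
    mu = statistics.mean(history)
    sd = statistics.stdev(history)
    if sd == 0:
        return amount != mu
    return abs(amount - mu) / sd > z_threshold

history = [25.0, 40.0, 32.0, 28.0, 35.0]
print(flag_anomaly(history, 30.0))    # typical amount: False
print(flag_anomaly(history, 5000.0))  # far outside the baseline: True
```

Such static thresholds are exactly what adaptive fraudsters learn to stay under, which is why the chapter emphasizes richer data mining approaches.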


Author(s):  
Clair Cassiello-Robbins ◽  
Heather Murray Latin ◽  
Shannon Sauer-Zavala

Previous chapters in this book have been dedicated to exploring the use of the Unified Protocol for the Transdiagnostic Treatment of Emotional Disorders (UP) with a variety of emotional disorders, and those chapters present compelling evidence for the efficacy of the UP across the range of these disorders. This evidence has now inspired further research efforts to better understand the contribution of individual treatment components, as well as to evaluate new methods of treatment delivery with the goal of furthering dissemination. The purpose of this chapter is to describe future directions and projects related to the UP, including dismantling studies, dissemination efforts, and the development of an Internet-delivered form of the intervention. Further, we discuss how these projects will enhance our understanding of the UP and how these endeavors will further a clinician’s ability to provide efficient, effective care.

