Similarity Digest Search: A Survey and Comparative Analysis of Strategies to Perform Known File Filtering Using Approximate Matching

2017, Vol. 2017, pp. 1-17
Author(s):  
Vitor Hugo Galhardo Moia ◽  
Marco Aurélio Amaral Henriques

Digital forensics is a branch of computer science aimed at investigating and analyzing electronic devices in the search for crime evidence. There are several ways to perform this search. Known File Filtering (KFF) is one of them: a list of objects of interest is used to reduce or separate the data for analysis. Using a database of hashes of such objects, the examiner performs lookups for matches against the target device. However, due to a limitation of cryptographic hash functions (their inability to detect similar objects), new methods, called approximate matching, have been designed. This sort of function has interesting characteristics for KFF investigations but suffers mainly from high costs when dealing with huge data sets, as the search is usually done by brute force. To mitigate this problem, strategies have been developed to perform lookups more efficiently. In this paper, we present the state of the art of similarity digest search strategies, along with a detailed comparison involving several aspects, such as time complexity, memory requirements, and search precision. Our results show that none of the approaches addresses all of these main aspects. Finally, we discuss future directions and present requirements for a new strategy aiming to overcome the current limitations.
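
The exact-hash KFF lookup that the survey contrasts with approximate matching can be sketched in a few lines. The digests and sample data below are hypothetical, and a plain SHA-256 set stands in for the tool-specific hash databases an examiner would actually query:

```python
import hashlib

# Hypothetical known-file database: SHA-256 digests of objects of interest.
KNOWN_HASHES = {
    hashlib.sha256(b"leaked document").hexdigest(),
    hashlib.sha256(b"contraband sample").hexdigest(),
}

def kff_lookup(data: bytes) -> bool:
    """Exact-match KFF: look up the object's cryptographic digest.

    A single changed byte produces an entirely different digest, which is
    precisely the limitation that motivates approximate matching
    (similarity digests).
    """
    return hashlib.sha256(data).hexdigest() in KNOWN_HASHES

print(kff_lookup(b"leaked document"))   # exact copy: True
print(kff_lookup(b"leaked document!"))  # near-duplicate: False
```

The second call illustrates the gap similarity digests fill: a near-identical object is invisible to exact hashing.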

2019, Vol. 26 (8), pp. 1311-1327
Author(s):  
Pala Rajasekharreddy ◽  
Chao Huang ◽  
Siddhardha Busi ◽  
Jobina Rajkumari ◽  
Ming-Hong Tai ◽  
...  

With the emergence of nanotechnology, new methods have been developed for engineering various nanoparticles for biomedical applications. Nanotheranostics is a burgeoning research field with tremendous prospects for improving the diagnosis and treatment of various cancers. However, the development of biocompatible and efficient drug/gene delivery theranostic systems still remains a challenge. Green synthesis of nanoparticles, with its low capital and operating expenses, reduced environmental pollution, and better biocompatibility and stability, is a novel field that is advantageous over chemical or physical nanoparticle synthesis methods. In this article, we summarize recent research progress on green-synthesized nanoparticles for cancer theranostic applications, and we conclude with a look at the current challenges and insight into future directions based on recent developments in these areas.


2015, Vol. 8 (2), pp. 1787-1832
Author(s):  
J. Heymann ◽  
M. Reuter ◽  
M. Hilker ◽  
M. Buchwitz ◽  
O. Schneising ◽  
...  

Abstract. Consistent and accurate long-term data sets of global atmospheric concentrations of carbon dioxide (CO2) are required for carbon cycle and climate related research. However, global data sets based on satellite observations may suffer from inconsistencies originating from the use of products derived from different satellites, as needed to cover a long enough time period. One reason for inconsistencies can be the use of different retrieval algorithms. We address this potential issue by applying the same algorithm, the Bremen Optimal Estimation DOAS (BESD) algorithm, to different satellite instruments, SCIAMACHY onboard ENVISAT (March 2002–April 2012) and TANSO-FTS onboard GOSAT (launched in January 2009), to retrieve XCO2, the column-averaged dry-air mole fraction of CO2. BESD was initially developed for SCIAMACHY XCO2 retrievals. Here, we present the first detailed assessment of the new GOSAT BESD XCO2 product, which is generated and delivered to the MACC project for assimilation into ECMWF's Integrated Forecasting System (IFS). We describe the modifications of the BESD algorithm needed to retrieve XCO2 from GOSAT and present detailed comparisons with ground-based observations of XCO2 from the Total Carbon Column Observing Network (TCCON). We then discuss comparison results between all three XCO2 data sets (SCIAMACHY, GOSAT and TCCON), which demonstrate good consistency between the SCIAMACHY and GOSAT XCO2 products. For example, we found a mean difference for daily averages of −0.60 ± 1.56 ppm (mean difference ± standard deviation) for GOSAT-SCIAMACHY (linear correlation coefficient r = 0.82), −0.34 ± 1.37 ppm (r = 0.86) for GOSAT-TCCON and 0.10 ± 1.79 ppm (r = 0.75) for SCIAMACHY-TCCON.
The remaining differences between GOSAT and SCIAMACHY are likely due to non-perfect collocation (±2 h, 10° × 10° around TCCON sites), i.e., the observed air masses are not exactly identical, but likely also due to a still non-perfect BESD retrieval algorithm, which will be continuously improved in the future. Our overarching goal is to generate a satellite-derived XCO2 data set appropriate for climate and carbon cycle research covering the longest possible time period. We therefore also plan to extend the existing SCIAMACHY and GOSAT data set discussed here by also using data from other missions (e.g., OCO-2, GOSAT-2, CarbonSat) in the future.


2007, Vol. 22 (1), pp. 37-65
Author(s):  
Fadi Thabtah

Abstract. Associative classification mining is a promising approach in data mining that utilizes association rule discovery techniques to construct classification systems, also known as associative classifiers. In the last few years, a number of associative classification algorithms have been proposed, e.g. CPAR, CMAR, MCAR and MMAC. These algorithms employ several different rule discovery, rule ranking, rule pruning, rule prediction and rule evaluation methods. This paper surveys and compares the state-of-the-art associative classification techniques with regard to these criteria. Finally, future directions in associative classification, such as incremental learning and mining low-quality data sets, are also highlighted.
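
A minimal sketch of how an associative classifier orders and fires class association rules, assuming the common CBA-style ranking (confidence, then support, then rule generality). The rules and attribute values are invented for illustration and do not reproduce CPAR, CMAR or MCAR exactly:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    items: frozenset   # antecedent: a set of attribute=value conditions
    label: str         # consequent: the predicted class
    support: float
    confidence: float

def rank(rules):
    """CBA-style ordering: higher confidence first, ties broken by higher
    support, then by shorter (more general) antecedent."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.items)))

def predict(rules, instance, default="unknown"):
    """Fire the first ranked rule whose antecedent the instance satisfies."""
    for r in rank(rules):
        if r.items <= instance:
            return r.label
    return default

demo = [
    Rule(frozenset({"outlook=sunny"}), "play", 0.30, 0.90),
    Rule(frozenset({"outlook=sunny", "windy=yes"}), "stay", 0.20, 0.95),
]
print(predict(demo, {"outlook=sunny", "windy=yes"}))  # -> stay
```

The algorithms surveyed differ mainly in how `rank`, pruning and prediction are implemented (e.g. single-rule firing as here versus the multiple-rule voting of CMAR).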


Author(s):  
Ondrej Habala ◽  
Martin Šeleng ◽  
Viet Tran ◽  
Branislav Šimo ◽  
Ladislav Hluchý

The project Advanced Data Mining and Integration Research for Europe (ADMIRE) is designing new methods and tools for convenient mining and integration of large, distributed data sets. One prospective application domain for such methods and tools is environmental science, which often combines data sets from different providers and in which data mining is becoming increasingly popular as more computing power becomes available. The authors present a set of experimental environmental scenarios and the application of ADMIRE technology in these scenarios. The scenarios attempt to predict meteorological and hydrological phenomena that currently cannot be, or are not, predicted, by mining distributed data sets from several providers in Slovakia. The scenarios were designed by environmental experts and, apart from serving as testing grounds for the ADMIRE technology, their results are of particular interest to the experts who designed them.


Author(s):  
Divya Dasagrandhi ◽  
Arul Salomee Kamalabai Ravindran ◽  
Anusuyadevi Muthuswamy ◽  
Jayachandran K. S.

Understanding the mechanisms of a disease is highly complicated due to the complex pathways involved in its progression. Despite several decades of research, the occurrence and prognosis of diseases are not completely understood, even with high-throughput experiments like DNA microarrays and next-generation sequencing. This is due to the challenges of analyzing huge data sets. Systems biology, one of the major divisions of bioinformatics, has provided cutting-edge techniques for better understanding these pathways. Construction of a protein-protein interaction network (PPIN) guides scientists in identifying vital proteins, which in turn facilitates the identification of new drug targets and associated proteins. The chapter focuses on PPI databases, the construction of PPINs, and their analysis.
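
A PPIN reduces to a graph whose nodes are proteins and whose edges are reported interactions. A minimal sketch of building such a network and ranking hub proteins by degree follows; the interaction pairs are illustrative examples, not entries taken from any real PPI database:

```python
from collections import defaultdict

# Hypothetical interaction pairs, as might be parsed from a PPI database export.
interactions = [
    ("TP53", "MDM2"), ("TP53", "BRCA1"), ("TP53", "ATM"),
    ("BRCA1", "ATM"), ("MDM2", "UBE2D1"),
]

def build_ppin(edges):
    """Undirected PPIN as an adjacency map: protein -> set of partners."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def hub_proteins(adj, top=3):
    """Rank proteins by degree; high-degree hubs are candidate drug targets."""
    return sorted(adj, key=lambda p: len(adj[p]), reverse=True)[:top]

net = build_ppin(interactions)
print(hub_proteins(net, top=1))  # -> ['TP53'] (3 interaction partners)
```

Degree is only the simplest centrality; PPIN analyses typically also consider betweenness, clustering and module detection.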


Author(s):  
Andrew Stranieri ◽  
Venki Balasubramanian

Remote patient monitoring involves the collection of data from wearable sensors that typically requires analysis in real time. The real-time analysis of data streaming continuously to a server challenges data mining algorithms that have mostly been developed for static data residing in central repositories. Remote patient monitoring also generates huge data sets that present storage and management problems. Although virtual records of every health event throughout an individual's lifespan, known as electronic health records, are rapidly emerging, few electronic records accommodate data from continuous remote patient monitoring. These factors combine to make data analytics with continuous patient data very challenging. In this chapter, the benefits for data analytics inherent in the use of standards for clinical concepts in remote patient monitoring are presented. The openEHR standard, which describes the way concepts are used in clinical practice, is well suited to recording metadata about remote monitoring. The claim is advanced that this is likely to facilitate meaningful real-time analyses with big remote patient monitoring data. The point is made by drawing on a case study involving the transmission of patient vital-sign data collected from wearable sensors in an Indian hospital.


Author(s):  
Guangming Xing

Classification and clustering of XML documents based on their structural information is important for many document management tasks. In this chapter, we present a suite of algorithms to compute the cost of approximate matching between XML documents and schemas. A framework for classifying/clustering XML documents by structure is then presented, based on the computation of distances between XML documents and schemas. The backbone of the framework is a feature representation using a vector of these distances. Experimental studies conducted on various XML data sets suggest the efficiency and effectiveness of our approach as a solution for structural classification/clustering of XML documents.
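
The feature representation described above, a document encoded as a vector of its distances to a set of reference structures, can be sketched as follows. As a stand-in for the chapter's document-to-schema edit-distance computation, a simple Jaccard distance over element paths is used, and the sample documents are invented:

```python
import xml.etree.ElementTree as ET

def paths(xml_text):
    """Collect root-to-node tag paths as a crude structural summary."""
    def walk(node, prefix):
        p = prefix + "/" + node.tag
        yield p
        for child in node:
            yield from walk(child, p)
    return set(walk(ET.fromstring(xml_text), ""))

def distance(doc, ref):
    """Jaccard distance between path sets -- a simplified stand-in for the
    chapter's approximate-matching cost between a document and a schema."""
    a, b = paths(doc), paths(ref)
    return 1.0 - len(a & b) / len(a | b)

def feature_vector(doc, refs):
    """Encode a document as its vector of distances to reference structures."""
    return [distance(doc, r) for r in refs]

refs = ["<book><title/><author/></book>", "<person><name/><age/></person>"]
fv = feature_vector("<book><title/><author/><year/></book>", refs)
print(fv)  # smaller distance to the structurally similar reference
```

Once each document is a numeric vector, any standard classifier or clustering algorithm can operate on it, which is the point of the framework.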


Author(s):  
Roberto Marmo

As a consequence of the expansion of modern technology, the number and variety of frauds are increasing dramatically. The resulting reputational damage and financial losses are primary motivations for fraud detection technologies and methodologies, which have been applied successfully in some economic activities. Detection involves monitoring the behavior of users based on huge data sets, such as logged data and user behavior. The aim of this contribution is to present data mining techniques for fraud detection and prevention, with applications in credit cards and telecommunications, within a business context of mining data to achieve higher cost savings, and also in the interest of determining potential legal evidence. The problem is very difficult because fraud takes many different forms and fraudsters are adaptive, usually looking for ways to evade security measures.
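
One simple instance of the behavior monitoring mentioned above is flagging a transaction that deviates strongly from a user's historical baseline. The z-score rule below is a hedged illustration with invented amounts, not one of the chapter's actual techniques:

```python
import statistics

def flag_anomaly(history, amount, z_threshold=3.0):
    """Flag a transaction amount far from the user's historical baseline.

    A toy behavior-monitoring rule, not a production fraud model; real
    systems combine many features and adapt as fraudsters do.
    """
    mu = statistics.mean(history)
    sd = statistics.stdev(history)
    if sd == 0:
        return amount != mu
    return abs(amount - mu) / sd > z_threshold

history = [25.0, 40.0, 32.0, 28.0, 35.0]
print(flag_anomaly(history, 30.0))    # typical amount: False
print(flag_anomaly(history, 5000.0))  # far outside the baseline: True
```

Such static thresholds are exactly what adaptive fraudsters learn to stay under, which is why the chapter emphasizes richer data mining approaches.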


Author(s):  
Clair Cassiello-Robbins ◽  
Heather Murray Latin ◽  
Shannon Sauer-Zavala

Previous chapters in this book have been dedicated to exploring the use of the Unified Protocol for the Transdiagnostic Treatment of Emotional Disorders (UP) with a variety of emotional disorders, and those chapters present compelling evidence for the efficacy of the UP across the range of these disorders. This evidence has now inspired further research efforts to better understand the contribution of individual treatment components, as well as to evaluate new methods of treatment delivery with the goal of furthering dissemination. The purpose of this chapter is to describe future directions and projects related to the UP, including dismantling studies, dissemination efforts, and the development of an Internet-delivered form of the intervention. Further, we discuss how these projects will enhance our understanding of the UP and how these endeavors will further a clinician’s ability to provide efficient, effective care.

