scholarly journals An Improved Approximation Algorithm for Co-location Mining in Uncertain Data Sets using Probabilistic Approach

Author(s):  
M. Sheshikala ◽  
D. Rajeswara Rao ◽  
Md. Ali Kadampur

complete data. Many of the real world data is Uncertain, for example, Demographic data, Sensor networksdata, GIS data etc.,. Handling such data is a challenge for knowledge discovery particularly in colocation mining.One straightforward method is to find the Probabilistic Prevalent colocations (PPCs). This method tries to find allcolocations that are to be generated from a random world. For this we first apply an approximation error to find allthe PPCs which reduce the computations. Next find all the possible worlds and split them into two different worldsand compute the prevalence probability. These worlds are used to compare with a minimum probability threshold todecide whether it is Probabilistic Prevalent colocation (PPCs) or not. The experimental results on the selected dataset show the significant improvement in computational time in comparison to some of the existing methods used incolocation mining.

Author(s):  
M. Sheshikala ◽  
D. Rajeswara Rao ◽  
Md. Ali Kadampur

In this paper we investigate colocation mining problem in the context of uncertain data. Uncertain data is a partially complete data. Many of the real world data is Uncertain, for example, Demographic data, Sensor networks data, GIS data etc.,. Handling such data is a challenge for knowledge discovery particularly in colocation mining. One straightforward method is to find the Probabilistic Prevalent colocations (PPCs). This method tries to find all colocations that are to be generated from a random world. For this we first apply an approximation error to find all the PPCs which reduce the computations. Next find all the possible worlds and split them into two different worlds and compute the prevalence probability. These worlds are used to compare with a minimum probability threshold to decide whether it is Probabilistic Prevalent colocation (PPCs) or not. The experimental results on the selected data set show the significant improvement in computational time in comparison to some of the existing methods used in colocation mining.


Author(s):  
K Sobha Rani

Collaborative filtering suffers from the problems of data sparsity and cold start, which dramatically degrade recommendation performance. To help resolve these issues, we propose TrustSVD, a trust-based matrix factorization technique. By analyzing the social trust data from four real-world data sets, we conclude that not only the explicit but also the implicit influence of both ratings and trust should be taken into consideration in a recommendation model. Hence, we build on top of a state-of-the-art recommendation algorithm SVD++ which inherently involves the explicit and implicit influence of rated items, by further incorporating both the explicit and implicit influence of trusted users on the prediction of items for an active user. To our knowledge, the work reported is the first to extend SVD++ with social trust information. Experimental results on the four data sets demonstrate that our approach TrustSVD achieves better accuracy than other ten counterparts, and can better handle the concerned issues.


Author(s):  
Tannistha Pal

Images captured in severe atmospheric catastrophe especially in fog critically degrade the quality of an image and thereby reduces the visibility of an image which in turn affects several computer vision applications like visual surveillance detection, intelligent vehicles, remote sensing, etc. Thus acquiring clear vision is the prime requirement of any image. In the last few years, many approaches have been made towards solving this problem. In this article, a comparative analysis has been made on different existing image defogging algorithms and then a technique has been proposed for image defogging based on dark channel prior strategy. Experimental results show that the proposed method shows efficient results by significantly improving the visual effects of images in foggy weather. Also computational time of the existing techniques are much higher which has been overcame in this paper by using the proposed method. Qualitative assessment evaluation is performed on both benchmark and real time data sets for determining theefficacy of the technique used. Finally, the whole work is concluded with its relative advantages and shortcomings.


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 507
Author(s):  
Piotr Białczak ◽  
Wojciech Mazurczyk

Malicious software utilizes HTTP protocol for communication purposes, creating network traffic that is hard to identify as it blends into the traffic generated by benign applications. To this aim, fingerprinting tools have been developed to help track and identify such traffic by providing a short representation of malicious HTTP requests. However, currently existing tools do not analyze all information included in the HTTP message or analyze it insufficiently. To address these issues, we propose Hfinger, a novel malware HTTP request fingerprinting tool. It extracts information from the parts of the request such as URI, protocol information, headers, and payload, providing a concise request representation that preserves the extracted information in a form interpretable by a human analyst. For the developed solution, we have performed an extensive experimental evaluation using real-world data sets and we also compared Hfinger with the most related and popular existing tools such as FATT, Mercury, and p0f. The conducted effectiveness analysis reveals that on average only 1.85% of requests fingerprinted by Hfinger collide between malware families, what is 8–34 times lower than existing tools. Moreover, unlike these tools, in default mode, Hfinger does not introduce collisions between malware and benign applications and achieves it by increasing the number of fingerprints by at most 3 times. As a result, Hfinger can effectively track and hunt malware by providing more unique fingerprints than other standard tools.


2021 ◽  
pp. 1-13
Author(s):  
Qingtian Zeng ◽  
Xishi Zhao ◽  
Xiaohui Hu ◽  
Hua Duan ◽  
Zhongying Zhao ◽  
...  

Word embeddings have been successfully applied in many natural language processing tasks due to its their effectiveness. However, the state-of-the-art algorithms for learning word representations from large amounts of text documents ignore emotional information, which is a significant research problem that must be addressed. To solve the above problem, we propose an emotional word embedding (EWE) model for sentiment analysis in this paper. This method first applies pre-trained word vectors to represent document features using two different linear weighting methods. Then, the resulting document vectors are input to a classification model and used to train a text sentiment classifier, which is based on a neural network. In this way, the emotional polarity of the text is propagated into the word vectors. The experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model achieves superior performances on text sentiment prediction, text similarity calculation, and word emotional expression tasks compared to other state-of-the-art models.


Author(s):  
Martyna Daria Swiatczak

AbstractThis study assesses the extent to which the two main Configurational Comparative Methods (CCMs), i.e. Qualitative Comparative Analysis (QCA) and Coincidence Analysis (CNA), produce different models. It further explains how this non-identity is due to the different algorithms upon which both methods are based, namely QCA’s Quine–McCluskey algorithm and the CNA algorithm. I offer an overview of the fundamental differences between QCA and CNA and demonstrate both underlying algorithms on three data sets of ascending proximity to real-world data. Subsequent simulation studies in scenarios of varying sample sizes and degrees of noise in the data show high overall ratios of non-identity between the QCA parsimonious solution and the CNA atomic solution for varying analytical choices, i.e. different consistency and coverage threshold values and ways to derive QCA’s parsimonious solution. Clarity on the contrasts between the two methods is supposed to enable scholars to make more informed decisions on their methodological approaches, enhance their understanding of what is happening behind the results generated by the software packages, and better navigate the interpretation of results. Clarity on the non-identity between the underlying algorithms and their consequences for the results is supposed to provide a basis for a methodological discussion about which method and which variants thereof are more successful in deriving which search target.


Author(s):  
A Salman Avestimehr ◽  
Seyed Mohammadreza Mousavi Kalan ◽  
Mahdi Soltanolkotabi

Abstract Dealing with the shear size and complexity of today’s massive data sets requires computational platforms that can analyze data in a parallelized and distributed fashion. A major bottleneck that arises in such modern distributed computing environments is that some of the worker nodes may run slow. These nodes a.k.a. stragglers can significantly slow down computation as the slowest node may dictate the overall computational time. A recent computational framework, called encoded optimization, creates redundancy in the data to mitigate the effect of stragglers. In this paper, we develop novel mathematical understanding for this framework demonstrating its effectiveness in much broader settings than was previously understood. We also analyze the convergence behavior of iterative encoded optimization algorithms, allowing us to characterize fundamental trade-offs between convergence rate, size of data set, accuracy, computational load (or data redundancy) and straggler toleration in this framework.


2021 ◽  
pp. 1-13
Author(s):  
Hailin Liu ◽  
Fangqing Gu ◽  
Zixian Lin

Transfer learning methods exploit similarities between different datasets to improve the performance of the target task by transferring knowledge from source tasks to the target task. “What to transfer” is a main research issue in transfer learning. The existing transfer learning method generally needs to acquire the shared parameters by integrating human knowledge. However, in many real applications, an understanding of which parameters can be shared is unknown beforehand. Transfer learning model is essentially a special multi-objective optimization problem. Consequently, this paper proposes a novel auto-sharing parameter technique for transfer learning based on multi-objective optimization and solves the optimization problem by using a multi-swarm particle swarm optimizer. Each task objective is simultaneously optimized by a sub-swarm. The current best particle from the sub-swarm of the target task is used to guide the search of particles of the source tasks and vice versa. The target task and source task are jointly solved by sharing the information of the best particle, which works as an inductive bias. Experiments are carried out to evaluate the proposed algorithm on several synthetic data sets and two real-world data sets of a school data set and a landmine data set, which show that the proposed algorithm is effective.


Author(s):  
Chen Lin ◽  
Xiaolin Shen ◽  
Si Chen ◽  
Muhua Zhu ◽  
Yanghua Xiao

The study of consumer psychology reveals two categories of consumption decision procedures: compensatory rules and non-compensatory rules. Existing recommendation models which are based on latent factor models assume the consumers follow the compensatory rules, i.e. they evaluate an item over multiple aspects and compute a weighted or/and summated score which is used to derive the rating or ranking of the item. However, it has been shown in the literature of consumer behavior that, consumers adopt non-compensatory rules more often than compensatory rules. Our main contribution in this paper is to study the unexplored area of utilizing non-compensatory rules in recommendation models.Our general assumptions are (1) there are K universal hidden aspects. In each evaluation session, only one aspect is chosen as the prominent aspect according to user preference. (2) Evaluations over prominent and non-prominent aspects are non-compensatory. Evaluation is mainly based on item performance on the prominent aspect. For non-prominent aspects the user sets a minimal acceptable threshold. We give a conceptual model for these general assumptions. We show how this conceptual model can be realized in both pointwise rating prediction models and pair-wise ranking prediction models. Experiments on real-world data sets validate that adopting non-compensatory rules improves recommendation performance for both rating and ranking models.


2016 ◽  
Vol 12 (2) ◽  
pp. 126-149 ◽  
Author(s):  
Masoud Mansoury ◽  
Mehdi Shajari

Purpose This paper aims to improve the recommendations performance for cold-start users and controversial items. Collaborative filtering (CF) generates recommendations on the basis of similarity between users. It uses the opinions of similar users to generate the recommendation for an active user. As a similarity model or a neighbor selection function is the key element for effectiveness of CF, many variations of CF are proposed. However, these methods are not very effective, especially for users who provide few ratings (i.e. cold-start users). Design/methodology/approach A new user similarity model is proposed that focuses on improving recommendations performance for cold-start users and controversial items. To show the validity of the authors’ similarity model, they conducted some experiments and showed the effectiveness of this model in calculating similarity values between users even when only few ratings are available. In addition, the authors applied their user similarity model to a recommender system and analyzed its results. Findings Experiments on two real-world data sets are implemented and compared with some other CF techniques. The results show that the authors’ approach outperforms previous CF techniques in coverage metric while preserves accuracy for cold-start users and controversial items. Originality/value In the proposed approach, the conditions in which CF is unable to generate accurate recommendations are addressed. These conditions affect CF performance adversely, especially in the cold-start users’ condition. The authors show that their similarity model overcomes CF weaknesses effectively and improve its performance even in the cold users’ condition.


Sign in / Sign up

Export Citation Format

Share Document