A Similarity Classifier with Bonferroni Mean Operators

Advances in Fuzzy Systems ◽

10.1155/2016/7173054 ◽

2016 ◽

Vol 2016 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Onesfole Kurama ◽

Pasi Luukka ◽

Mikael Collan

Keyword(s):

Medical Research ◽

Real World ◽

Classification Accuracy ◽

Arithmetic Mean ◽

Data Sets ◽

Related Data ◽

Bonferroni Mean ◽

Generalized Mean ◽

Linguistic Quantifiers

A similarity classifier based on Bonferroni mean based operators is introduced. The new Bonferroni mean based variant of the similarity classifier is also extended to cover a new Bonferroni-OWA variant. The new Bonferroni-OWA based similarity classifier raises the question of how to accomplish the weighting needed and for this reason we also examine a number of linguistic quantifiers for weight generation. The new proposed similarity classifier variants are tested on four real world medical research related data sets. The results are compared with results from two previously presented similarity classifiers, one based on the generalized mean and another based on an arithmetic mean operator. The results show that comparatively better classification accuracy can be reached with the proposed new similarity classifier variants.

Download Full-text

Novel Visualization of Large Health Related Data Sets

10.21236/ada624744 ◽

2015 ◽

Author(s):

William E. Hammond ◽

Vivian L. West ◽

David Borland ◽

Igor Akushevich ◽

Eugenia M. Heinz

Keyword(s):

Data Sets ◽

Related Data ◽

Health Related

Download Full-text

Hfinger: Malware HTTP Request Fingerprinting

Entropy ◽

10.3390/e23050507 ◽

2021 ◽

Vol 23 (5) ◽

pp. 507

Author(s):

Piotr Białczak ◽

Wojciech Mazurczyk

Keyword(s):

Real World ◽

Network Traffic ◽

Experimental Evaluation ◽

Data Sets ◽

Real World Data ◽

Malicious Software ◽

Default Mode ◽

World Data ◽

Effectiveness Analysis ◽

Http Protocol

Malicious software utilizes HTTP protocol for communication purposes, creating network traffic that is hard to identify as it blends into the traffic generated by benign applications. To this aim, fingerprinting tools have been developed to help track and identify such traffic by providing a short representation of malicious HTTP requests. However, currently existing tools do not analyze all information included in the HTTP message or analyze it insufficiently. To address these issues, we propose Hfinger, a novel malware HTTP request fingerprinting tool. It extracts information from the parts of the request such as URI, protocol information, headers, and payload, providing a concise request representation that preserves the extracted information in a form interpretable by a human analyst. For the developed solution, we have performed an extensive experimental evaluation using real-world data sets and we also compared Hfinger with the most related and popular existing tools such as FATT, Mercury, and p0f. The conducted effectiveness analysis reveals that on average only 1.85% of requests fingerprinted by Hfinger collide between malware families, what is 8–34 times lower than existing tools. Moreover, unlike these tools, in default mode, Hfinger does not introduce collisions between malware and benign applications and achieves it by increasing the number of fingerprints by at most 3 times. As a result, Hfinger can effectively track and hunt malware by providing more unique fingerprints than other standard tools.

Download Full-text

Different algorithms, different models

Quality & Quantity ◽

10.1007/s11135-021-01193-9 ◽

2021 ◽

Author(s):

Martyna Daria Swiatczak

Keyword(s):

Comparative Analysis ◽

Real World ◽

Qualitative Comparative Analysis ◽

Comparative Methods ◽

Data Sets ◽

Simulation Studies ◽

Threshold Values ◽

Real World Data ◽

Software Packages ◽

Methodological Approaches

AbstractThis study assesses the extent to which the two main Configurational Comparative Methods (CCMs), i.e. Qualitative Comparative Analysis (QCA) and Coincidence Analysis (CNA), produce different models. It further explains how this non-identity is due to the different algorithms upon which both methods are based, namely QCA’s Quine–McCluskey algorithm and the CNA algorithm. I offer an overview of the fundamental differences between QCA and CNA and demonstrate both underlying algorithms on three data sets of ascending proximity to real-world data. Subsequent simulation studies in scenarios of varying sample sizes and degrees of noise in the data show high overall ratios of non-identity between the QCA parsimonious solution and the CNA atomic solution for varying analytical choices, i.e. different consistency and coverage threshold values and ways to derive QCA’s parsimonious solution. Clarity on the contrasts between the two methods is supposed to enable scholars to make more informed decisions on their methodological approaches, enhance their understanding of what is happening behind the results generated by the software packages, and better navigate the interpretation of results. Clarity on the non-identity between the underlying algorithms and their consequences for the results is supposed to provide a basis for a methodological discussion about which method and which variants thereof are more successful in deriving which search target.

Download Full-text

Improving recommender systems’ performance on cold-start users and controversial items by a new similarity model

International Journal of Web Information Systems ◽

10.1108/ijwis-07-2015-0024 ◽

2016 ◽

Vol 12 (2) ◽

pp. 126-149 ◽

Cited By ~ 4

Author(s):

Masoud Mansoury ◽

Mehdi Shajari

Keyword(s):

Real World ◽

Design Methodology ◽

Cold Start ◽

Selection Function ◽

Data Sets ◽

Real World Data ◽

Content Type ◽

User Similarity ◽

Active User ◽

Similarity Model

Purpose This paper aims to improve the recommendations performance for cold-start users and controversial items. Collaborative filtering (CF) generates recommendations on the basis of similarity between users. It uses the opinions of similar users to generate the recommendation for an active user. As a similarity model or a neighbor selection function is the key element for effectiveness of CF, many variations of CF are proposed. However, these methods are not very effective, especially for users who provide few ratings (i.e. cold-start users). Design/methodology/approach A new user similarity model is proposed that focuses on improving recommendations performance for cold-start users and controversial items. To show the validity of the authors’ similarity model, they conducted some experiments and showed the effectiveness of this model in calculating similarity values between users even when only few ratings are available. In addition, the authors applied their user similarity model to a recommender system and analyzed its results. Findings Experiments on two real-world data sets are implemented and compared with some other CF techniques. The results show that the authors’ approach outperforms previous CF techniques in coverage metric while preserves accuracy for cold-start users and controversial items. Originality/value In the proposed approach, the conditions in which CF is unable to generate accurate recommendations are addressed. These conditions affect CF performance adversely, especially in the cold-start users’ condition. The authors show that their similarity model overcomes CF weaknesses effectively and improve its performance even in the cold users’ condition.

Download Full-text

Boosting Instance Segmentation with Synthetic Data: A study to overcome the limits of real world data sets

10.1109/iccvw54120.2021.00110 ◽

2021 ◽

Author(s):

Florentin Poucin ◽

Andrea Kraus ◽

Martin Simon

Keyword(s):

Real World ◽

Synthetic Data ◽

Data Sets ◽

Real World Data ◽

World Data ◽

Instance Segmentation

Download Full-text

Quality criteria for Real-World Data in pharmaceutical research and healthcare decision making. An Austrian Expert Consensus (Preprint)

10.2196/preprints.34204 ◽

2021 ◽

Author(s):

Peter Klimek ◽

Dejan Baltic ◽

Martin Brunner ◽

Alexander Degelsegger-Marquez ◽

Gerhard Garhöfer ◽

...

Keyword(s):

Decision Making ◽

Medical Research ◽

Real World ◽

Pharmaceutical Medicine ◽

Pharmaceutical Research ◽

Quality Criteria ◽

Clinical Decision ◽

European Medicines Agency ◽

Real World Data ◽

World Data

UNSTRUCTURED Real-world data (RWD) collected in routine healthcare processes and transformed to real-world evidence (RWE) has become increasingly interesting within research and medical communities to enhance medical research and support regulatory decision making. Despite numerous European initiatives, there is still no cross-border consensus or guideline determining which quality RWD must meet in order to be acceptable for decision making within regulatory or routine clinical decision support. An Austrian expert group led by GPMed (Gesellschaft für Pharmazeutische Medizin, Austrian Society for Pharmaceutical Medicine) reviewed drafted guidelines, published recommendations or viewpoints to derive a consensus statement on quality criteria for RWD to be used more effectively for medical research purposes beyond registry-based studies discussed in the European Medicines Agency (EMA) guideline for registry-based studies

Download Full-text

Comparison of M2M Traffic Models Against Real World Data Sets

2018 IEEE 23rd International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD) ◽

10.1109/camad.2018.8515000 ◽

2018 ◽

Cited By ~ 1

Author(s):

Marco Sansoni ◽

Giuseppe Ravagnani ◽

Daniel Zucchetto ◽

Chiara Pielli ◽

Andrea Zanella ◽

...

Keyword(s):

Real World ◽

Data Sets ◽

Real World Data ◽

Traffic Models ◽

World Data

Download Full-text

Future of Medical Research in Rare Diseases and Cancers: Shift from Pharma to Biotech and the Golden Age of Medical Advancement

Cancer and Clinical Oncology ◽

10.5539/cco.v6n2p12 ◽

2017 ◽

Vol 6 (2) ◽

pp. 12

Author(s):

Abhith Pallegar

Keyword(s):

Big Data ◽

Medical Research ◽

Rare Diseases ◽

Network Effects ◽

Medical Knowledge ◽

Economic Cost ◽

Medical Data ◽

Data Sets ◽

Leading Role ◽

Diverse Data

The objective of the paper is to elucidate how interconnected biological systems can be better mapped and understood using the rapidly growing area of Big Data. We can harness network efficiencies by analyzing diverse medical data and probe how we can effectively lower the economic cost of finding cures for rare diseases. Most rare diseases are due to genetic abnormalities, many forms of cancers develop due to genetic mutations. Finding cures for rare diseases requires us to understand the biology and biological processes of the human body. In this paper, we explore what the historical shift of focus from pharmacology to biotechnology means for accelerating biomedical solutions. With biotechnology playing a leading role in the field of medical research, we explore how network efficiencies can be harnessed by strengthening the existing knowledge base. Studying rare or orphan diseases provides rich observable statistical data that can be leveraged for finding solutions. Network effects can be squeezed from working with diverse data sets that enables us to generate the highest quality medical knowledge with the fewest resources. This paper examines gene manipulation technologies like Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) that can prevent diseases of genetic variety. We further explore the role of the emerging field of Big Data in analyzing large quantities of medical data with the rapid growth of computing power and some of the network efficiencies gained from this endeavor.

Download Full-text

Optimising Daily Fantasy Sports Teams with Artificial Intelligence

International Journal of Computer Science in Sport ◽

10.2478/ijcss-2020-0008 ◽

2020 ◽

Vol 19 (2) ◽

pp. 21-35

Author(s):

Ryan Beal ◽

Timothy J. Norman ◽

Sarvapali D. Ramchurn

Keyword(s):

Real World ◽

National Football League ◽

Mixed Integer ◽

Data Sets ◽

Fantasy Sports ◽

Real World Data ◽

Sports Teams ◽

Novel Approach ◽

Four Seasons ◽

Daily Fantasy Sports

AbstractThis paper outlines a novel approach to optimising teams for Daily Fantasy Sports (DFS) contests. To this end, we propose a number of new models and algorithms to solve the team formation problems posed by DFS. Specifically, we focus on the National Football League (NFL) and predict the performance of real-world players to form the optimal fantasy team using mixed-integer programming. We test our solutions using real-world data-sets from across four seasons (2014-2017). We highlight the advantage that can be gained from using our machine-based methods and show that our solutions outperform existing benchmarks, turning a profit in up to 81.3% of DFS game-weeks over a season.

Download Full-text

Validation-Based Sparse Gaussian Process Classifier Design

Neural Computation ◽

10.1162/neco.2009.03-08-724 ◽

2009 ◽

Vol 21 (7) ◽

pp. 2082-2103 ◽

Cited By ~ 1

Author(s):

Shirish Shevade ◽

S. Sundararajan

Keyword(s):

Bayesian Methods ◽

Real World ◽

Basis Vector ◽

Data Sets ◽

Training Set ◽

Classifier Design ◽

Set Size ◽

Benchmark Data ◽

Regression Problems ◽

Classification And Regression

Gaussian processes (GPs) are promising Bayesian methods for classification and regression problems. Design of a GP classifier and making predictions using it is, however, computationally demanding, especially when the training set size is large. Sparse GP classifiers are known to overcome this limitation. In this letter, we propose and study a validation-based method for sparse GP classifier design. The proposed method uses a negative log predictive (NLP) loss measure, which is easy to compute for GP models. We use this measure for both basis vector selection and hyperparameter adaptation. The experimental results on several real-world benchmark data sets show better or comparable generalization performance over existing methods.

Download Full-text