Enhancing Security on Touch-Screen Sensors with Augmented Handwritten Signatures

Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 933 ◽  
Author(s):  
Majd Abazid ◽  
Nesma Houmani ◽  
Sonia Garcia-Salicetti

We aim to enhance personal identity security on mobile touch-screen sensors by augmenting handwritten signatures with specific additional information at the enrollment phase. Our previous work on several publicly available and private data sets acquired on different sensors demonstrated that different categories of signatures emerge automatically with clustering techniques, based on an entropy-based data quality measure. These categories behave very differently when confronted with automatic verification systems in terms of vulnerability to attacks. In this paper, we propose a novel strategy to reinforce identity security by enhancing signature resistance to attacks, assessed per signature category, both in terms of data quality and verification performance. This strategy operates upstream of the verification system, at the sensor level, by enriching the information content of signatures with personal handwritten inputs of different types. We study this strategy on different signature types from 74 users, acquired in uncontrolled mobile conditions on a widely deployed mobile touch-screen sensor. Our analysis per writer category revealed that adding alphanumeric (date) and handwriting (place) information to the usual signature is the most powerful augmented signature type in terms of verification performance. The relative improvement for all user categories is at least 93% compared to the usual signature.

2014 ◽  
Vol 11 (2) ◽  
Author(s):  
Pavol Král’ ◽  
Lukáš Sobíšek ◽  
Mária Stachová

Data quality is a very important factor for the validity of information extracted from data sets using statistical or data mining procedures. In this paper we propose a description of data quality that characterizes the quality of the whole data set, as well as the quality of particular variables and individual cases. On the basis of this description, we define a distance-based measure of data quality for individual cases as the distance of each case from the ideal one. Such a measure can be used as additional information when preparing a training data set, fitting models, or making decisions based on the results of analyses. It can be utilized in different ways, ranging from a simple weighting function to belief functions.
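A minimal sketch of the distance-based idea described in this abstract: represent each case by a vector of per-variable quality scores in [0, 1] (1 = perfect) and measure its overall quality as the Euclidean distance from the ideal case (all ones). The score matrix and the weighting step below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def case_quality_distance(quality_matrix):
    """Distance of each case (row) from the ideal case (all ones)."""
    q = np.asarray(quality_matrix, dtype=float)
    ideal = np.ones(q.shape[1])
    return np.linalg.norm(q - ideal, axis=1)

def quality_weights(quality_matrix):
    """Turn distances into simple weights in (0, 1] for model fitting."""
    return 1.0 / (1.0 + case_quality_distance(quality_matrix))

scores = [[1.0, 1.0, 1.0],   # ideal case
          [0.9, 0.8, 1.0],   # good case
          [0.2, 0.5, 0.1]]   # poor-quality case
# case_quality_distance(scores) -> [0.0, ~0.22, ~1.30]
```

The weights could serve as the "simple weighting function" the abstract mentions, down-weighting low-quality cases during model fitting.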


2021 ◽  
pp. 000276422110216
Author(s):  
Kazimierz M. Slomczynski ◽  
Irina Tomescu-Dubrow ◽  
Ilona Wysmulek

This article proposes a new approach to analyzing protest participation measured in surveys of uneven quality. Because single international survey projects cover only a fraction of the world's nations in specific periods, researchers increasingly turn to ex-post harmonization of different survey data sets not a priori designed as comparable. However, very few scholars systematically examine the impact of survey data quality on substantive results. We argue that the variation in source data, especially deviations from standards of survey documentation, data processing, and computer files (standards proposed by methodologists of Total Survey Error, Survey Quality Monitoring, and Fitness for Intended Use), is important for analyzing protest behavior. In particular, we apply the Survey Data Recycling framework to investigate the extent to which indicators of attending demonstrations and signing petitions in 1,184 national survey projects are associated with measures of data quality, controlling for variability in the questionnaire items. We demonstrate that the null hypothesis of no impact of measures of survey quality on indicators of protest participation must be rejected. Measures of survey documentation, data processing, and computer records, taken together, explain over 5% of the intersurvey variance in the proportions of the populations attending demonstrations or signing petitions.


2018 ◽  
Vol 11 (11) ◽  
pp. 6203-6230 ◽  
Author(s):  
Simon Ruske ◽  
David O. Topping ◽  
Virginia E. Foot ◽  
Andrew P. Morse ◽  
Martin W. Gallagher

Abstract. Primary biological aerosol, including bacteria, fungal spores and pollen, has important implications for public health and the environment. Such particles may have different concentrations of chemical fluorophores and will respond differently in the presence of ultraviolet light, potentially allowing different types of biological aerosol to be discriminated. Development of ultraviolet light-induced fluorescence (UV-LIF) instruments such as the Wideband Integrated Bioaerosol Sensor (WIBS) has allowed size, morphology and fluorescence measurements to be collected in real time. However, without studying instrument responses in the laboratory, it is unclear to what extent different types of particles can be discriminated. Collection of laboratory data is vital to validate any approach used to analyse the data and to ensure that the available data are utilized as effectively as possible. In this paper a variety of methodologies are tested on a range of particles collected in the laboratory. Hierarchical agglomerative clustering (HAC) has previously been applied to UV-LIF data in a number of studies and is tested alongside other algorithms that could be used to solve the classification problem: density-based spatial clustering of applications with noise (DBSCAN), k-means and gradient boosting. Whilst HAC was able to effectively discriminate between reference narrow-size-distribution PSL particles, yielding a classification error of only 1.8 %, similar results were not obtained when testing on laboratory-generated aerosol, where the classification error was found to be between 11.5 % and 24.2 %. Furthermore, this approach carries a large uncertainty in terms of the data preparation and the cluster index used, and we were unable to attain consistent results across the different sets of laboratory-generated aerosol tested. The lowest classification errors were obtained using gradient boosting, where the misclassification rate was between 4.38 % and 5.42 %.
The largest contribution to the error, in the case of the higher misclassification rate, came from the pollen samples, where 28.5 % of the samples were incorrectly classified as fungal spores. The technique was robust to changes in data preparation provided a fluorescence threshold was applied to the data. In the event that laboratory training data are unavailable, DBSCAN was found to be a potential alternative to HAC. In the case of one of the data sets, where 22.9 % of the data were left unclassified, we were able to produce three distinct clusters, obtaining a classification error of only 1.42 % on the classified data. These results could not be replicated for the other data set, where 26.8 % of the data were not classified and a classification error of 13.8 % was obtained. This method, like HAC, also appeared to be heavily dependent on data preparation, requiring a different selection of parameters depending on the preparation used. Further analysis will also be required to confirm our selection of the parameters when using this method on ambient data. There is a clear need for the collection of additional laboratory-generated aerosol to improve interpretation of current databases and to aid in the analysis of data collected from an ambient environment. New instruments with greater resolution are likely to improve on current discrimination between pollen, bacteria and fungal spores, and even between different species; however, the need for extensive laboratory data sets will grow as a result.
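Of the algorithms compared in this abstract, k-means is the simplest to sketch. Below is an illustrative two-cluster k-means (Lloyd's algorithm) on synthetic, well-separated data standing in for laboratory particle measurements; the deterministic initialization and the data are simplifications for the sketch, not the paper's method or measurements.

```python
import numpy as np

def two_means(X, iters=50):
    """Two-cluster k-means (Lloyd's algorithm). Deterministic initialization
    for this sketch: the per-axis minimum and maximum of the data."""
    centers = np.vstack([X.min(axis=0), X.max(axis=0)])
    for _ in range(iters):
        # assign each point to its nearest center, then recompute centers
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(2):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.3, size=(100, 2))  # e.g. one particle type
group_b = rng.normal(3.0, 0.3, size=(100, 2))  # e.g. another particle type
labels, centers = two_means(np.vstack([group_a, group_b]))
# with this separation, the two generating groups are recovered exactly
```

On real UV-LIF data the clusters are far less separable, which is why the paper reports much higher errors for unsupervised methods than for supervised gradient boosting.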


2003 ◽  
Vol 21 (1) ◽  
pp. 123-135 ◽  
Author(s):  
S. Vignudelli ◽  
P. Cipollini ◽  
F. Reseghetti ◽  
G. Fusco ◽  
G. P. Gasparini ◽  
...  

Abstract. From September 1999 to December 2000, eXpendable Bathy-Thermograph (XBT) profiles were collected along the Genova-Palermo shipping route in the framework of the Mediterranean Forecasting System Pilot Project (MFSPP). The route is virtually coincident with track 0044 of the TOPEX/Poseidon satellite altimeter, crossing the Ligurian and Tyrrhenian basins in an approximate N–S direction. This allows a direct comparison between XBT and altimetry, whose findings are presented in this paper. XBT sections reveal the presence of the major features of the regional circulation, namely the eastern boundary of the Ligurian gyre, the Bonifacio gyre and the Modified Atlantic Water inflow along the Sicily coast. Twenty-two comparisons of steric heights derived from the XBT data set with concurrent realizations of single-pass altimetric heights are made. The overall correlation is around 0.55 with an RMS difference of less than 3 cm. In the Tyrrhenian Sea the spectra are remarkably similar in shape, but in general the altimetric heights contain more energy. This difference is explained in terms of oceanographic signals, which are captured with a different intensity by the satellite altimeter and XBTs, as well as computational errors. On scales larger than 100 km, the data sets are also significantly coherent, with increasing coherence values at longer wavelengths. The XBTs were dropped every 18–20 km along the track: as a consequence, the spacing scale was unable to resolve adequately the internal radius of deformation (< 20 km). Furthermore, few XBT drops were carried out in the Ligurian Sea, due to the limited north-south extent of this basin, so the comparison is problematic there. On the contrary, the major features observed in the XBT data in the Tyrrhenian Sea are also detected by TOPEX/Poseidon. The manuscript is completed by a discussion on how to integrate the two data sets, in order to extract additional information. 
In particular, the results emphasize their complementarity in providing a dynamically complete description of the observed structures. Key words. Oceanography: general (descriptive and regional oceanography) Oceanography: physical (sea level variations; instruments and techniques)
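The quantitative comparison reported in this abstract can be sketched in a few lines: given coincident steric heights from XBTs and altimetric heights along the track, compute their correlation and RMS difference. The arrays below are illustrative, not the MFSPP data.

```python
import numpy as np

def compare_heights(steric_cm, altimetric_cm):
    """Correlation and RMS difference (cm) between two coincident
    along-track height series."""
    s = np.asarray(steric_cm, dtype=float)
    a = np.asarray(altimetric_cm, dtype=float)
    corr = np.corrcoef(s, a)[0, 1]
    rms = float(np.sqrt(np.mean((s - a) ** 2)))
    return corr, rms

corr, rms = compare_heights([0.0, 1.0, 2.0, 3.0], [0.5, 1.5, 2.5, 3.5])
# perfectly correlated series offset by a constant: corr = 1.0, rms = 0.5
```

The paper's reported figures (correlation around 0.55, RMS difference under 3 cm) come from applying this kind of comparison to twenty-two concurrent XBT/altimeter realizations.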


2022 ◽  
Vol 9 (1) ◽  
Author(s):  
Marcos Fabietti ◽  
Mufti Mahmud ◽  
Ahmad Lotfi

Abstract. Acquisition of neuronal signals involves a wide range of devices with specific electrical properties. Combined with other physiological sources within the body, the signals sensed by the devices are often distorted. Sometimes these distortions are visually identifiable; at other times they overlap with the signal characteristics, making them very difficult to detect. To remove these distortions, the recordings are visually inspected and manually processed. However, this manual annotation process is time-consuming, and automatic computational methods are needed to identify and remove these artefacts. Most of the existing artefact removal approaches rely on additional information from other recorded channels and fail when global artefacts are present or the affected channels constitute the majority of the recording system. Addressing this issue, this paper reports a novel channel-independent machine learning model to accurately identify and replace the artefactual segments present in the signals. Discarding these artefactual segments, as the existing approaches do, causes discontinuities in the reproduced signals, which may introduce errors in subsequent analyses. To avoid this, the proposed method predicts multiple values for the artefactual region using a long short-term memory (LSTM) network to recreate the temporal and spectral properties of the recorded signal. The method has been tested on two open-access data sets and incorporated into the open-access SANTIA (SigMate Advanced: a Novel Tool for Identification of Artefacts in Neuronal Signals) toolbox for community use.
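The key idea here is to replace an artefactual segment with multi-step model predictions rather than discard it, avoiding discontinuities. As a runnable stand-in for the authors' LSTM, this sketch uses a simple least-squares autoregressive (AR) predictor on a synthetic signal; the AR model, signal, and artefact span are illustrative substitutions, not the paper's method or data.

```python
import numpy as np

def fit_ar(signal, order):
    """Least-squares fit of an AR(order) model to a 1-D signal."""
    x = np.asarray(signal, dtype=float)
    rows = np.array([x[i:i + order] for i in range(len(x) - order)])
    targets = x[order:]
    coef, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    return coef

def fill_artefact(signal, start, stop, order=8):
    """Replace signal[start:stop] with multi-step AR predictions made
    from the clean samples preceding the artefact."""
    x = np.asarray(signal, dtype=float).copy()
    coef = fit_ar(x[:start], order)
    for i in range(start, stop):          # each step feeds on prior outputs
        x[i] = x[i - order:i] @ coef
    return x

t = np.arange(400)
clean = np.sin(2 * np.pi * t / 50)        # synthetic "neuronal" rhythm
noisy = clean.copy()
noisy[200:220] = 5.0                      # simulated flat artefact
repaired = fill_artefact(noisy, 200, 220)
# repaired[200:220] closely tracks the underlying sinusoid
```

An LSTM plays the same role as `fit_ar`/`fill_artefact` here but can recreate far richer temporal and spectral structure than a linear AR model.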


2018 ◽  
Vol 2 ◽  
pp. e26539 ◽  
Author(s):  
Paul J. Morris ◽  
James Hanken ◽  
David Lowery ◽  
Bertram Ludäscher ◽  
James Macklin ◽  
...  

As curators of biodiversity data in natural science collections, we are deeply concerned with data quality, but quality is an elusive concept. An effective way to think about data quality is in terms of fitness for use (Veiga 2016). To use data to manage physical collections, the data must be able to accurately answer questions such as what objects are in the collections, where they are, and where they are from. Some research uses aggregate data across collections, which involves the exchange of data using standard vocabularies. Some research uses require accurate georeferences, collecting dates, and current identifications. It is well understood that the costs of data capture and data quality improvement increase with increasing time from the original observation. These factors point towards two engineering principles for software intended to maintain or enhance data quality: build small, modular data quality tests that can be easily assembled into suites to assess the fitness for use of data for some particular need; and produce tools that can be applied by users with a wide range of technical skill levels at different points in the data life cycle. In the Kurator project, we have produced code (e.g. Wieczorek et al. 2017, Morris 2016) consisting of small modules that can be incorporated into data management processes as small libraries addressing particular data quality tests. These modules can be combined into customizable data quality scripts, which can be run on single computers or scalable architecture, and can be incorporated into other software, run as command line programs, or run as suites of canned workflows through a web interface. Kurator modules can be integrated into early-stage data capture applications, run to help prepare data for aggregation by matching it to standard vocabularies, run for quality control or quality assurance on data sets, and report on data quality in terms of a fitness-for-use framework (Veiga et al. 2017).
One of our goals is simple tests usable by anyone anywhere.
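The "small modular tests assembled into suites" principle can be sketched as follows. The record fields use Darwin Core terms (eventDate, decimalLatitude, decimalLongitude), but the test functions and suite runner are illustrative assumptions, not the Kurator project's actual API.

```python
def has_collecting_date(record):
    """Single-purpose test: is a collecting date present?"""
    return ("eventDate", bool(record.get("eventDate")))

def has_georeference(record):
    """Single-purpose test: are both coordinates present?"""
    ok = (record.get("decimalLatitude") is not None
          and record.get("decimalLongitude") is not None)
    return ("georeference", ok)

def run_suite(record, tests):
    """Assemble modular tests into a suite and report fitness per test."""
    return dict(test(record) for test in tests)

suite = [has_collecting_date, has_georeference]
record = {"eventDate": "1999-07-04", "decimalLatitude": 42.37}
report = run_suite(record, suite)
# report -> {"eventDate": True, "georeference": False}
```

Because each test is a plain function, the same modules can run inside a capture application, a command line tool, or a web workflow, which is the portability the abstract argues for.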


2008 ◽  
Vol 32 ◽  
pp. 565-606 ◽  
Author(s):  
L. Xu ◽  
F. Hutter ◽  
H. H. Hoos ◽  
K. Leyton-Brown

It has been widely observed that there is no single "dominant" SAT solver; instead, different solvers perform best on different instances. Rather than following the traditional approach of choosing the best solver for a given class of instances, we advocate making this decision online on a per-instance basis. Building on previous work, we describe SATzilla, an automated approach for constructing per-instance algorithm portfolios for SAT that use so-called empirical hardness models to choose among their constituent solvers. This approach takes as input a distribution of problem instances and a set of component solvers, and constructs a portfolio optimizing a given objective function (such as mean runtime, percent of instances solved, or score in a competition). The excellent performance of SATzilla was independently verified in the 2007 SAT Competition, where our SATzilla07 solvers won three gold, one silver and one bronze medal. In this article, we go well beyond SATzilla07 by making the portfolio construction scalable and completely automated, and improving it by integrating local search solvers as candidate solvers, by predicting performance score instead of runtime, and by using hierarchical hardness models that take into account different types of SAT instances. We demonstrate the effectiveness of these new techniques in extensive experimental results on data sets including instances from the most recent SAT competition.
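The portfolio mechanism described in this abstract can be sketched in a few lines: one empirical hardness model per component solver predicts performance from instance features, and the portfolio runs the solver with the best prediction. The linear models, weights, and feature vectors below are toy stand-ins, not SATzilla's actual models or features.

```python
def predict_runtime(weights, features):
    """Toy linear empirical hardness model: predicted runtime (seconds)
    as a weighted sum of instance features."""
    return sum(w * f for w, f in zip(weights, features))

def pick_solver(models, features):
    """Choose the component solver with the lowest predicted runtime."""
    return min(models, key=lambda name: predict_runtime(models[name], features))

models = {
    "local_search":    [0.1, 2.0],   # hypothetical per-solver model weights
    "clause_learning": [1.5, 0.2],
}
print(pick_solver(models, [1.0, 0.1]))   # -> local_search
print(pick_solver(models, [0.1, 5.0]))   # -> clause_learning
```

The article's refinements (predicting competition score rather than runtime, and hierarchical models per instance type) slot into the same select-by-prediction loop.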


2021 ◽  
Author(s):  
Xingang Jia ◽  
Qiuhong Han ◽  
Zuhong Lu

Abstract. Background: Phages are the most abundant biological entities, but commonly used clustering techniques find it difficult to separate them from other virus families and to classify the different phage families. Results: This work uses GI-clusters to separate phages from other virus families and to classify the different phage families. GI-clusters are constructed from GI-features, which are in turn built from F-features together with training data and the MG-Euclidean and Icc-cluster algorithms; F-features are the frequencies of multiple nucleotides generated from virus genomes. The MG-Euclidean algorithm places nearest neighbors in the same mini-groups, while the Icc-cluster algorithm assigns distant samples to different mini-clusters. Viruses whose GI-features have their maximum element in the same location are placed in the same GI-cluster; the families of viruses in the test data are identified by GI-clusters, and the families of GI-clusters are defined by the viruses of the training data. Conclusions: From analysis of four data sets constructed from viruses of different families, we demonstrate that GI-clusters are able to separate phages from other virus families, correctly classify the different phage families, and also correctly predict the families of unknown phages.
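The F-features in this work are frequencies of multiple nucleotides (k-mers) computed from virus genomes. A minimal sketch of that feature extraction step follows; k = 2 and the toy sequence are illustrative choices, not the paper's data or its full GI-feature pipeline.

```python
from collections import Counter

def kmer_frequencies(sequence, k):
    """Relative frequencies of overlapping k-mers in a nucleotide sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

freqs = kmer_frequencies("ATGATGCC", 2)
# 7 overlapping 2-mers (AT, TG, GA, AT, TG, GC, CC), so freqs["AT"] == 2/7
```

Vectors like these, computed over whole genomes, are the raw inputs that the MG-Euclidean and Icc-cluster steps then organize into mini-groups and GI-clusters.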


Author(s):  
Amey Thakur

The project's main goal is to build an online book store where users can search for and buy books by title, author, and subject. The chosen books are shown in tabular form, and the customer may buy them online using a credit card. Using this website, the user can buy a book online instead of going to a bookshop and spending time there. Many online bookstores, such as Powell's and Amazon, were created using HTML. We propose creating a comparable website with .NET and SQL Server. An online book store is a web application that allows customers to purchase ebooks. Through a web browser, customers can search for a book by its title or author, add it to the shopping cart, and finally purchase it using a credit card transaction. The client may sign in using his login credentials, or new clients can simply open an account. Customers must submit their full name, contact details, and shipping address. The user may also review a book by rating it on a scale of one to five. The books are classified into different categories depending on their subject matter, such as software, databases, English, and architecture. A client may create an account, sign in, add items to his shopping basket, and buy them using his credit card information. The Administrator has more privileges than a regular user: he can add, delete, and edit book details, book categories, and member information, as well as confirm placed orders. This application was created with PHP and web programming languages. The Online Book Store is built using the Master page, data sets, data grids, and user controls.
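The search-and-cart flow described above can be sketched as follows. Class names, fields, and the sample catalogue are illustrative, and the actual project uses .NET/SQL Server rather than Python.

```python
class BookStore:
    """Toy catalogue supporting search by title or author."""
    def __init__(self, books):
        self.books = books  # list of dicts with title, author, price

    def search(self, text):
        text = text.lower()
        return [b for b in self.books
                if text in b["title"].lower() or text in b["author"].lower()]

class Cart:
    """Shopping cart a signed-in customer fills before checkout."""
    def __init__(self):
        self.items = []

    def add(self, book):
        self.items.append(book)

    def total(self):
        return sum(b["price"] for b in self.items)

store = BookStore([
    {"title": "SQL Basics", "author": "A. Writer", "price": 30.0},
    {"title": "Web Apps",   "author": "B. Coder",  "price": 25.0},
])
cart = Cart()
for book in store.search("sql"):
    cart.add(book)
# cart.total() -> 30.0
```

In the real application these classes would be backed by SQL Server tables and rendered through data grids, with the credit-card transaction handled at checkout.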

