Continuous Data Types

1982 ◽  
Vol 11 (2) ◽  
pp. 201-216 ◽  
Author(s):  
M. R. Levy ◽  
T. S. E. Maibaum
2018 ◽  
Author(s):  
Derek Beaton ◽  
Kelly M. Sunderland ◽  
Brian Levine ◽  
Jennifer Mandzia ◽  
Mario Masellis ◽  
...  

Abstract: The minimum covariance determinant (MCD) algorithm is one of the most common techniques to detect anomalous or outlying observations. The MCD algorithm depends on two features of multivariate data: the determinant of a matrix (i.e., geometric mean of the eigenvalues) and Mahalanobis distances (MD). While the MCD algorithm is commonly used, and has many extensions, the MCD is limited to analyses of quantitative data and more specifically data assumed to be continuous. One reason why the MCD does not extend to other data types such as categorical or ordinal data is because there is not a well-defined MD for data types other than continuous data. To address the lack of MCD-like techniques for categorical or mixed data we present a generalization of the MCD. To do so, we rely on a multivariate technique called correspondence analysis (CA). Through CA we can define MD via singular vectors and also compute the determinant from CA’s eigenvalues. Here we define and illustrate a generalized MCD on categorical data and then show how our generalized MCD extends beyond categorical data to accommodate mixed data types (e.g., categorical, ordinal, and continuous). We illustrate this generalized MCD on data from two large-scale projects: the Ontario Neurodegenerative Disease Research Initiative (ONDRI) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI), with genetics (categorical), clinical instruments and surveys (categorical or ordinal), and neuroimaging (continuous) data. We also make R code and toy data available in order to illustrate our generalized MCD.
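The two ingredients this abstract names, Mahalanobis distances and the covariance determinant, can be sketched in a few lines of NumPy. This is a toy illustration with hypothetical names, not the authors' MCD or its CA-based generalization: the real MCD searches for the observation subset whose covariance has minimum determinant, precisely because the naive estimates below can be distorted ("masked") by the outliers themselves.

```python
import numpy as np

def mahalanobis_outliers(X, threshold=3.0):
    """Flag rows of X whose Mahalanobis distance from the sample mean
    exceeds `threshold`. A minimal sketch of the distance and
    determinant that the MCD algorithm builds on."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    cov_inv = np.linalg.inv(cov)
    diffs = X - mu
    # squared Mahalanobis distance of each observation
    d2 = np.einsum('ij,jk,ik->i', diffs, cov_inv, diffs)
    md = np.sqrt(d2)
    # the determinant equals the product of the covariance eigenvalues
    det = np.linalg.det(cov)
    return md, det, md > threshold
```

A gross outlier still receives the largest distance here, but with several outliers the classical mean and covariance break down, which is the motivation for the minimum-determinant search.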


Author(s):  
Wang Ren ◽  
Xin Tong ◽  
Jing Du ◽  
Na Wang ◽  
Shancang Li ◽  
...  

Abstract: The Internet of Things (IoT) and Industry 4.0 bring enormous potential benefits by enabling highly customised services and applications, which generate a huge volume and variety of data. However, preserving privacy in IoT and Industry 4.0 against re-identification attacks is very challenging. In this work, we considered three main data types generated in IoT: context data, continuous data, and media data. We first proposed a stream-data anonymisation method based on k-anonymity for data collected by IoT devices; we then proposed privacy-enhancing techniques for continuous data and media data in different IoT scenarios. The experimental results show that the proposed techniques preserve privacy well without significantly affecting the utility of the data.
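The k-anonymity property the abstract relies on can be made concrete with a short sketch. The function names and the age-banding generalization below are illustrative assumptions, not the authors' stream method: a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records, and coarsening an attribute is one standard way to get there.

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True iff every quasi-identifier combination occurs at least
    k times -- the defining property of k-anonymity."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())

def generalize_age(records, width=10):
    """Coarsen the 'age' attribute into bands of `width` years,
    a typical generalization step used to reach k-anonymity."""
    out = []
    for r in records:
        r = dict(r)
        lo = (r['age'] // width) * width
        r['age'] = f"{lo}-{lo + width - 1}"
        out.append(r)
    return out
```

A streaming variant, as in the paper, must additionally decide when to release buffered records, trading latency against the generality of the published values.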


2003 ◽  
Vol 10 (23) ◽  
Author(s):  
Margarita Korovina

The purpose of this paper is to survey our recent research in computability and definability over continuous data types such as the real numbers, real-valued functions and functionals. We investigate the expressive power and algorithmic properties of the language of Sigma-formulas intended to represent computability over the real numbers. In order to adequately represent computability we extend the reals by the structure of hereditarily finite sets. In this setting it is crucial to consider the real numbers without equality, since the equality test is undecidable over the reals. We prove Engeler's Lemma for Sigma-definability over the reals without the equality test, which relates Sigma-definability to definability in the constructive infinitary language L_{omega_1 omega}. Thus, a relation over the real numbers is Sigma-definable if and only if it is definable by a disjunction of a recursively enumerable set of quantifier-free formulas. This result reveals computational aspects of Sigma-definability and also gives a topological characterisation of Sigma-definable relations over the reals without the equality test. We also illustrate how computability over the real numbers can be expressed in the language of Sigma-formulas.
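The central equivalence summarised in this abstract can be stated compactly. The notation below is a paraphrase of the abstract's statement, not the paper's exact formulation:

```latex
% Engeler's Lemma for Sigma-definability over the reals without
% the equality test, as summarised above: a relation is
% Sigma-definable iff it is an effective infinite disjunction
% of quantifier-free formulas.
R \subseteq \mathbb{R}^{n} \text{ is } \Sigma\text{-definable}
\iff
R(\bar{x}) \;\Longleftrightarrow\; \bigvee_{i \in \omega} \varphi_i(\bar{x}),
\quad \text{where } \{\varphi_i\}_{i \in \omega} \text{ is a recursively
enumerable set of quantifier-free formulas.}
```

The right-hand side is exactly definability in the constructive infinitary language $L_{\omega_1 \omega}$ restricted to effective disjunctions, which is what yields the topological characterisation mentioned in the abstract.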


Author(s):  
Zhanyou Xu ◽  
Steven B. Cannon ◽  
William D. Beavis

Abstract: Models have been developed to account for heterogeneous spatial variation in field trials. These spatial models have been shown to successfully increase the quality of phenotypic data, resulting in improved effectiveness of selection by plant breeders. The models were developed for continuous data types such as grain yield and plant height, but data for most traits, such as iron deficiency chlorosis (IDC), are recorded on ordinal scales. Is it reasonable to make spatial adjustments to ordinal data by simply applying methods developed for continuous data? The objective of the research described herein is to evaluate methods for spatial adjustment on ordinal data, using soybean IDC as an example. Spatial adjustment models are classified into three different groups: group I, moving average grid adjustment; group II, geospatial autoregressive regression (SAR) models; and group III, tensor product penalized P-splines. Comparisons of eight models sampled from these three classes demonstrate that spatial adjustments depend on the severity of field heterogeneity, the irregularity of the spatial patterns, and the model used. SAR models generally produce better performance metrics than the other classes of models. However, none of the eight evaluated models fully removed spatial patterns, indicating that there is a need to either adjust existing models or develop novel models for spatial adjustments of ordinal data collected in fields exhibiting discontinuous transitions between heterogeneous patches.
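Of the three model groups, group I (moving average grid adjustment) is the simplest to sketch. The code below is a minimal illustration with a hypothetical function name, assuming plots laid out on a rectangular grid; it is not the authors' implementation, and, as the abstract cautions, applying such continuous-data detrending to ordinal scores is exactly the practice under evaluation.

```python
import numpy as np

def moving_average_adjust(grid, radius=1):
    """Group-I-style moving-average grid adjustment (a sketch):
    subtract from each plot the mean of its neighbourhood within
    `radius` cells, leaving a spatially detrended residual."""
    rows, cols = grid.shape
    adjusted = np.empty_like(grid, dtype=float)
    for i in range(rows):
        for j in range(cols):
            r0, r1 = max(0, i - radius), min(rows, i + radius + 1)
            c0, c1 = max(0, j - radius), min(cols, j + radius + 1)
            neighbourhood = grid[r0:r1, c0:c1].astype(float)
            # the local mean here includes the focal plot, for simplicity
            adjusted[i, j] = grid[i, j] - neighbourhood.mean()
    return adjusted
```

On a perfectly uniform field the residuals are all zero; on a field with smooth trends the adjustment removes the local gradient, but it cannot handle the discontinuous patch transitions the abstract highlights.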


1990 ◽  
Vol 55 (2) ◽  
pp. 367-381 ◽  
Author(s):  
Kenneth L. Kvamme

Archaeologists commonly employ two-sample statistical tests in regional locational analyses that compare environmental measurements obtained at site locations against measurements taken at random locations from the background environment. One-sample tests that compare a site sample against a background standard are conceptually and statistically superior, but have been difficult to implement for continuous data types. This situation has now changed owing to a relatively new computer technology known as Geographic Information Systems (GIS). GIS can provide a complete description of the nature of the background environment of entire regions for categorical and continuous data types, thereby allowing the ready application of one-sample testing strategies. Examples of several GIS-based one-sample tests are given using data from east-central Arizona. Such conventional tests should only be applied, however, when the observations can be shown to be statistically independent through tests for spatial autocorrelation.
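The one-sample strategy the abstract describes becomes feasible because a GIS can tabulate the background variable exhaustively over a region, so its mean and standard deviation are known rather than estimated from a random sample. A minimal sketch, with a hypothetical function name and assuming a continuous variable (e.g. elevation), is a one-sample z-test of the site sample against that known background:

```python
import math

def one_sample_z(site_values, bg_mean, bg_std):
    """One-sample z-test of a site sample against a background
    'population' whose mean and standard deviation a GIS can supply
    exhaustively for the whole region (a sketch of the strategy,
    not the paper's worked examples)."""
    n = len(site_values)
    sample_mean = sum(site_values) / n
    return (sample_mean - bg_mean) / (bg_std / math.sqrt(n))
```

As the abstract's closing caveat notes, the test's nominal significance level is only valid if the site observations are spatially independent, which should be checked with an autocorrelation test first.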


2018 ◽  
Author(s):  
Prathiba Natesan ◽  
Smita Mehta

Single case experimental designs (SCEDs) have become an indispensable methodology where randomized controlled trials may be impossible or even inappropriate. However, the nature of SCED data presents challenges for both visual and statistical analyses. Small sample sizes, autocorrelations, data types, and design types render many parametric statistical analyses and maximum likelihood approaches ineffective. The presence of autocorrelation decreases interrater reliability in visual analysis. The purpose of the present study is to demonstrate a newly developed model called the Bayesian unknown change-point (BUCP) model, which overcomes all the above-mentioned data analytic challenges. This is the first study to formulate and demonstrate a rate ratio effect size for autocorrelated data, which has remained an open question in SCED research until now. This expository study also compares and contrasts the results from the BUCP model with visual analysis, and the rate ratio effect size with the nonoverlap of all pairs (NAP) effect size. Data from a comprehensive behavioral intervention are used for the demonstration.
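The core idea of an unknown change-point model can be sketched by grid enumeration. The toy below assumes Poisson counts per session, independent conjugate Gamma(1, 1) priors on the two phase rates, and a uniform prior on the change point; it is not the authors' BUCP model, which additionally handles autocorrelation, but it shows how a posterior over the change point and a rate ratio effect size arise together.

```python
import math

def log_marglik(counts, a=1.0, b=1.0):
    """Log marginal likelihood of Poisson counts under a
    conjugate Gamma(a, b) prior on the rate."""
    n, s = len(counts), sum(counts)
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + n)
            - sum(math.lgamma(c + 1) for c in counts))

def changepoint_posterior(counts, a=1.0, b=1.0):
    """Posterior over an unknown change point tau (uniform prior),
    splitting the series into before/after Poisson phases -- a toy
    grid-enumeration sketch, not the BUCP model itself."""
    taus = list(range(1, len(counts)))     # tau = first index of phase B
    logp = [log_marglik(counts[:t], a, b) + log_marglik(counts[t:], a, b)
            for t in taus]
    m = max(logp)
    w = [math.exp(lp - m) for lp in logp]
    post = [wi / sum(w) for wi in w]
    tau_hat = taus[post.index(max(post))]
    # rate ratio effect size: posterior-mean phase-B rate / phase-A rate
    rate_a = (a + sum(counts[:tau_hat])) / (b + tau_hat)
    rate_b = (a + sum(counts[tau_hat:])) / (b + len(counts) - tau_hat)
    return tau_hat, rate_b / rate_a, post
```

With a clear jump in level, the posterior concentrates on the true change point and the rate ratio summarises the size of the behaviour change in interpretable units.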


2018 ◽  
Author(s):  
Josephine Ann Urquhart ◽  
Akira O'Connor

Receiver operating characteristics (ROCs) are plots which provide a visual summary of a classifier’s decision response accuracy at varying discrimination thresholds. Typical practice, particularly within psychological studies, involves plotting an ROC from a limited number of discrete thresholds before fitting signal detection parameters to the plot. We propose that additional insight into decision-making could be gained through increasing ROC resolution, using trial-by-trial measurements derived from a continuous variable, in place of discrete discrimination thresholds. Such continuous ROCs are not yet routinely used in behavioural research, which we attribute to issues of practicality (i.e. the difficulty of applying standard ROC model-fitting methodologies to continuous data). Consequently, the purpose of the current article is to provide a documented method of fitting signal detection parameters to continuous ROCs. This method reliably produces model fits equivalent to the unequal variance least squares method of model-fitting (Yonelinas et al., 1998), irrespective of the number of data points used in ROC construction. We present the suggested method in three main stages: I) building continuous ROCs, II) model-fitting to continuous ROCs and III) extracting model parameters from continuous ROCs. Throughout the article, procedures are demonstrated in Microsoft Excel, using an example continuous variable: reaction time, taken from a single-item recognition memory test. Supplementary MATLAB code used for automating our procedures is also presented in Appendix B, with a validation of the procedure using simulated data shown in Appendix C.
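Stage I, building a continuous ROC, amounts to sweeping the decision threshold through every trial-by-trial value instead of a handful of confidence bins. The sketch below is an illustrative assumption (hypothetical function name, higher score taken to mean "more old-like"; with reaction times the sign may need flipping), not the article's Excel or MATLAB procedure:

```python
import numpy as np

def continuous_roc(scores, labels):
    """Build a continuous ROC by sweeping the threshold through every
    trial-by-trial score. `labels` is 1 for targets (old items) and
    0 for lures (new items); higher scores are treated as more
    'old-like'."""
    order = np.argsort(scores)[::-1]                  # most old-like first
    labels = np.asarray(labels, dtype=float)[order]
    hits = np.cumsum(labels) / labels.sum()           # hit rate
    fas = np.cumsum(1 - labels) / (1 - labels).sum()  # false-alarm rate
    # prepend the (0, 0) origin so the curve starts at the corner
    return np.concatenate([[0.0], fas]), np.concatenate([[0.0], hits])
```

Each trial contributes its own point, so the resulting curve has as many steps as there are trials, which is what makes the standard bin-based model-fitting routines awkward to apply directly.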

