Data Cost Games as an Application of 1-Concavity in Cooperative Game Theory

2014 ◽  
Vol 2014 ◽  
pp. 1-5 ◽  
Author(s):  
Dongshuang Hou ◽  
Theo Driessen

The main goal is to reveal the 1-concavity property for a subclass of cost games called data cost games. The motivation for studying the 1-concavity property lies in the appealing theoretical results for both the core and the nucleolus, in particular their geometrical characterization as well as their additivity property. The characteristic cost function of the original data cost game assigns to every coalition the additive cost of reproducing the data the coalition does not own. The underlying data and cost sharing situation is composed of three components, namely, the player set, the collection of data sets for individuals, and the additive cost function on the whole data set. The proof of 1-concavity is direct, and it carries over to a suitable generalization of the characteristic cost function. As an adjunct, the 1-concavity property is shown for the subclass of so-called "bicycle" cost games, including the data cost games in which the individual data sets are nested in decreasing order.
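As a formal anchor for the terms used above, the following is a minimal sketch of the definitions, assuming the standard gap-function formulation of 1-concavity from Driessen's earlier work; the notation (k for the additive cost function, D for the whole data set, D_i for player i's individual data set) is ours, not fixed by the abstract.

```latex
% Data cost game: each coalition S pays for reproducing the data it lacks.
\[
  c(S) \;=\; k\Bigl(D \setminus \bigcup_{i \in S} D_i\Bigr)
  \qquad \text{for all } S \subseteq N .
\]
% 1-concavity via marginal costs and the gap function (standard form):
\[
  m_i \;=\; c(N) - c(N \setminus \{i\}), \qquad
  g(S) \;=\; c(S) - \sum_{i \in S} m_i ,
\]
\[
  (N, c) \text{ is 1-concave} \iff g(S) \;\ge\; g(N) \;\ge\; 0
  \quad \text{for all } \emptyset \ne S \subseteq N .
\]
```

It is this gap-function structure that underlies the geometric characterization of the core and nucleolus referred to in the abstract.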

Author(s):  
Danlei Xu ◽  
Lan Du ◽  
Hongwei Liu ◽  
Penghui Wang

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, where a set of nonlinear mappings of the original data is applied as a pre-processing step. The linear classification model with such mappings from the original input space to a nonlinear transformation space can not only construct a nonlinear classification boundary, but also realize feature selection for the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of the Beta process prior are used to promote sparsity in the utilization of features and nonlinear mappings in our model, respectively. We derive the variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results on a synthetic data set, a measured radar data set, a high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability of our method and its classification accuracy comparable to that of other existing classifiers.
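A sketch of the hierarchical prior described above may help; the hyperparameter names (a, b, c, d) and the Beta-Bernoulli finite approximation of the Beta process are written in a common textbook form and are assumptions, since the abstract does not spell them out.

```latex
% Sparsity over the weights w_j (one per feature):
\[
  w_j \mid \alpha_j \sim \mathcal{N}\bigl(0, \alpha_j^{-1}\bigr), \qquad
  \alpha_j \sim \mathrm{Gamma}(a, b).
\]
% Sparsity over which of the K nonlinear mappings are used, via a
% finite (K-component) approximation of the Beta process:
\[
  z_k \mid \pi_k \sim \mathrm{Bernoulli}(\pi_k), \qquad
  \pi_k \sim \mathrm{Beta}\bigl(c/K,\; d(K-1)/K\bigr),
  \qquad k = 1, \dots, K.
\]
```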


2019 ◽  
Vol 622 ◽  
pp. A172 ◽  
Author(s):  
F. Murgas ◽  
G. Chen ◽  
E. Pallé ◽  
L. Nortmann ◽  
G. Nowak

Context. Rayleigh scattering in a hydrogen-dominated exoplanet atmosphere can be detected using ground- or space-based telescopes. However, stellar activity in the form of spots can mimic Rayleigh scattering in the observed transmission spectrum. Quantifying this phenomenon is key to the correct interpretation of exoplanet atmospheric properties. Aims. We use the ten-meter Gran Telescopio Canarias (GTC) to carry out a ground-based survey of the transmission spectra of extrasolar planets in order to characterize their atmospheres. In this paper we investigate the exoplanet HAT-P-11b, a Neptune-sized planet orbiting an active K-type star. Methods. We obtained long-slit optical spectroscopy of two transits of HAT-P-11b with the Optical System for Imaging and low-Intermediate-Resolution Integrated Spectroscopy (OSIRIS) on August 30, 2016 and September 25, 2017. We integrated the spectrum of HAT-P-11 and one reference star in several spectroscopic channels across the λ ~ 400–785 nm region, creating numerous light curves of the transits. We fit analytic transit curves to the data, taking into account the systematic effects and red noise present in the time series, in an effort to measure the change of the planet-to-star radius ratio (Rp/Rs) with wavelength. Results. By fitting both transits together, we find a slope in the transmission spectrum showing an increase of the planetary radius towards blue wavelengths. Closer inspection of the transmission spectra of the individual data sets reveals that the first transit presents this slope while the transmission spectrum of the second data set is flat. Additionally, we detect hints of Na absorption on the first night, but not on the second. We conclude that the transmission spectrum slope and Na absorption excess found in the first transit observation are caused by unocculted stellar spots. Modeling the contribution of unocculted spots to reproduce the results of the first night, we find a spot filling factor of δ = 0.62 (+0.20/−0.17) and a spot-to-photosphere temperature difference of ΔT = 429 (+184/−299) K.
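For intuition on how unocculted spots mimic Rayleigh scattering, a standard first-order contamination model (not necessarily the exact parameterization used in the paper) relates the observed and true transit depths through the spot filling factor δ:

```latex
\[
  \left(\frac{R_p}{R_s}\right)^{2}_{\mathrm{obs}}(\lambda)
  \;=\;
  \frac{\left(R_p/R_s\right)^{2}_{\mathrm{true}}}
       {1 \;-\; \delta\left(1 - \dfrac{F_{\mathrm{spot}}(\lambda)}{F_{\mathrm{phot}}(\lambda)}\right)} ,
\]
% F_spot, F_phot: spot and photosphere spectra; delta: spot filling factor.
```

Because cool spots are relatively darker at blue wavelengths, the denominator drops below unity there and the apparent planetary radius is inflated towards the blue, producing a slope like the one seen on the first night.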


2014 ◽  
Vol 31 (8) ◽  
pp. 1778-1789
Author(s):  
Hongkang Lin

Purpose – The clustering/classification method proposed in this study, designated as the PFV-index method, provides the means to solve the following problems for a data set characterized by imprecision and uncertainty: first, discretizing the continuous values of all the individual attributes within the data set; second, evaluating the optimality of the discretization results; third, determining the optimal number of clusters per attribute; and fourth, improving the classification accuracy (CA) of data sets characterized by uncertainty. The paper aims to discuss these issues. Design/methodology/approach – The proposed method for solving the clustering/classification problem, designated as the PFV-index method, combines a particle swarm optimization algorithm, the fuzzy C-means method, variable precision rough sets theory, and a new cluster validity index function. Findings – This method clusters the values of the individual attributes within the data set and achieves both the optimal number of clusters and the optimal CA. Originality/value – The validity of the proposed approach is investigated by comparing the classification results obtained for UCI data sets with those obtained by supervised classification methods, namely backpropagation neural networks (BPNN) and decision-tree methods.
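The abstract does not specify the PFV-index itself, but the clustering engine at its core is fuzzy C-means; a minimal sketch of that building block, applied to a single continuous attribute, might look as follows (the PSO search over cluster numbers and the validity index are omitted):

```python
import numpy as np

def fuzzy_c_means(values, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy C-means on one continuous attribute: the clustering
    building block of the PFV-index method (the PSO search and the
    validity index are not specified in the abstract)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    # random initial membership matrix; each row sums to 1
    u = rng.random((len(x), n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        um = u ** m
        centers = (um.T @ x) / um.sum(axis=0).reshape(-1, 1)
        dist = np.abs(x - centers.T) + 1e-12          # (n_samples, n_clusters)
        # membership update: u_ik proportional to d_ik^(-2/(m-1))
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers.ravel(), u
```

Discretization then amounts to assigning each attribute value to its highest-membership cluster.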


2019 ◽  
Vol 34 (9) ◽  
pp. 1369-1383 ◽  
Author(s):  
Dirk Diederen ◽  
Ye Liu

Abstract With the ongoing development of distributed hydrological models, flood risk analysis calls for synthetic, gridded precipitation data sets. The availability of large, coherent, gridded re-analysis data sets, in combination with the increase in computational power, accommodates the development of new methodology to generate such synthetic data. We tracked moving precipitation fields and classified them using self-organising maps. For each class, we fitted a multivariate mixture model and generated a large set of synthetic, coherent descriptors, which we used to reconstruct moving synthetic precipitation fields. We introduced randomness by replacing the observed precipitation fields in the original data set with the synthetic ones. The output is a continuous, gridded, hourly precipitation data set of much longer duration, containing physically plausible and spatio-temporally coherent precipitation events. The proposed methodology implicitly provides an important improvement in the spatial coherence of precipitation extremes. We investigate the issue of unrealistic, sudden changes on the grid and demonstrate how a dynamic spatio-temporal generator can provide spatial smoothness in the probability distribution parameters and hence in the return level estimates.
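A compact sketch of the classify-then-resample pipeline, under stated assumptions: MiniSom as the self-organising map implementation, a Gaussian mixture standing in for the "multivariate mixture model" (the abstract does not commit to Gaussian components), a hypothetical descriptor file, and an invented descriptor set.

```python
import numpy as np
from minisom import MiniSom                      # assumed SOM implementation
from sklearn.mixture import GaussianMixture

# One row per tracked precipitation field, e.g. (duration, mean intensity,
# area, advection speed, direction, ...); the exact descriptors and the
# file name are assumptions, not taken from the paper.
descriptors = np.load("event_descriptors.npy")

# 1. classify events with a self-organising map (3x3 grid of classes)
som = MiniSom(3, 3, descriptors.shape[1], sigma=1.0, learning_rate=0.5)
som.train_random(descriptors, 10_000)
labels = np.array([np.ravel_multi_index(som.winner(d), (3, 3))
                   for d in descriptors])

# 2. per class, fit a mixture model and draw synthetic descriptors
synthetic = []
for c in np.unique(labels):
    members = descriptors[labels == c]           # needs enough events per class
    gmm = GaussianMixture(n_components=3, random_state=0).fit(members)
    draws, _ = gmm.sample(10 * len(members))     # oversample 10x
    synthetic.append(draws)
synthetic = np.vstack(synthetic)                 # feeds field reconstruction
```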


2020 ◽  
Vol 36 (4) ◽  
pp. 1175-1188
Author(s):  
Pierre Lamarche ◽  
Friderike Oehler ◽  
Irene Rioboo

Poverty indicators purely based on income statistics do not reflect the full picture of households' economic well-being. Consumption and wealth are two additional key dimensions that determine the economic opportunities of people and material inequalities. We use non-parametric statistical matching methods to join consumption data from the Household Budget Survey to micro data from the European Union Statistics on Income and Living Conditions. In a second step, micro data from the Household Finance and Consumption Survey are joined to produce a common distribution of income, consumption and wealth variables. A variety of indicators, in particular household saving rates, is then produced based on this joint data set. Care has to be taken when interpreting the indicators, since the statistical matching is based on strong assumptions and a limited number of variables common to all three original data sets. We are able to show, however, that the assumptions made are justified by the use of strong proxies as matching variables. Thus, the resulting indicators have the potential to contribute to the analysis of inequality patterns and enhance the possibilities of social, and possibly fiscal, policy impact analysis.
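A minimal sketch of the first matching step, assuming a nearest-neighbour (distance hot deck) variant of non-parametric matching and hypothetical column names; the real exercise uses the strong proxy variables common to the three surveys.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Hypothetical matching variables shared by EU-SILC and the HBS.
COMMON = ["income", "household_size", "age_head", "tenure_status"]

def match_consumption(silc: pd.DataFrame, hbs: pd.DataFrame) -> pd.DataFrame:
    """Attach to each SILC household the consumption of its nearest HBS
    neighbour in the space of common variables (distance hot deck)."""
    # standardise the common variables so no single one dominates
    mu, sd = hbs[COMMON].mean(), hbs[COMMON].std()
    nn = NearestNeighbors(n_neighbors=1).fit((hbs[COMMON] - mu) / sd)
    _, idx = nn.kneighbors((silc[COMMON] - mu) / sd)
    out = silc.copy()
    out["consumption"] = hbs["consumption"].to_numpy()[idx.ravel()]
    # household saving rate from the fused record
    out["saving_rate"] = (out["income"] - out["consumption"]) / out["income"]
    return out
```

The HFCS wealth variables would be attached in a second pass of the same procedure, after which joint income-consumption-wealth indicators can be derived.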


Author(s):  
CHANGHUA YU ◽  
MICHAEL T. MANRY ◽  
JIANG LI

In the neural network literature, many preprocessing techniques, such as feature de-correlation, input unbiasing, and normalization, are suggested to accelerate multilayer perceptron training. In this paper, we show that a network trained with an original data set and one trained with a linear transformation of the original data will go through the same training dynamics, as long as they start from equivalent states. Thus preprocessing techniques may not be helpful; they are merely equivalent to using a different weight set to initialize the network. Theoretical analyses of such preprocessing approaches are given for conjugate gradient, backpropagation, and Newton's method. In addition, an efficient Newton-like training algorithm is proposed for hidden layer training. Experiments on various data sets confirm the theoretical analyses and verify the improvement of the new algorithm.
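The "equivalent states" claim can be illustrated in a few lines of NumPy: if the inputs undergo an invertible affine map, a matching change of first-layer weights and biases reproduces the hidden activations exactly, so the two networks sit at equivalent states and (per the paper's analysis, for the algorithms listed) evolve identically from there. A sketch, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # original data
A = rng.normal(size=(5, 5))            # any invertible linear map
b = rng.normal(size=5)
Xt = X @ A.T + b                       # "preprocessed" data: x' = A x + b

W = rng.normal(size=(3, 5))            # first-layer weights, 3 hidden units
c = rng.normal(size=3)

# equivalent first-layer state for the transformed inputs:
W2 = W @ np.linalg.inv(A)              # W' = W A^-1
c2 = c - W2 @ b                        # c' = c - W A^-1 b

h1 = np.tanh(X @ W.T + c)              # hidden activations, original net
h2 = np.tanh(Xt @ W2.T + c2)           # hidden activations, transformed net
assert np.allclose(h1, h2)             # identical states -> identical dynamics
```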


1969 ◽  
Vol 63 (4) ◽  
pp. 1106-1119 ◽  
Author(s):  
Michael J. Shapiro

In recent years the welter of data accumulated on American voting behavior has been continually reanalyzed by social scientists interested in building theories of electoral choice. Most of the original data-gathering enterprises were guided by general theoretical frameworks which, for the most part, were not developed to a point where the ensuing analyses addressed themselves unambiguously to the overall conceptions by which they were guided. As a result much of our knowledge about voting behavior is in the form of generalizations about what social and psychological variables account for voting choices, while we lack conceptual frameworks which systematically interrelate these generalizations and provide comprehensive and parsimonious explanation. If any one unifying conception has emerged from the original large scale studies it is that the average voter is irrational. This inference has been derived from a variety of empirical relationships coupled with varying conceptions of rationality.

The more recent reanalyses of these data sets have been characterized by a theoretical sophistication that was lacking heretofore. One of these, a theory of the calculus of voting, has applied some formal rigor to the question of the rationality of the decision to vote, selected empirical equivalents of theoretical entities from survey data on national elections, and conducted a successful test of the theory. Unlike traditional approaches to the rationality question, which infer the degree of rationality from quantities of information possessed or from correlates of decisions (background, party affiliation, group memberships, etc.), this investigation conceived of rationality in terms of the kind of calculus employed by the individual in deciding among alternatives (in this case, whether or not to vote).
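The calculus-of-voting theory alluded to here is conventionally written in the Riker-Ordeshook form; the abstract does not spell the expression out, so the following is offered only as the standard reference point.

```latex
\[
  R \;=\; pB \;-\; C \;+\; D
\]
```

Here R is the expected net reward of voting, p the probability that one's vote is decisive, B the differential benefit of the preferred candidate winning, C the cost of voting, and D the intrinsic (civic-duty) rewards of the act itself; the citizen votes when R > 0.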


2019 ◽  
Vol 16 (2) ◽  
pp. 445-452
Author(s):  
Kishore S. Verma ◽  
A. Rajesh ◽  
Adeline J. S. Johnsana

K-anonymization is one of the most widely used approaches for protecting individual records from privacy leakage attacks in the Privacy Preserving Data Mining (PPDM) arena. Typically, an anonymized data set impacts the effectiveness of data mining results, so researchers in PPDM are currently directing their efforts toward finding the optimum trade-off between privacy and utility. This work aims to bring out the optimum classifier, from a set of well-performing data mining classifiers, capable of generating value-added classification results on utility-aware k-anonymized data sets. We perform the analysis on data sets anonymized with respect to utility factors such as null-value count and transformation pattern loss. The experimentation is done with three widely used classifiers, HNB, PART, and J48, which are evaluated with accuracy, F-measure, and ROC-AUC, well-established measures of classification performance. Our experimental analysis reveals the best classifiers on the utility-aware anonymized data sets produced by Cell oriented Anonymization (CoA), Attribute oriented Anonymization (AoA), and Record oriented Anonymization (RoA).
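A sketch of the evaluation loop, with stated substitutions: HNB and PART are Weka classifiers with no scikit-learn equivalents, so a decision tree (close kin of J48/C4.5) and Gaussian naive Bayes stand in here; a binary, numerically encoded data set is assumed so that ROC-AUC applies directly.

```python
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

SCORING = ["accuracy", "f1_macro", "roc_auc"]   # the three measures used above

def evaluate(anonymized: pd.DataFrame, target: str) -> None:
    """10-fold cross-validated scores on one anonymized data set
    (run once each on the CoA, AoA and RoA outputs).
    Assumes features are numeric: encode generalized values beforehand."""
    X, y = anonymized.drop(columns=[target]), anonymized[target]
    for name, clf in [("tree (~J48)", DecisionTreeClassifier(random_state=0)),
                      ("naive Bayes (HNB stand-in)", GaussianNB())]:
        scores = cross_validate(clf, X, y, cv=10, scoring=SCORING)
        print(name, {s: round(scores[f"test_{s}"].mean(), 3) for s in SCORING})
```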


Geophysics ◽  
2017 ◽  
Vol 82 (2) ◽  
pp. Q1-Q12 ◽  
Author(s):  
Carlos Alberto da Costa Filho ◽  
Giovanni Angelo Meles ◽  
Andrew Curtis

Conventional seismic processing aims to create data that contain only primary reflections, whereas real seismic recordings also contain multiples. As such, it is desirable to predict, identify, and attenuate multiples in seismic data. This task is more difficult in elastic (solid) media because mode conversions create families of internal multiples not present in the acoustic case. We have developed a method to predict prestack internal multiples in general elastic media based on the Marchenko method and convolutional interferometry. It can be used to identify multiples directly in prestack data or migrated sections, as well as to attenuate internal multiples by adaptively subtracting them from the original data set. We demonstrate the method on two synthetic data sets, the first composed of horizontal density layers and constant velocities, and the second containing horizontal and vertical density and velocity variations. The full-elastic method is computationally expensive and ideally uses data components that are not usually recorded. We therefore test an acoustic approximation to the method on the synthetic elastic data from the second model and find that although the spatial resolution of the resulting image is reduced by this approximation, it provides images with relatively fewer artifacts. We conclude that in most cases where cost is a factor and we are willing to sacrifice some resolution, it may be sufficient to apply the acoustic version of this demultiple method.
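The Marchenko prediction itself is too involved for a short snippet, but the adaptive subtraction step mentioned above is compact: estimate a short least-squares matching filter that shapes the predicted multiples to the data, then subtract. A single-trace sketch under that assumption:

```python
import numpy as np

def adaptive_subtract(data, pred, filt_len=11, eps=1e-3):
    """Least-squares adaptive subtraction of predicted multiples from one
    trace: find a short matching filter f minimising ||data - f * pred||^2,
    then subtract the shaped multiples. A minimal sketch; production codes
    solve this in overlapping space-time windows so the filter can vary."""
    assert filt_len % 2 == 1, "odd filter length keeps zero lag centred"
    n, pad = len(data), filt_len // 2
    padded = np.pad(pred, (pad, pad))
    # columns of P are shifted copies of the prediction, so P @ f == f * pred
    P = np.stack([padded[2 * pad - j : 2 * pad - j + n]
                  for j in range(filt_len)], axis=1)
    # damped normal equations for numerical stability
    f = np.linalg.solve(P.T @ P + eps * np.eye(filt_len), P.T @ data)
    return data - P @ f                      # estimated primaries
```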


2021 ◽  
Vol 12 ◽  
Author(s):  
Haoyang Li ◽  
Juexiao Zhou ◽  
Yi Zhou ◽  
Qiang Chen ◽  
Yangyang She ◽  
...  

Periodontitis is a prevalent and irreversible chronic inflammatory disease in both developed and developing countries, affecting about 20–50% of the global population. A tool for automatically diagnosing periodontitis is in high demand for screening at-risk people, and early detection could prevent the onset of tooth loss, especially in local communities and health care settings with limited dental professionals. In the medical field, doctors need to understand and trust the decisions made by computational models, so developing interpretable models is crucial for disease diagnosis. Based on these considerations, we propose an interpretable method called Deetal-Perio to predict the severity degree of periodontitis in dental panoramic radiographs. In our method, alveolar bone loss (ABL), the clinical hallmark for periodontitis diagnosis, serves as the key interpretable feature. To calculate ABL, we also propose a method for teeth numbering and segmentation. First, Deetal-Perio segments and indexes the individual teeth via Mask R-CNN combined with a novel calibration method. Next, Deetal-Perio segments the contour of the alveolar bone and calculates a ratio for each individual tooth to represent its ABL. Finally, Deetal-Perio predicts the severity degree of periodontitis given the ratios of all the teeth. The macro F1-score and accuracy of the periodontitis prediction task reach 0.894 and 0.896, respectively, on the Suzhou data set, and 0.820 and 0.824, respectively, on the Zhongshan data set. The entire architecture not only outperforms state-of-the-art methods and shows robustness on two data sets in both the periodontitis prediction and the teeth numbering and segmentation tasks, but is also interpretable, allowing doctors to understand why Deetal-Perio works so well.
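Of the three stages, only the last reduces to a few lines; the sketch below maps per-tooth ABL ratios to a severity degree, with a random forest standing in for whatever classifier Deetal-Perio actually uses, and a padding/sorting feature encoding that is our assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

MAX_TEETH = 32    # upper bound on teeth per radiograph

def ratios_to_features(abl_ratios):
    """Fixed-length feature vector from a variable number of teeth:
    sort ratios so the most affected teeth come first and zero-pad the
    missing positions (this encoding is an assumption, not the paper's)."""
    r = np.sort(np.asarray(abl_ratios, dtype=float))[::-1][:MAX_TEETH]
    v = np.zeros(MAX_TEETH)
    v[:len(r)] = r
    return v

def fit_severity(train_ratios, train_degrees):
    """train_ratios: list of per-patient ABL ratio lists (one value per
    segmented tooth); train_degrees: periodontitis severity labels."""
    feats = np.stack([ratios_to_features(r) for r in train_ratios])
    return RandomForestClassifier(random_state=0).fit(feats, train_degrees)
```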

