Influence of sample quantification and taxonomic resolution on the ordination of macroinvertebrate communities from running waters in Victoria, Australia

1995 ◽  
Vol 46 (2) ◽  
pp. 501 ◽  
Author(s):  
R Marchant ◽  
LA Barmuta ◽  
BC Chessman

The influence of sample quantification and taxonomic resolution on the ordination of macroinvertebrate communities from nine Victorian rivers was examined by progressively reducing the degree of detail in the original data (species level, quantitative). Five additional data sets were created that consisted of binary (presence or absence) data on species, quantitative or binary data on families, and quantitative data on PET (plecopteran, ephemeropteran and trichopteran) species or families. Ordinations were performed with detrended correspondence analysis (DCA) and semi-strong hybrid multi-dimensional scaling (SSH). With both ordination techniques, the ordinations of each data set (including the original) revealed the same three underlying gradients. An altitudinal gradient consistently achieved the highest correlations with the ordinations (r = 0.71-0.93), followed by a substratum gradient (r = 0.50-0.88) and a combined pH and conductivity gradient (r = 0.47-0.76). Each of the five less-complete data sets thus provides an adequate degree of detail for ordination analysis and subsequent interpretation of environmental gradients.
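The two kinds of data reduction described above (quantitative to binary, species to family) can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the site names, taxon names, counts and the `family_of` lookup are all hypothetical, and the ordination step itself (DCA or SSH) is not shown.

```python
from collections import defaultdict

# Hypothetical abundance data: counts per species at each site (illustrative only).
abundance = {
    "site_A": {"species_1": 12, "species_2": 0, "species_3": 3},
    "site_B": {"species_1": 0, "species_2": 5, "species_3": 0},
}

# Hypothetical lookup from species to family.
family_of = {"species_1": "family_X", "species_2": "family_X", "species_3": "family_Y"}


def to_binary(table):
    """Reduce quantitative data to presence (1) or absence (0)."""
    return {
        site: {taxon: 1 if count > 0 else 0 for taxon, count in counts.items()}
        for site, counts in table.items()
    }


def to_family(table, lookup):
    """Lower the taxonomic resolution by summing species counts within each family."""
    reduced = {}
    for site, counts in table.items():
        totals = defaultdict(int)
        for species, count in counts.items():
            totals[lookup[species]] += count
        reduced[site] = dict(totals)
    return reduced


species_binary = to_binary(abundance)            # binary data on species
family_counts = to_family(abundance, family_of)  # quantitative data on families
family_binary = to_binary(family_counts)         # binary data on families
```

Each reduced table could then be passed to the same ordination routine as the original data, which is the comparison the abstract reports.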

Author(s):  
Danlei Xu ◽  
Lan Du ◽  
Hongwei Liu ◽  
Penghui Wang

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, where a set of nonlinear mappings for the original data is performed as a pre-processing step. The linear classification model with such mappings from the original input space to a nonlinear transformation space can not only construct the nonlinear classification boundary, but also realize the feature selection for the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of Beta process prior are used to promote sparsity in the utilization of features and nonlinear mappings in our model, respectively. We derive the Variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results based on the synthetic data set, measured radar data set, high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability of our method and a classification accuracy comparable with that of some other existing classifiers.


2019 ◽  
Vol 34 (9) ◽  
pp. 1369-1383 ◽  
Author(s):  
Dirk Diederen ◽  
Ye Liu

With the ongoing development of distributed hydrological models, flood risk analysis calls for synthetic, gridded precipitation data sets. The availability of large, coherent, gridded re-analysis data sets, in combination with the increase in computational power, accommodates the development of new methodology to generate such synthetic data. We tracked moving precipitation fields and classified them using self-organising maps. For each class, we fitted a multivariate mixture model and generated a large set of synthetic, coherent descriptors, which we used to reconstruct moving synthetic precipitation fields. We introduced randomness in the original data set by replacing the observed precipitation fields in the original data set with the synthetic precipitation fields. The output is a continuous, gridded, hourly precipitation data set of a much longer duration, containing physically plausible and spatio-temporally coherent precipitation events. The proposed methodology implicitly provides an important improvement in the spatial coherence of precipitation extremes. We investigate the issue of unrealistic, sudden changes on the grid and demonstrate how a dynamic spatio-temporal generator can provide spatial smoothness in the probability distribution parameters and hence in the return level estimates.


2020 ◽  
Vol 36 (4) ◽  
pp. 1175-1188
Author(s):  
Pierre Lamarche ◽  
Friderike Oehler ◽  
Irene Rioboo

Poverty indicators purely based on income statistics do not reflect the full picture of households' economic well-being. Consumption and wealth are two additional key dimensions that determine the economic opportunities of people or material inequalities. We use non-parametric statistical matching methods to join consumption data from the Household Budget Survey to micro data from the European Union Statistics on Income and Living Conditions. In a second step, micro data from the Household Finance and Consumption Survey are joined to produce a common distribution of income, consumption and wealth variables. A variety of indicators is then produced based on this joint data set, in particular household saving rates. Care has to be taken when interpreting the indicators, since the statistical matching is based on strong assumptions and a limited number of variables common to all three original data sets. We are able to show, however, that the assumptions made are justified by the use of strong proxies as matching variables. Thus, the resulting indicators have the potential to contribute to the analysis of inequality patterns and enhance the possibilities of social, and possibly fiscal, policy impact analysis.


Biologia ◽  
2009 ◽  
Vol 64 (5) ◽  
Author(s):  
Lucia Sekulová ◽  
Michal Hájek

Changes in composition and structure of alpine and subalpine plant communities in relation to ecological factors were analysed in the Nízke Tatry Mts, Slovakia. Species cover values of vascular and non-vascular plants in each vegetation plot were recorded on a nine-degree scale. A data set of 156 relevés of alpine and subalpine vegetation was sampled recently during one year in the eastern part of the Nízke Tatry National Park. The data set was analysed by cluster analysis and Detrended Correspondence Analysis. Analyses were carried out on the entire data set, including the subset of short grassland and dwarf-shrub vegetation. Major gradients and clusters were ecologically interpreted using Ellenberg indicator values. In the entire data set, the major gradient in species composition was associated with nutrient availability and the second most important gradient with light. In the case of short grassland and dwarf-shrub vegetation, the gradients were different. The first one was associated with soil reaction and the second gradient was associated with moisture. Clusters proposed by numerical classification reproduced many traditional phytosociological associations, namely Seslerietum distichae, Sphagno capillifolii-Empetretum nigri, Junco trifidi-Callunetum vulgaris, Juncetum trifidi, Dryopterido dilatatae-Pinetum mugo, Luzuletum obscurae, Agrostio pyrenaiceae-Nardetum strictae, while some other associations were less clearly differentiated (communities of the alliances Calamagrostion villosae, Adenostylion alliariae, Trisetion fusci, Cratoneuro filicini-Calthion laetae or Salicion herbaceae). The next clusters included Vaccinium and Festuca supina dominated communities and artificial roadside grasslands sown 50 years ago. Bryophytes and lichens were highly represented among diagnostic species of particular associations. The distribution pattern of particular plant communities was strongly influenced by site position on either the northern or the southern slope of the mountains.


Author(s):  
CHANGHUA YU ◽  
MICHAEL T. MANRY ◽  
JIANG LI

In the neural network literature, many preprocessing techniques, such as feature de-correlation, input unbiasing and normalization, are suggested to accelerate multilayer perceptron training. In this paper, we show that a network trained with an original data set and one trained with a linear transformation of the original data will go through the same training dynamics, as long as they start from equivalent states. Thus preprocessing techniques may not be helpful and are merely equivalent to using a different weight set to initialize the network. Theoretical analyses of such preprocessing approaches are given for conjugate gradient, back propagation and the Newton method. In addition, an efficient Newton-like training algorithm is proposed for hidden layer training. Experiments on various data sets confirm the theoretical analyses and verify the improvement of the new algorithm.
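The notion of "equivalent states" can be illustrated for the first layer alone. If the inputs are preprocessed as x' = A x + b, then weights W' and thresholds c' applied to x' give the same net input as W = W' A and c = W' b + c' applied to the raw x. The sketch below checks this numerically; the matrices and vectors are made-up values, and it shows only the equivalence of the starting point, not the paper's analysis of the training dynamics.

```python
def matvec(M, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(P, Q):
    """Matrix-matrix product for plain nested lists."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def net_input(W, c, x):
    """Net input of a layer: W x + c."""
    return [s + ci for s, ci in zip(matvec(W, x), c)]


# Hypothetical linear preprocessing x' = A x + b (illustrative values).
A = [[2.0, 0.0], [1.0, 1.0]]
b = [0.5, -1.0]

# Hypothetical first-layer weights and thresholds for the transformed inputs.
W_t = [[0.3, -0.2], [0.1, 0.4]]
c_t = [0.0, 0.2]

# Equivalent weights and thresholds for the raw inputs.
W_raw = matmul(W_t, A)
c_raw = [s + ci for s, ci in zip(matvec(W_t, b), c_t)]

x = [1.5, -2.0]
x_t = [s + bi for s, bi in zip(matvec(A, x), b)]

net_transformed = net_input(W_t, c_t, x_t)
net_raw = net_input(W_raw, c_raw, x)
```

Both networks see identical net inputs for any x, so the preprocessing amounts to a different choice of initial weights for the raw-input network.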


Author(s):  
MUSTAPHA LEBBAH ◽  
YOUNÈS BENNANI ◽  
NICOLETA ROGOVSCHI

This paper introduces a probabilistic self-organizing map for topographic clustering, analysis and visualization of multivariate binary data or categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which cells are represented by a Bernoulli distribution. Each cell is characterized by a prototype with the same binary coding as used in the data space and the probability of being different from this prototype. The learning algorithm that we propose, Bernoulli on self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public data set repository. The results show a good quality of the topological ordering and homogeneous clustering.
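The cell model described above can be sketched as a likelihood. The sketch assumes independent bits and a single scalar probability `eps` per cell of differing from the prototype; the abstract does not say whether the paper uses one probability per cell or per component. With those assumptions, the likelihood of a binary vector depends only on its Hamming distance to the prototype. The EM updates and the map neighbourhood are not shown, and all values are illustrative.

```python
def hamming(x, w):
    """Number of positions where binary vectors x and w differ."""
    return sum(xi != wi for xi, wi in zip(x, w))


def cell_likelihood(x, prototype, eps):
    """p(x | cell) = eps^h * (1 - eps)^(n - h), where h is the Hamming distance."""
    h = hamming(x, prototype)
    return (eps ** h) * ((1.0 - eps) ** (len(x) - h))


def best_cell(x, cells):
    """Index of the cell under which x is most likely."""
    return max(range(len(cells)), key=lambda k: cell_likelihood(x, *cells[k]))


# Hypothetical cells: (binary prototype, probability of differing from it).
cells = [([1, 1, 0, 0], 0.1), ([0, 0, 1, 1], 0.2)]
x = [1, 0, 0, 0]

likelihoods = [cell_likelihood(x, w, eps) for w, eps in cells]
winner = best_cell(x, cells)
```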


Geophysics ◽  
2017 ◽  
Vol 82 (2) ◽  
pp. Q1-Q12 ◽  
Author(s):  
Carlos Alberto da Costa Filho ◽  
Giovanni Angelo Meles ◽  
Andrew Curtis

Conventional seismic processing aims to create data that contain only primary reflections, whereas real seismic recordings also contain multiples. As such, it is desirable to predict, identify, and attenuate multiples in seismic data. This task is more difficult in elastic (solid) media because mode conversions create families of internal multiples not present in the acoustic case. We have developed a method to predict prestack internal multiples in general elastic media based on the Marchenko method and convolutional interferometry. It can be used to identify multiples directly in prestack data or migrated sections, as well as to attenuate internal multiples by adaptively subtracting them from the original data set. We developed the method on two synthetic data sets, the first composed of horizontal density layers and constant velocities, and the second containing horizontal and vertical density and velocity variations. The full-elastic method is computationally expensive and ideally uses data components that are not usually recorded. We therefore tested an acoustic approximation to the method on the synthetic elastic data from the second model and found that although the spatial resolution of the resulting image is reduced by this approximation, it provides images with relatively fewer artifacts. We conclude that in most cases where cost is a factor and we are willing to sacrifice some resolution, it may be sufficient to apply the acoustic version of this demultiple method.
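Adaptive subtraction, in its simplest form, scales the predicted multiples to best match the recorded trace in a least-squares sense before subtracting them. The sketch below uses a single scalar per trace, which is an assumption made for brevity: practical implementations typically estimate short matching filters in sliding windows, and the abstract does not specify which variant the authors used. The trace values are illustrative.

```python
def adaptive_subtract(data, predicted):
    """Subtract least-squares-scaled predicted multiples from a recorded trace.

    The scale minimises sum((d - scale * m)^2), giving scale = <d, m> / <m, m>.
    Returns the residual trace and the estimated scale.
    """
    energy = sum(m * m for m in predicted)
    if energy == 0:
        return list(data), 0.0  # nothing predicted, so nothing to subtract
    scale = sum(d * m for d, m in zip(data, predicted)) / energy
    residual = [d - scale * m for d, m in zip(data, predicted)]
    return residual, scale


# Hypothetical trace: one primary at sample 0 and one multiple at sample 2.
recorded = [1.0, 0.0, 0.5, 0.0]
# Predicted multiple with the right timing but the wrong amplitude.
predicted = [0.0, 0.0, 1.0, 0.0]

residual, scale = adaptive_subtract(recorded, predicted)
```

In this toy case the prediction does not overlap the primary, so the subtraction removes the multiple and leaves the primary untouched.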


Author(s):  
Quanming Yao ◽  
Xiawei Guo ◽  
James Kwok ◽  
Weiwei Tu ◽  
Yuqiang Chen ◽  
...  

To meet the standard of differential privacy, noise is usually added to the original data, which inevitably deteriorates the predictive performance of subsequent learning algorithms. In this paper, motivated by the success of ensemble learning in improving predictive performance, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done by either sample-based or feature-based partitioning. However, we prove that when the privacy budgets are the same, feature-based partitioning requires fewer samples than sample-based partitioning, and thus is likely to have better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, i.e., MNIST and NEWS20, but also apply it to a real application of cross-organizational diabetes prediction from the RUIJIN data set, where privacy is a significant concern.


2013 ◽  
Vol 65 (1) ◽  
pp. 211-227 ◽  
Author(s):  
Svetlana Acic ◽  
U. Silc ◽  
S. Vrbnicanin ◽  
Svjetlana Cupac ◽  
G. Topisirovic ◽  
...  

The systematic survey of grassland communities was performed on Mt. Stol (eastern Serbia). The main aims of the research were to: (1) determine grassland vegetation types of the researched area; (2) correlate the impacts of the soil and environmental conditions on the occurrence of certain plant communities, and (3) comment on the conservational value of the grasslands in the researched area. The data set included 60 phytosociological relevés of grasslands recorded between 2001 and 2004. The main environmental gradients of species composition were analyzed by Detrended Correspondence Analysis (DCA). For the ecological interpretation of ordination axes, ecological indicator values were used. Three associations were distinguished: Danthonietum calycinae, Asperulo-Agrostietum vulgaris and Ranunculo bulbosi-Arrhenatheretum elatioris belonging to two alliances and two classes: Chrysopogono-Danthonion - Festuco-Brometea and Arrhenatherion - Molinio-Arrhenatheretea. The results of the DCA support our assumption that the main environmental gradient in the species composition of the grasslands is related to nutrients and moisture. The conservational value of grasslands in the researched area is discussed.

