Influence of sample quantification and taxonomic resolution on the ordination of macroinvertebrate communities from running waters in Victoria, Australia

R Marchant; LA Barmuta; BC Chessman

doi:10.1071/mf9950501

Marine and Freshwater Research, 1995, Vol 46 (2), pp. 501

doi:10.1071/mf9950501

Cited By ~ 53

Author(s): R Marchant; LA Barmuta; BC Chessman

Keyword(s): Binary Data; Environmental Gradients; Detrended Correspondence Analysis; Original Data; Taxonomic Resolution; Data Sets; Running Waters; Macroinvertebrate Communities; Data Set; Sample Quantification

The influence of sample quantification and taxonomic resolution on the ordination of macroinvertebrate communities from nine Victorian rivers was examined by progressively reducing the degree of detail in the original data (species level, quantitative). Five additional data sets were created that consisted of binary (presence or absence) data on species, quantitative or binary data on families, and quantitative data on PET (plecopteran, ephemeropteran and trichopteran) species or families. Ordinations were performed with detrended correspondence analysis (DCA) and semi-strong hybrid multi-dimensional scaling (SSH). With both ordination techniques, the ordinations of each data set (including the original) revealed the same three underlying gradients. An altitudinal gradient consistently achieved the highest correlations with the ordinations (r = 0.71-0.93), followed by a substratum gradient (r = 0.50-0.88) and a combined pH and conductivity gradient (r = 0.47-0.76). Each of the five less-complete data sets thus provides an adequate degree of detail for ordination analysis and subsequent interpretation of environmental gradients.
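
The data reduction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the site names, taxon names, counts and the `family_of` and `order_of` lookups are all hypothetical, and the ordination step itself (DCA or SSH) is not shown.

```python
from collections import defaultdict

PET_ORDERS = {"Plecoptera", "Ephemeroptera", "Trichoptera"}


def to_binary(abundance):
    """Convert a site -> {taxon: count} table to presence (1) or absence (0)."""
    return {
        site: {taxon: int(count > 0) for taxon, count in taxa.items()}
        for site, taxa in abundance.items()
    }


def to_family(abundance, family_of):
    """Sum species counts within each family, site by site."""
    result = {}
    for site, taxa in abundance.items():
        totals = defaultdict(int)
        for species, count in taxa.items():
            totals[family_of[species]] += count
        result[site] = dict(totals)
    return result


def pet_subset(abundance, order_of):
    """Keep only taxa that belong to the three PET orders."""
    return {
        site: {t: c for t, c in taxa.items() if order_of[t] in PET_ORDERS}
        for site, taxa in abundance.items()
    }


# Hypothetical species-level, quantitative data (the "original" level of detail).
species_counts = {
    "site_A": {"sp1": 12, "sp2": 0, "sp3": 3},
    "site_B": {"sp1": 0, "sp2": 5, "sp3": 1},
}
family_of = {"sp1": "Fam1", "sp2": "Fam1", "sp3": "Fam2"}
order_of = {"sp1": "Plecoptera", "sp2": "Plecoptera", "sp3": "Diptera"}

species_binary = to_binary(species_counts)
family_counts = to_family(species_counts, family_of)
family_binary = to_binary(family_counts)
pet_species = pet_subset(species_counts, order_of)
pet_families = to_family(pet_species, family_of)
```

Each derived table could then be passed to the same ordination routine as the original one, which is how the comparison across levels of detail would be set up.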

Bayesian Classifier for Sparsity-Promoting Feature Selection

International Journal of Pattern Recognition and Artificial Intelligence, 2015, Vol 29 (06), pp. 1550022

doi:10.1142/s0218001415500226

Cited By ~ 1

Author(s): Danlei Xu; Lan Du; Hongwei Liu; Penghui Wang

Keyword(s): Feature Selection; Synthetic Data; Original Data; Radar Data; Bayesian Classifier; Classification Model; Data Sets; Data Set; Classification Boundary; Nonlinear Mappings

A Bayesian classifier for sparsity-promoting feature selection is developed in this paper, where a set of nonlinear mappings for the original data is performed as a pre-processing step. The linear classification model with such mappings from the original input space to a nonlinear transformation space can not only construct the nonlinear classification boundary, but also realize the feature selection for the original data. A zero-mean Gaussian prior with Gamma precision and a finite approximation of Beta process prior are used to promote sparsity in the utilization of features and nonlinear mappings in our model, respectively. We derive the Variational Bayesian (VB) inference algorithm for the proposed linear classifier. Experimental results based on the synthetic data set, measured radar data set, high-dimensional gene expression data set, and several benchmark data sets demonstrate the aggressive and robust feature selection capability and comparable classification accuracy of our method compared with some other existing classifiers.

Dynamic spatio-temporal generation of large-scale synthetic gridded precipitation: with improved spatial coherence of extremes

Stochastic Environmental Research and Risk Assessment, 2019, Vol 34 (9), pp. 1369-1383

doi:10.1007/s00477-019-01724-9

Cited By ~ 1

Author(s): Dirk Diederen; Ye Liu

Keyword(s): Large Scale; Spatial Coherence; Original Data; Return Level; Data Sets; Large Set; Precipitation Data; Data Set; Spatio Temporal; Synthetic Precipitation

With the ongoing development of distributed hydrological models, flood risk analysis calls for synthetic, gridded precipitation data sets. The availability of large, coherent, gridded re-analysis data sets in combination with the increase in computational power accommodates the development of new methodology to generate such synthetic data. We tracked moving precipitation fields and classified them using self-organising maps. For each class, we fitted a multivariate mixture model and generated a large set of synthetic, coherent descriptors, which we used to reconstruct moving synthetic precipitation fields. We introduced randomness in the original data set by replacing the observed precipitation fields in the original data set with the synthetic precipitation fields. The output is a continuous, gridded, hourly precipitation data set of a much longer duration, containing physically plausible and spatio-temporally coherent precipitation events. The proposed methodology implicitly provides an important improvement in the spatial coherence of precipitation extremes. We investigate the issue of unrealistic, sudden changes on the grid and demonstrate how a dynamic spatio-temporal generator can provide spatial smoothness in the probability distribution parameters and hence in the return level estimates.

European household’s income, consumption and wealth

Statistical Journal of the IAOS, 2020, Vol 36 (4), pp. 1175-1188

doi:10.3233/sji-190528

Author(s): Pierre Lamarche; Friderike Oehler; Irene Rioboo

Keyword(s): Impact Analysis; Well Being; Original Data; Data Sets; The European Union; Micro Data; Data Set; Statistical Matching; Full Picture; Poverty Indicators

Poverty indicators purely based on income statistics do not reflect the full picture of households' economic well-being. Consumption and wealth are two additional key dimensions that determine the economic opportunities of people or material inequalities. We use non-parametric statistical matching methods to join consumption data from the Household Budget Survey to micro data from the European Union Statistics on Income and Living Conditions. In a second step, micro data from the Household Finance and Consumption Survey are joined to produce a common distribution of income, consumption and wealth variables. A variety of different indicators is then produced based on this joint data set, in particular household saving rates. Care has to be taken when interpreting the indicators, since the statistical matching is based on strong assumptions and a limited number of variables common to all of the three original data sets. We are able to show, however, that the assumptions made are justified by the use of strong proxies as matching variables. Thus, the resulting indicators have the potential to contribute to the analysis of inequality patterns and enhance the possibilities of social, and possibly fiscal, policy impact analysis.
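
As a rough illustration of what non-parametric statistical matching can look like, the sketch below uses nearest-neighbour distance hot-deck imputation, which is one common technique of this kind and not necessarily the one the authors applied. The records, the variable names (`age`, `hh_size`, `income`, `consumption`) and the unscaled squared-distance measure are all assumptions made for the example.

```python
def nearest_donor(recipient, donors, common_vars):
    """Return the donor record closest to the recipient on the common variables."""
    def squared_distance(donor):
        return sum((recipient[v] - donor[v]) ** 2 for v in common_vars)
    return min(donors, key=squared_distance)


def match(recipients, donors, common_vars, target_var):
    """Copy target_var from the nearest donor onto each recipient record."""
    fused = []
    for recipient in recipients:
        donor = nearest_donor(recipient, donors, common_vars)
        record = dict(recipient)  # leave the input records untouched
        record[target_var] = donor[target_var]
        fused.append(record)
    return fused


# Hypothetical micro data: one survey observes income, the other consumption.
income_survey = [
    {"age": 34, "hh_size": 2, "income": 30000},
    {"age": 61, "hh_size": 1, "income": 18000},
]
budget_survey = [
    {"age": 30, "hh_size": 2, "consumption": 24000},
    {"age": 65, "hh_size": 1, "consumption": 15000},
]

joint = match(income_survey, budget_survey, ["age", "hh_size"], "consumption")
saving_rates = [(r["income"] - r["consumption"]) / r["income"] for r in joint]
```

In this toy setup the common variables are left on their raw scales; a real application would typically standardise them first, since otherwise the variable with the largest range dominates the distance.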

Diversity of subalpine and alpine vegetation of the eastern part of the Nízke Tatry Mts in Slovakia: major types and environmental gradients

Biologia, 2009, Vol 64 (5)

doi:10.2478/s11756-009-0176-6

Cited By ~ 6

Author(s): Lucia Sekulová; Michal Hájek

Keyword(s): Plant Communities; Environmental Gradients; Ecological Factors; Detrended Correspondence Analysis; Dwarf Shrub; Soil Reaction; Data Set; Entire Data; Second Gradient; One Year

Changes in composition and structure of alpine and subalpine plant communities in relation to ecological factors were analysed in the Nízke Tatry Mts, Slovakia. Species cover values of vascular and non-vascular plants in each vegetation plot were recorded on the nine-degree scale. A data set of 156 relevés of alpine and subalpine vegetation was sampled recently during one year in the eastern part of the Nízke Tatry National Park. The data set was analysed by cluster analysis and Detrended Correspondence Analysis. Analyses were carried out on the entire data set, including the subset of short grassland and dwarf-shrub vegetation. Major gradients and clusters were ecologically interpreted using Ellenberg indicator values. In the entire data set, the major gradient in species composition was associated with nutrient availability and the second most important gradient with light. In the case of short grassland and dwarf-shrub vegetation, the gradients were different. The first one was associated with soil reaction and the second gradient was associated with moisture. Clusters proposed by numerical classification reproduced many traditional phytosociological associations, namely Seslerietum distichae, Sphagno capillifolii-Empetretum nigri, Junco trifidi-Callunetum vulgaris, Juncetum trifidi, Dryopterido dilatatae-Pinetum mugo, Luzuletum obscurae, Agrostio pyrenaiceae-Nardetum strictae, while some other associations were less clearly differentiated (communities of the alliances Calamagrostion villosae, Adenostylion alliariae, Trisetion fusci, Cratoneuro filicini-Calthion laetae or Salicion herbaceae). The next clusters included Vaccinium and Festuca supina dominated communities and artificial roadside grasslands sown 50 years ago. Bryophytes and lichens were highly represented among diagnostic species of particular associations. The distribution pattern of particular plant communities was strongly influenced by site position on either the northern or the southern slope of the mountains.

EFFECTS OF NONSINGULAR PREPROCESSING ON FEEDFORWARD NETWORK TRAINING

International Journal of Pattern Recognition and Artificial Intelligence, 2005, Vol 19 (02), pp. 217-247

doi:10.1142/s0218001405004022

Cited By ~ 6

Author(s): CHANGHUA YU; MICHAEL T. MANRY; JIANG LI

Keyword(s): Back Propagation; Original Data; Data Sets; Training Algorithm; Feedforward Network; Data Set; Network Training; The Neural Network; Hidden Layer; Theoretical Analyses

In the neural network literature, many preprocessing techniques, such as feature de-correlation, input unbiasing and normalization, are suggested to accelerate multilayer perceptron training. In this paper, we show that a network trained with an original data set and one trained with a linear transformation of the original data will go through the same training dynamics, as long as they start from equivalent states. Thus preprocessing techniques may not be helpful and are merely equivalent to using a different weight set to initialize the network. Theoretical analyses of such preprocessing approaches are given for conjugate gradient, back propagation and the Newton method. In addition, an efficient Newton-like training algorithm is proposed for hidden layer training. Experiments on various data sets confirm the theoretical analyses and verify the improvement of the new algorithm.
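
The notion of "equivalent states" can be made concrete with a small numerical check. The sketch below is an illustration only, assuming a single linear layer without bias, a hypothetical nonsingular 2 x 2 preprocessing matrix `A` and hypothetical weights `W`; it shows that weights `W` on raw inputs and weights `W A^-1` on transformed inputs yield the same net function values. It does not reproduce the paper's analysis of the training dynamics.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*right))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in left]


def inverse_2x2(matrix):
    """Invert a 2 x 2 matrix, refusing singular input."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


W = [[0.5, -1.0], [2.0, 0.25]]  # hypothetical first-layer weights
A = [[2.0, 1.0], [1.0, 3.0]]    # hypothetical nonsingular preprocessing matrix
x = [1.5, -0.5]                 # one raw input pattern

x_transformed = matvec(A, x)              # preprocessed input
W_equivalent = matmul(W, inverse_2x2(A))  # weights of the equivalent network

net_original = matvec(W, x)
net_transformed = matvec(W_equivalent, x_transformed)
```

Because the two networks compute identical net functions for every input, starting the second one from `W_equivalent` places it in a state equivalent to the first, which is the premise of the argument in the abstract.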

Responses of benthic macroinvertebrate communities to altitude and geology in tributaries of the Sepik River (Papua New Guinea): the influence of taxonomic resolution on the detection of environmental gradients

Freshwater Biology, 2012, Vol 57 (9), pp. 1794-1812

doi:10.1111/j.1365-2427.2012.02839.x

Cited By ~ 16

Author(s): DAVID DUDGEON

Keyword(s): Papua New Guinea; Environmental Gradients; New Guinea; Taxonomic Resolution; Benthic Macroinvertebrate; Macroinvertebrate Communities; Sepik River

A PROBABILISTIC SELF-ORGANIZING MAP FOR BINARY DATA TOPOGRAPHIC CLUSTERING

International Journal of Computational Intelligence and Applications, 2008, Vol 07 (04), pp. 363-383

doi:10.1142/s1469026808002351

Cited By ~ 10

Author(s): MUSTAPHA LEBBAH; YOUNÈS BENNANI; NICOLETA ROGOVSCHI

Keyword(s): Binary Data; Learning Algorithm; Data Sets; Self Organizing Map; Data Set; Binary Coding; Public Data; Multivariate Binary Data; Self Organizing

This paper introduces a probabilistic self-organizing map for topographic clustering, analysis and visualization of multivariate binary data or categorical data using binary coding. We propose a probabilistic formalism dedicated to binary data in which cells are represented by a Bernoulli distribution. Each cell is characterized by a prototype with the same binary coding as used in the data space and the probability of being different from this prototype. The learning algorithm that we propose, Bernoulli on self-organizing map, is an application of the standard EM algorithm. We illustrate the power of this method with six data sets taken from a public data set repository. The results show a good quality of the topological ordering and homogeneous clustering.
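
A cell described by a binary prototype and a probability of differing from it can be sketched as follows. The example assumes a single probability `eps` shared by all bits of a cell and independence between bits, which is a simplification chosen for illustration; the function names are hypothetical and the EM learning step is not shown.

```python
from itertools import product


def hamming(x, prototype):
    """Number of positions where the observation differs from the prototype."""
    return sum(a != b for a, b in zip(x, prototype))


def bernoulli_likelihood(x, prototype, eps):
    """Probability of x under a cell with the given prototype.

    Each bit differs from the prototype with probability eps and agrees
    with probability 1 - eps, independently of the other bits.
    """
    d = hamming(x, prototype)
    return (eps ** d) * ((1 - eps) ** (len(x) - d))


prototype = [1, 0, 1]
eps = 0.1

p_match = bernoulli_likelihood([1, 0, 1], prototype, eps)    # no bit differs
p_one_off = bernoulli_likelihood([1, 1, 1], prototype, eps)  # one bit differs

# The likelihoods of all 2**3 binary patterns form a probability distribution.
total = sum(bernoulli_likelihood(list(bits), prototype, eps)
            for bits in product([0, 1], repeat=3))
```

With `eps` below 0.5, the likelihood decreases as the Hamming distance to the prototype grows, so assigning an observation to its most likely cell behaves like a nearest-prototype rule weighted by each cell's `eps`.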

Elastic internal multiple analysis and attenuation using Marchenko and interferometric methods

Geophysics, 2017, Vol 82 (2), pp. Q1-Q12

doi:10.1190/geo2016-0162.1

Cited By ~ 20

Author(s): Carlos Alberto da Costa Filho; Giovanni Angelo Meles; Andrew Curtis

Keyword(s): Synthetic Data; Elastic Solid; Original Data; Data Sets; Data Set; Solid Media; Elastic Data; Vertical Density; Internal Multiples; Acoustic Approximation

Conventional seismic processing aims to create data that contain only primary reflections, whereas real seismic recordings also contain multiples. As such, it is desirable to predict, identify, and attenuate multiples in seismic data. This task is more difficult in elastic (solid) media because mode conversions create families of internal multiples not present in the acoustic case. We have developed a method to predict prestack internal multiples in general elastic media based on the Marchenko method and convolutional interferometry. It can be used to identify multiples directly in prestack data or migrated sections, as well as to attenuate internal multiples by adaptively subtracting them from the original data set. We developed the method on two synthetic data sets, the first composed of horizontal density layers and constant velocities, and the second containing horizontal and vertical density and velocity variations. The full-elastic method is computationally expensive and ideally uses data components that are not usually recorded. We therefore tested an acoustic approximation to the method on the synthetic elastic data from the second model and find that although the spatial resolution of the resulting image is reduced by this approximation, it provides images with relatively fewer artifacts. We conclude that in most cases where cost is a factor and we are willing to sacrifice some resolution, it may be sufficient to apply the acoustic version of this demultiple method.

Privacy-Preserving Stacking with Application to Cross-organizational Diabetes Prediction

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

doi:10.24963/ijcai.2019/571

Author(s): Quanming Yao; Xiawei Guo; James Kwok; Weiwei Tu; Yuqiang Chen; ...

Keyword(s): Transfer Learning; Differential Privacy; Original Data; Privacy Preserving; Data Sets; Data Set; Predicting Performance; Empirical Performance; Feature Based; Diabetes Prediction

To meet the standard of differential privacy, noise is usually added into the original data, which inevitably deteriorates the predicting performance of subsequent learning algorithms. In this paper, motivated by the success of improving predicting performance by ensemble learning, we propose to enhance privacy-preserving logistic regression by stacking. We show that this can be done either by sample-based or feature-based partitioning. However, we prove that when privacy-budgets are the same, feature-based partitioning requires fewer samples than the sample-based one, and thus likely has better empirical performance. As transfer learning is difficult to integrate with a differential privacy guarantee, we further combine the proposed method with hypothesis transfer learning to address the problem of learning across different organizations. Finally, we not only demonstrate the effectiveness of our method on two benchmark data sets, i.e., MNIST and NEWS20, but also apply it to a real application of cross-organizational diabetes prediction from the RUIJIN data set, where privacy is of significant concern.
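
The trade-off between privacy budget and added noise mentioned in the first sentence can be illustrated with the Laplace mechanism, a standard way of adding calibrated noise. This is a generic sketch and not the stacking method proposed in the paper: the function names, the feature values and the sensitivity of 1.0 are hypothetical, and the privacy guarantee only holds if the sensitivity supplied by the caller is correct for the released quantity.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale of the Laplace mechanism: sensitivity divided by epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(values, sensitivity, epsilon, seed=0):
    """Add independent Laplace noise to each value (seeded for repeatability)."""
    rng = random.Random(seed)
    scale = laplace_scale(sensitivity, epsilon)
    return [v + laplace_noise(scale, rng) for v in values]


features = [0.2, 0.9, 0.5]  # hypothetical original values

# A smaller epsilon (tighter privacy budget) means a larger noise scale.
noisy_loose = privatize(features, sensitivity=1.0, epsilon=5.0)
noisy_strict = privatize(features, sensitivity=1.0, epsilon=0.5)
```

The tenfold difference in noise scale between the two calls is the kind of degradation that motivates combining several privately trained models rather than relying on one.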

Grassland communities of Stol mountain (eastern Serbia): Vegetation and environmental relationships

Archives of Biological Sciences, 2013, Vol 65 (1), pp. 211-227

doi:10.2298/abs1301211a

Cited By ~ 3

Author(s): Svetlana Acic; U. Silc; S. Vrbnicanin; Svjetlana Cupac; G. Topisirovic; ...

Keyword(s): Species Composition; Environmental Gradient; Environmental Gradients; Detrended Correspondence Analysis; Ecological Indicator; Vegetation Types; Indicator Values; Grassland Vegetation; Data Set; Grassland Communities

The systematic survey of grassland communities was performed on Mt. Stol (eastern Serbia). The main aims of the research were to: (1) determine grassland vegetation types of the researched area; (2) correlate the impacts of the soil and environmental conditions on the occurrence of certain plant communities, and (3) comment on the conservational value of the grasslands in the researched area. The data set included 60 phytosociological relevés of grasslands recorded between 2001 and 2004. The main environmental gradients of species composition were analyzed by Detrended Correspondence Analysis (DCA). For the ecological interpretation of ordination axes, ecological indicator values were used. Three associations were distinguished: Danthonietum calycinae, Asperulo-Agrostietum vulgaris and Ranunculo bulbosi-Arrhenatheretum elatioris belonging to two alliances and two classes: Chrysopogono-Danthonion - Festuco-Brometea and Arrhenatherion - Molinio-Arrhenatheretea. The results of the DCA support our assumption that the main environmental gradient in the species composition of the grasslands is related to nutrients and moisture. The conservational value of grasslands in the researched area is discussed.
