Characterising sediments of a tropical sediment-starved shelf using cluster analysis of physical and geochemical variables

2015 ◽  
Vol 12 (2) ◽  
pp. 204 ◽  
Author(s):  
Lynda C. Radke ◽  
Jin Li ◽  
Grant Douglas ◽  
Rachel Przeslawski ◽  
Scott Nichol ◽  
...  

Environmental context
Australia's tropical marine estate is a biodiversity hotspot that is threatened by human activities. Analysis and interpretation of large physical and geochemical data sets provide important information on processes occurring at the seafloor in this poorly known area. These processes help us to understand how the seafloor functions to support biodiversity in the region.

Abstract
Baseline information on habitats is required to manage Australia's northern tropical marine estate. This study aims to develop an improved understanding of seafloor environments of the Timor Sea. Clustering methods were applied to a large data set comprising physical and geochemical variables that describe organic matter (OM) reactivity, quantity and source, and geochemical processes. Arthropoda (infauna) were used to assess the different groupings. Clusters based on physical and geochemical data discriminated arthropod assemblages better than geomorphic features did. Major variations among clusters included grain size and a cross-shelf transition from authigenic Mn–As enrichment (inner shelf) to authigenic P enrichment (outer shelf). Groups comprising raised features had the highest reactive OM concentrations (e.g. low chlorin indices and C:N ratios, and high reaction rate coefficients) and benthic algal δ13C signatures. Surface-area-normalised OM concentrations higher than continental-shelf norms were observed in association with: (i) low δ15N, inferring Trichodesmium input; and (ii) pockmarks, which impart bottom-up controls on seabed chemistry and cause inconsistencies between bulk and pigment OM pools. Low Shannon–Wiener diversity occurred in association with low redox potential and porewater pH, and with published evidence for high-energy conditions. The highest β-diversity was observed at euphotic depths. The geochemical data and clustering methods used here provide insight into ecosystem processes that likely influence biodiversity patterns in the region.
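A minimal sketch of the Shannon–Wiener diversity index mentioned above, using hypothetical infaunal counts (the station data are invented for illustration):

```python
import math

def shannon_wiener(counts):
    """Shannon–Wiener diversity H' = -sum(p_i * ln p_i) over taxon counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical infaunal counts at two stations: even vs dominated communities.
even_station = [10, 10, 10, 10]    # four equally abundant taxa
dominated_station = [37, 1, 1, 1]  # one strongly dominant taxon

print(round(shannon_wiener(even_station), 3))      # ln(4) ≈ 1.386
print(round(shannon_wiener(dominated_station), 3))
```

An even community maximises H' at ln(S) for S taxa; dominance by a single taxon drives it toward zero, which is the sense in which "low Shannon–Wiener diversity" is used above.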

2013 ◽  
Vol 19 (5) ◽  
pp. 1281-1289 ◽  
Author(s):  
Jesse Ward ◽  
Rebecca Marvin ◽  
Thomas O'Halloran ◽  
Chris Jacobsen ◽  
Stefan Vogt

Abstract X-ray fluorescence (XRF) microscopy is an important tool for studying trace metals in biology, enabling simultaneous detection of multiple elements of interest and allowing quantification of metals in organelles without the need for subcellular fractionation. Currently, analysis of XRF images is often done using manually defined regions of interest (ROIs). However, since advances in synchrotron instrumentation have enabled the collection of very large data sets encompassing hundreds of cells, manual approaches are becoming increasingly impractical. We describe here the use of soft clustering to identify cell ROIs based on elemental contents, using data collected over a sample of the malaria parasite Plasmodium falciparum as a test case. Soft clustering was able to successfully classify regions in infected erythrocytes as "parasite," "food vacuole," "host," or "background." In contrast, hard clustering using the k-means algorithm was found to have difficulty in distinguishing cells from background. While initial tests showed convergence on two or three distinct solutions in 60% of the cells studied, subsequent modifications to the clustering routine improved results to yield 100% consistency in image segmentation. Data extracted using soft cluster ROIs were found to be as accurate as data extracted using manually defined ROIs, and analysis time was considerably reduced.
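Soft clustering of the kind described here can be illustrated with a minimal fuzzy c-means sketch on hypothetical one-dimensional pixel intensities (the real analysis works on multi-element XRF maps; the data and initialisation below are illustrative only):

```python
def fuzzy_c_means(points, c=2, m=2.0, iters=50):
    """Soft (fuzzy c-means) clustering: every point receives a membership
    weight in each cluster instead of a single hard label."""
    lo, hi = min(points), max(points)
    # Deterministic init: spread the c centres across the data range.
    centers = [lo + (hi - lo) * k / (c - 1) for k in range(c)]
    u = [[0.0] * c for _ in points]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        for i, x in enumerate(points):
            d = [abs(x - ck) or 1e-12 for ck in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((d[k] / dj) ** (2.0 / (m - 1.0)) for dj in d)
        # Centre update: membership-weighted mean of the points.
        for k in range(c):
            w = [u[i][k] ** m for i in range(len(points))]
            centers[k] = sum(wi * x for wi, x in zip(w, points)) / sum(w)
    return centers, u

# Hypothetical per-pixel metal counts: dim "background" and bright "parasite" modes.
pixels = [0.1, 0.2, 0.15, 0.12, 5.0, 5.2, 4.9, 5.1]
centers, memberships = fuzzy_c_means(pixels, c=2)
print([round(ck, 2) for ck in centers])
```

Unlike k-means, each pixel's memberships sum to one across clusters, so ambiguous pixels near cluster boundaries are represented as partial members rather than forced into one ROI.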


2009 ◽  
Vol 24 (20) ◽  
pp. 1543-1557 ◽  
Author(s):  
TYCE DeYOUNG

IceCube is a kilometer-scale high energy neutrino telescope under construction at the South Pole, a second-generation instrument expanding the capabilities of the AMANDA telescope. The scientific portfolio of IceCube includes the detection of neutrinos from astrophysical objects such as the sources of the cosmic rays, the search for dark matter, and fundamental physics using a very large data set of atmospheric neutrinos. The design and status of IceCube are briefly reviewed, followed by a summary of results to date from AMANDA and initial IceCube results from the 2007 run, with 22 of a planned 86 strings operational. The new infill array known as Deep Core, which will extend IceCube's capabilities to energies as low as 10 GeV, is also described.


2019 ◽  
Author(s):  
Martin Papenberg ◽  
Gunnar W. Klau

Numerous applications in psychological research require that a pool of elements is partitioned into multiple parts. While many applications seek groups that are well-separated, i.e., dissimilar from each other, others require the different groups to be as similar as possible. Examples include the assignment of students to parallel courses, assembling stimulus sets in experimental psychology, splitting achievement tests into parts of equal difficulty, and dividing a data set for cross-validation. We present anticlust, an easy-to-use and free software package for solving these problems quickly and automatically. The package anticlust is an open source extension to the R programming language and implements the methodology of anticlustering. Anticlustering divides elements into similar parts, ensuring similarity between groups by enforcing heterogeneity within groups. Thus, anticlustering is the direct reversal of cluster analysis, which aims to maximize homogeneity within groups and dissimilarity between groups. Our package anticlust implements two anticlustering criteria, reversing the clustering methods k-means and cluster editing, respectively. In a simulation study, we show that anticlustering returns excellent results and outperforms alternative approaches like random assignment and matching. In three example applications, we illustrate how to apply anticlust to real data sets. We demonstrate how to assign experimental stimuli to equivalent sets based on norming data, how to divide a large data set for cross-validation, and how to split a test into parts of equal item difficulty and discrimination.
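The reversal of the k-means objective can be sketched with a simple exchange heuristic. This is an illustration of the idea, not the anticlust implementation, and the item values are hypothetical:

```python
import random
from itertools import combinations

def kmeans_objective(groups):
    """Sum of squared within-group deviations (MAXIMIZED in anticlustering)."""
    total = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        total += sum((x - mean) ** 2 for x in g)
    return total

def anticluster(values, k=2, seed=0):
    """Exchange heuristic: start from a random equal-sized split, then accept
    any pairwise swap that increases within-group heterogeneity."""
    rng = random.Random(seed)
    idx = list(range(len(values)))
    rng.shuffle(idx)
    assign = [idx[i::k] for i in range(k)]
    improved = True
    while improved:
        improved = False
        for ga, gb in combinations(range(k), 2):
            for i in range(len(assign[ga])):
                for j in range(len(assign[gb])):
                    before = kmeans_objective([[values[t] for t in g] for g in assign])
                    assign[ga][i], assign[gb][j] = assign[gb][j], assign[ga][i]
                    after = kmeans_objective([[values[t] for t in g] for g in assign])
                    if after > before:
                        improved = True
                    else:  # revert unhelpful swap
                        assign[ga][i], assign[gb][j] = assign[gb][j], assign[ga][i]
    return assign

# Hypothetical item difficulties to split into two parts of equal difficulty.
difficulty = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
parts = anticluster(difficulty, k=2)
means = [sum(difficulty[i] for i in p) / len(p) for p in parts]
print([round(m, 2) for m in means])
```

Because the total sum of squares is fixed, maximizing within-group heterogeneity is equivalent to pushing the group means together, which is why the two part means come out (near-)identical.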


2019 ◽  
Vol 8 (2S11) ◽  
pp. 3687-3693

Clustering is a data mining process in which a data set is partitioned into sub-classes. It is essential to classification, grouping, exploratory pattern analysis, image segmentation and decision making. Big data refers to very large data sets that are examined computationally to reveal patterns and associations, often relating to human behaviour and interactions. Big data is essential to many organisations, but it can be complex to store and time-consuming to process. One way of addressing these issues is to develop clustering methods suited to big data, although such methods can suffer from high computational complexity. Data mining is a technique for extracting useful information, but conventional data mining models cannot be applied directly to big data because of its inherent complexity. The main scope of this paper is to introduce an overview of data clustering approaches for big data and to review related work. The survey concentrates on research into clustering algorithms that address the characteristics of big data, and gives a short overview of algorithms grouped as partitioning, hierarchical, grid-based and model-based methods. Clustering is a major data mining task used for analysing big data; we also discuss the problems of applying clustering patterns to big data and the new issues that big data raises.
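One common way to scale partitioning clustering to large data sets is mini-batch k-means, which updates centres from small random samples rather than full passes. A minimal sketch on hypothetical one-dimensional data (not from any system discussed in the survey):

```python
import random

def mini_batch_kmeans(data, k=2, batch=4, iters=100, seed=1):
    """Mini-batch k-means: centres are updated from small random batches,
    so the full data set never has to be processed at once."""
    rng = random.Random(seed)
    centers = rng.sample(data, k)
    counts = [0] * k
    for _ in range(iters):
        for x in rng.sample(data, batch):
            # Assign the point to its nearest centre.
            j = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            counts[j] += 1
            eta = 1.0 / counts[j]              # per-centre learning rate
            centers[j] += eta * (x - centers[j])
    return sorted(centers)

# Hypothetical one-dimensional measurements drawn around two modes.
data = [1.0, 1.1, 0.9, 1.05, 9.0, 9.2, 8.8, 9.1]
centers = mini_batch_kmeans(data)
print([round(c, 1) for c in centers])
```

The per-centre learning rate 1/count makes each centre a running mean of the points assigned to it, so memory use is constant in the number of observations.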


2020 ◽  
Author(s):  
Luc Vereecken ◽  
Barbara Nozière

Abstract. A large data set of rate coefficients for H-migration in peroxy radicals is presented and supplemented with literature data to derive a structure–activity relationship (SAR) for the title reaction class. The SAR supports aliphatic RO2 radicals, as well as unsaturated bonds and β-oxo substitution both endo- and exo-cyclic to the transition state ring, and α-oxo (aldehyde), −OH, −OOH, and −ONO2 substitutions, including migration of O-based hydrogen atoms. Also discussed are −C(=O)OH and −OR substitutions. The SAR allows predictions of rate coefficients k(T) for a temperature range of 200 to 450 K, with migration spans ranging from 1,4- to 1,9-H-shifts depending on the functionalities. The performance of the SAR reflects the uncertainty of the underlying data, reproducing the scarce experimental data on average to within a factor of 2, and the wide range of theoretical data to within a factor of 10 to 100, depending also on the quality of those data. The SAR evaluation discusses the performance in multi-functionalized species. For aliphatic RO2, we also present some experimental product identification that validates the expected mechanisms. The proposed SAR is a valuable tool for mechanism development and experimental design, and guides future theoretical work, which should allow rapid improvements of the SAR in the following years; relative multi-conformer TST (rel-MC-TST) kinetic theory is introduced as an aid for systematic kinetic studies.
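Rate coefficients k(T) in SARs of this kind are commonly expressed in modified Arrhenius form; a generic sketch with hypothetical parameters (not the fitted SAR coefficients of the paper):

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def k_arrhenius(T, A, n, Ea):
    """Modified Arrhenius expression k(T) = A * (T/300)^n * exp(-Ea/(R*T))."""
    return A * (T / 300.0) ** n * math.exp(-Ea / (R * T))

# Hypothetical parameters for an illustrative aliphatic 1,5-H-shift;
# the fitted SAR coefficients in the paper differ.
A, n, Ea = 1.0e12, 0.0, 90.0e3  # s^-1, dimensionless, J mol^-1
for T in (200.0, 300.0, 450.0):
    print(f"{T:.0f} K  k = {k_arrhenius(T, A, n, Ea):.3e} s^-1")
```

Evaluating over the stated 200–450 K range shows the strong temperature dependence typical of H-migration: with a positive barrier, k(T) rises by many orders of magnitude from the low to the high end of the range.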


2020 ◽  
Vol 39 (5) ◽  
pp. 6419-6430
Author(s):  
Dusan Marcek

To forecast time series data, two methodological frameworks are considered: statistical and computational intelligence modelling. The statistical approach is based on the theory of invertible ARIMA (Auto-Regressive Integrated Moving Average) models with Maximum Likelihood (ML) estimation. As a competitor to the statistical forecasting models, we use the popular classic neural network (NN) of perceptron type. To train the NN, the Back-Propagation (BP) algorithm and heuristics such as the genetic and micro-genetic algorithms (GA and MGA) are applied to the large data set. A comparative analysis of the selected learning methods is performed and evaluated. The experiments indicate that a population size of 20 gives the lowest training time of all the NNs trained by evolutionary algorithms, while the prediction accuracy is lower but still acceptable to managers.
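Training network weights with a genetic algorithm rather than back-propagation can be sketched as follows; the toy one-input linear neuron, data, population settings and operators are all illustrative, not those of the study:

```python
import random

def mse(w, b, xs, ys):
    """Mean squared error of the linear neuron y_hat = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def ga_train(xs, ys, pop_size=20, gens=200, seed=0):
    """Genetic algorithm over (weight, bias) pairs: elitist selection,
    blend crossover and Gaussian mutation."""
    rng = random.Random(seed)
    pop = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda p: mse(p[0], p[1], xs, ys))
        survivors = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            (w1, b1), (w2, b2) = rng.sample(survivors, 2)
            a = rng.random()                      # blend crossover ...
            w = a * w1 + (1 - a) * w2 + rng.gauss(0, 0.05)  # ... plus mutation
            b = a * b1 + (1 - a) * b2 + rng.gauss(0, 0.05)
            children.append((w, b))
        pop = survivors + children
    return min(pop, key=lambda p: mse(p[0], p[1], xs, ys))

# Hypothetical series following y ≈ 2x + 1 with a little noise.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0, 2.1, 2.9, 4.1, 5.0]
w, b = ga_train(xs, ys)
print(round(w, 2), round(b, 2))
```

Unlike back-propagation, the GA needs no gradient: it only evaluates the fitness (here, the MSE) of candidate weight vectors, which is why population size directly trades off search breadth against training time.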


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict Ki values of thrombin inhibitors based on a large data set by using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional data sets, it can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best one among these methods, with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
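The quoted fit statistics (R2, MSE) can be computed as follows; the observed/predicted values below are hypothetical, not the study's data:

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed vs predicted inhibitory constants (as pKi).
obs = [5.0, 6.2, 7.1, 8.0, 6.5]
pred = [5.2, 6.0, 7.3, 7.8, 6.6]
print(round(r2(obs, pred), 3), round(mse(obs, pred), 3))  # → 0.966 0.034
```

Reporting both on training and test sets, as the abstract does, is what reveals overfitting: a large train/test gap in R2 or MSE indicates a model that has memorised rather than generalised.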


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ruolan Zeng ◽  
Jiyong Deng ◽  
Limin Dang ◽  
Xinliang Yu

Abstract A three-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a large data set consisting of 274 compounds, by applying a support vector machine (SVM) together with a genetic algorithm. The optimal SVM model possesses a coefficient of determination R2 of 0.946 and a root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R2 of 0.872 and an rms error of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance while handling more samples in the test set. Thus, a nonlinear QSAR model for skin permeability was successfully developed using an SVM algorithm.
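Descriptor selection with a genetic algorithm can be sketched as a search over binary inclusion masks. The correlation-based fitness below is a deliberate simplification of an SVM-based procedure, and the data are synthetic:

```python
import random

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)

def fitness(mask, X, y):
    """Mean |correlation| of selected descriptors with the response,
    minus a small penalty per descriptor to favour compact models."""
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return -1.0
    cols = ([row[j] for row in X] for j in chosen)
    return sum(abs(corr(col, y)) for col in cols) / len(chosen) - 0.01 * len(chosen)

def ga_select(X, y, pop_size=20, gens=60, seed=3):
    rng = random.Random(seed)
    n_desc = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_desc)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        keep = pop[: pop_size // 2]                    # elitist selection
        children = []
        while len(keep) + len(children) < pop_size:
            p1, p2 = rng.sample(keep, 2)               # uniform crossover
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            children.append([bit if rng.random() > 0.05 else 1 - bit
                             for bit in child])        # bit-flip mutation
        pop = keep + children
    return max(pop, key=lambda m: fitness(m, X, y))

# Synthetic data: descriptor 0 tracks the response, descriptors 1-3 are noise.
data_rng = random.Random(0)
y = [data_rng.uniform(0, 1) for _ in range(30)]
X = [[yi + data_rng.gauss(0, 0.05)] + [data_rng.uniform(0, 1) for _ in range(3)]
     for yi in y]
best = ga_select(X, y)
print(best)  # descriptor 0 should be selected
```

In a real QSAR workflow the fitness would be the cross-validated performance of the SVM on the candidate descriptor subset, which is far costlier to evaluate but rewards predictive rather than merely correlated descriptors.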


Author(s):  
Lior Shamir

Abstract Several recent observations using large data sets of galaxies have shown a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to interact gravitationally. Here, a data set of ~8.7×10³ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. Both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence yields a dipole axis with probabilities of ~2.8σ and ~7.38σ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at (α = 78°, δ = 47°), well within the 1σ error range of the most likely dipole axis in the SDSS galaxies with z > 0.15, identified at (α = 71°, δ = 61°).
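Fitting an asymmetry to cosine dependence becomes a linear least-squares problem once linearised; a one-dimensional sketch on synthetic data (not the HST or SDSS catalogues):

```python
import math

def fit_cosine(angles, asym):
    """Least-squares fit of asym ≈ d*cos(theta - theta0), linearised as
    a*cos(theta) + b*sin(theta) with d = hypot(a, b), theta0 = atan2(b, a)."""
    # Normal equations of the two-parameter linear model.
    cc = sum(math.cos(t) ** 2 for t in angles)
    ss = sum(math.sin(t) ** 2 for t in angles)
    cs = sum(math.sin(t) * math.cos(t) for t in angles)
    yc = sum(y * math.cos(t) for t, y in zip(angles, asym))
    ys = sum(y * math.sin(t) for t, y in zip(angles, asym))
    det = cc * ss - cs * cs
    a = (yc * ss - ys * cs) / det
    b = (ys * cc - yc * cs) / det
    return math.hypot(a, b), math.atan2(b, a)  # dipole amplitude and phase

# Synthetic asymmetry: dipole of amplitude 0.1 peaking at theta0 = 1.0 rad.
angles = [2 * math.pi * i / 36 for i in range(36)]
asym = [0.1 * math.cos(t - 1.0) for t in angles]
d, theta0 = fit_cosine(angles, asym)
print(round(d, 3), round(theta0, 3))  # → 0.1 1.0
```

The full-sky version fits cos of the angular distance to a trial axis (α, δ) and scans trial axes for the maximum amplitude, but the least-squares core is the same linearisation shown here.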

