On the benefits of being redundant: low compositional fidelity of diatom death assemblages does not hamper the preservation of environmental gradients in shallow lakes

Paleobiology ◽  
2015 ◽  
Vol 41 (1) ◽  
pp. 154-173 ◽  
Author(s):  
Gabriela S. Hassan

Abstract Comparisons between death assemblages and their source living communities are among the most common actualistic methods of evaluating the preservation of compositional and environmental information in fossil assemblages. Although live-dead studies have commonly focused on marine mollusks, the potential of diatoms to preserve ecological information in continental settings has been overlooked. Thus, little is known about the nature and magnitude of the taphonomic biases affecting live-dead agreement of diatom assemblages, despite their extensive application as modern and fossil bioindicators in paleoecological and paleoenvironmental reconstructions. In this study, I analyzed three live-dead data sets in order to evaluate the compositional and environmental fidelity exhibited by diatom death assemblages in shallow lakes. I find that diatom death assemblages (DAs) do differ significantly in their taxonomic composition from living assemblages (LAs), mainly as a consequence of (1) differences in the temporal resolution between time-averaged DAs and non-averaged LAs, and (2) differential preservation of diatom taxa related to the intrinsic properties of their valves. Despite compositional dissimilarities, DAs were able to capture the same environmental gradients as LAs, with high significance. This decoupling between live-dead agreement in community composition and community response to gradients can be related to the existence of at least two mutually exclusive subsets of species that significantly captured compositional dissimilarities based on the full set of the species in the three lakes. This functional redundancy implies that the between-sample relationships of living assemblages can be significantly preserved by DAs even if some taxa are removed by taphonomic processes. The preservation of environmental gradients thus does not require good preservation of all living taxa. Structural redundancy compensates for the loss of compositional fidelity caused by postmortem processes in the diatom data set.

Paleobiology ◽  
2009 ◽  
Vol 35 (1) ◽  
pp. 119-145 ◽  
Author(s):  
Adam Tomašových ◽  
Susan M. Kidwell

Although only a few studies have explicitly evaluated live-dead agreement of species and community responses to environmental and spatial gradients, paleoecological analyses implicitly assume that death assemblages capture these gradients accurately. We use nine data sets from modern, relatively undisturbed coastal study areas to evaluate how the response of living molluscan assemblages to environmental gradients (water depth and seafloor type; “environmental component” of a gradient) and geographic separation (“spatial component”) is captured by their death assemblages. We find that: 1. Living assemblages vary in composition either in response to environmental gradients alone (consistent with a species-sorting model) or in response to a combination of environmental and spatial gradients (mass-effect model). None of the living assemblages support the neutral model (or the patch-dynamic model), in which variation in species abundance is related to the spatial configuration of stations alone. These findings also support assumptions that mollusk species consistently differ in responses to environmental gradients, and suggest that in the absence of postmortem bias, environmental gradients might be accurately captured by variation in species composition among death assemblages. Death assemblages do in fact respond uniquely to environmental gradients, and show a stronger response when abundances are square-root transformed to downplay the impact of numerically abundant species and increase the effect of rare species. 2. Species' niche positions (position of maximum abundance) along bathymetric and sedimentary gradients in death assemblages show significantly positive rank correlations to species positions in living assemblages in seven of nine data sets (both square-root-transformed and presence-absence data). 3. The proportion of compositional variation explained by environmental gradients in death assemblages is similar to that of counterpart living assemblages. Death assemblages thus show the same ability to capture environmental gradients as do living assemblages. In some instances compositional dissimilarities in death assemblages show higher rank correlation with spatial distances than with environmental gradients, but spatial structure in community composition is mainly driven by spatially structured environmental gradients. 4. Death assemblages correctly identify the dominance of niche metacommunity models in mollusk communities, as revealed by counterpart living assemblages. This analysis of the environmental resolution of death assemblages thus supports fine-scale niche and paleoenvironmental analyses using molluscan fossil records. In spite of taphonomic processes and time-averaging effects that modify community composition, death assemblages largely capture the response of living communities to environmental gradients, partly because of redundancy in community structure that is inherently associated with multispecies assemblages. The molluscan data sets show some degree of redundancy as evidenced by the presence of at least two mutually exclusive subsets of species that replicate the community structure, and simple simulations show that between-sample relationships can be preserved and remain significant even when a large proportion of species is randomly removed from data sets.
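
The abstract above does not describe how its "simple simulations" were set up, so the following is only an illustrative sketch of the general idea: drop a random fraction of species and check how well the pairwise between-sample dissimilarities of the reduced data track those of the full data. The choice of Bray-Curtis dissimilarity, Pearson correlation, and the function names are assumptions made for illustration, not the authors' procedure.

```python
import random
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0


def pairwise(samples, keep):
    """Dissimilarities for all sample pairs, using only the species in keep."""
    return [
        bray_curtis([s[i] for i in keep], [t[i] for i in keep])
        for s, t in combinations(samples, 2)
    ]


def pearson(u, v):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def retained_structure(samples, fraction_removed, rng):
    """Correlate full-data dissimilarities with those after random species removal."""
    n_species = len(samples[0])
    n_keep = max(1, round(n_species * (1 - fraction_removed)))
    keep = sorted(rng.sample(range(n_species), n_keep))
    return pearson(pairwise(samples, range(n_species)), pairwise(samples, keep))


if __name__ == "__main__":
    # Hypothetical abundance table: rows are samples, columns are species.
    samples = [[10, 0, 5, 1], [8, 1, 4, 0], [0, 9, 1, 6], [1, 7, 0, 8]]
    print(retained_structure(samples, 0.5, random.Random(1)))
```

Repeating the call over many random draws and removal fractions would give a rough picture of how quickly the between-sample structure degrades; a correlation that stays high under heavy removal is the kind of redundancy the abstract refers to.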


1992 ◽  
Vol 49 (S1) ◽  
pp. 40-51 ◽  
Author(s):  
K. H. Nicholls ◽  
L. Nakamoto ◽  
W. Keller

Phytoplankton data from samples collected in July 1981 from 111 lakes in the Sudbury area were analyzed by canonical community ordination analysis and other techniques to reveal associations of taxa which were related to environmental gradients among the lakes. Phytoplankton data from a group of seven lakes sampled in the mid-1970's or early 1980's and in the mid-1980's were analyzed for evidence of temporal change. In both data sets, factors related to trophic status and acidification status were inferred to be important controlling variables. For example, desmid genera fell into two general groupings, one typical of clear-water, nutrient-poor, low-alkalinity lakes (e.g. species of Micrasterias, Bambusina, Euastrum, Spondylosium) and the other representing lakes higher in nutrients and alkalinity (e.g. species of Teilingia, Closterium, Xanthidium, Staurastrum). This latter group also included several chlorophytes (Ulothrix, Schroederia, Scenedesmus) and euglenoids (Euglena, Phacus). Well-defined relationships existed between lake alkalinity, pH, and total numbers of phytoplankton taxa. The smaller data set included lakes subjected to recent decreases in acid deposition and corresponding increases in pH over a 10-yr interval, and the increased numbers of phytoplankton taxa were indicative of recovery from earlier acidification.


1995 ◽  
Vol 46 (2) ◽  
pp. 501 ◽  
Author(s):  
R Marchant ◽  
LA Barmuta ◽  
BC Chessman

The influence of sample quantification and taxonomic resolution on the ordination of macroinvertebrate communities from nine Victorian rivers was examined by progressively reducing the degree of detail in the original data (species level, quantitative). Five additional data sets were created that consisted of binary (presence or absence) data on species, quantitative or binary data on families, and quantitative data on PET (plecopteran, ephemeropteran and trichopteran) species or families. Ordinations were performed with detrended correspondence analysis (DCA) and semi-strong hybrid multi-dimensional scaling (SSH). With both ordination techniques, the ordinations of each data set (including the original) revealed the same three underlying gradients. An altitudinal gradient consistently achieved the highest correlations with the ordinations (r = 0.71-0.93), followed by a substratum gradient (r = 0.50-0.88) and a combined pH and conductivity gradient (r = 0.47-0.76). Each of the five less-complete data sets thus provides an adequate degree of detail for ordination analysis and subsequent interpretation of environmental gradients.


2018 ◽  
Vol 154 (2) ◽  
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform was shown varied between 17 and 26, or was not found in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
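
As a rough illustration of the autocorrelation function mentioned above, the sketch below computes the sample ACF of a yearly count series at lags 0 to `max_lag`. The function names and the example counts are hypothetical, and the ±1.96/√n bound is a common large-sample approximation for judging significance, not necessarily the test used in the study.

```python
def acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag (lag 0 is always 1.0)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def approx_bound(n):
    """Approximate 95% bound for a single lag under a white-noise assumption."""
    return 1.96 / n ** 0.5


if __name__ == "__main__":
    # Hypothetical yearly counts alternating between high and low years.
    counts = [10, 2, 10, 2, 10, 2, 10, 2]
    print(acf(counts, 2), approx_bound(len(counts)))
```

A strictly alternating series like the hypothetical one here gives a negative lag-1 value and a smaller positive lag-2 value, which is the sign pattern the abstract associates with a 2-year cycle.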


2018 ◽  
Vol 21 (2) ◽  
pp. 117-124 ◽  
Author(s):  
Bakhtyar Sepehri ◽  
Nematollah Omidikia ◽  
Mohsen Kompany-Zareh ◽  
Raouf Ghavami

Aims & Scope: In this research, 8 variable selection approaches were used to investigate the effect of variable selection on the predictive power and stability of CoMFA models. Materials & Methods: Three data sets, comprising 36 EPAC antagonists, 79 CD38 inhibitors and 57 ATAD2 bromodomain inhibitors, were modelled by CoMFA. For each data set, a CoMFA model was first built with all CoMFA descriptors, and a new CoMFA model was then developed after applying each variable selection method, so that 9 CoMFA models were built per data set. The results show that noisy and uninformative variables affect CoMFA results. Based on the created models, applying 5 variable selection approaches (FFD, SRD-FFD, IVE-PLS, SRD-UVE-PLS and SPA-jackknife) significantly increases the predictive power and stability of CoMFA models. Result & Conclusion: Among them, SPA-jackknife removes the most variables while FFD retains the most. FFD and IVE-PLS are time-consuming, while SRD-FFD and SRD-UVE-PLS need only a few seconds to run. In addition, applying FFD, SRD-FFD, IVE-PLS and SRD-UVE-PLS preserves the information in the CoMFA contour maps for both fields.


Author(s):  
Kyungkoo Jun

Background & Objective: This paper proposes a Fourier transform inspired method to classify human activities from time series sensor data. Methods: Our method begins by decomposing a 1D input signal into 2D patterns, which is motivated by the Fourier conversion. The decomposition is aided by Long Short-Term Memory (LSTM), which captures the temporal dependency in the signal and then produces encoded sequences. The sequences, once arranged into a 2D array, can represent the fingerprints of the signals. The benefit of such a transformation is that we can exploit recent advances in deep learning models for image classification, such as the Convolutional Neural Network (CNN). Results: The proposed model, as a result, is a combination of LSTM and CNN. We evaluate the model on two data sets. For the first data set, which is more standardized than the other, our model outperforms or at least equals previous work. In the case of the second data set, we devise schemes to generate training and testing data by changing the window size, the sliding size, and the labeling scheme. Conclusion: The evaluation results show that the accuracy is over 95% in some cases. We also analyze the effect of the parameters on the performance.


2019 ◽  
Vol 73 (8) ◽  
pp. 893-901
Author(s):  
Sinead J. Barton ◽  
Bryan M. Hennelly

Cosmic ray artifacts may be present in all photo-electric readout systems. In spectroscopy, they present as random unidirectional sharp spikes that distort spectra and may have an effect on post-processing, possibly affecting the results of multivariate statistical classification. A number of methods have previously been proposed to remove cosmic ray artifacts from spectra, but the goal of removing the artifacts while making no other change to the underlying spectrum is challenging. One of the most successful and commonly applied methods for the removal of cosmic ray artifacts involves the capture of two sequential spectra that are compared in order to identify spikes. The disadvantage of this approach is that at least two recordings are necessary, which may be problematic for dynamically changing spectra, and which can reduce the signal-to-noise (S/N) ratio when compared with a single recording of equivalent duration due to the inclusion of two instances of read noise. In this paper, a cosmic ray artifact removal algorithm is proposed that works in a similar way to the double acquisition method but requires only a single capture, so long as a data set of similar spectra is available. The method employs normalized covariance in order to identify a similar spectrum in the data set, from which a direct comparison reveals the presence of cosmic ray artifacts, which are then replaced with the corresponding values from the matching spectrum. The advantage of the proposed method over the double acquisition method is investigated in the context of the S/N ratio and is applied to various data sets of Raman spectra recorded from biological cells.
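
The matching-and-replacement idea described above can be sketched roughly as follows. This is a simplified illustration, not the published algorithm: the function names, the fixed absolute `threshold` for flagging a spike, and the absence of any intensity scaling between spectra are all assumptions.

```python
from statistics import mean


def normalized_covariance(a, b):
    """Covariance of a and b divided by the product of their standard deviations."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5  # fails on perfectly flat spectra


def remove_spikes(target, library, threshold):
    """Replace points where target exceeds its best-matching spectrum by more than threshold."""
    # Pick the library spectrum most similar to the target.
    match = max(library, key=lambda s: normalized_covariance(target, s))
    cleaned, flagged = list(target), []
    for i, (t, m) in enumerate(zip(target, match)):
        if t - m > threshold:  # spikes are treated as positive-only excursions
            cleaned[i] = m
            flagged.append(i)
    return cleaned, flagged


if __name__ == "__main__":
    # Hypothetical 8-point spectra; the target is the first one plus a spike at index 5.
    library = [[1, 2, 50, 2, 1, 1, 2, 1], [1, 50, 2, 1, 1, 2, 1, 2]]
    target = [1, 2, 50, 2, 1, 21, 2, 1]
    print(remove_spikes(target, library, threshold=10))
```

In practice the threshold would have to be chosen relative to the noise level of the spectra, and a very large spike can itself lower the similarity score of the correct match, so this sketch is best read as a statement of the idea rather than a usable filter.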


2013 ◽  
Vol 756-759 ◽  
pp. 3652-3658
Author(s):  
You Li Lu ◽  
Jun Luo

Building on kernel methods, this paper puts forward two improved algorithms, R-SVM and I-SVDD, to cope with imbalanced data sets in closed systems. R-SVM uses the K-means algorithm to cluster the samples in space, while I-SVDD improves the performance of the original SVDD by training on imbalanced samples. Experiments on two system call data sets show that the two algorithms are more effective and that R-SVM has lower complexity.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yahya Albalawi ◽  
Jim Buckley ◽  
Nikola S. Nikolov

Abstract This paper presents a comprehensive evaluation of data pre-processing and word embedding techniques in the context of Arabic document classification in the domain of health-related communication on social media. We evaluate 26 text pre-processings applied to Arabic tweets within the process of training a classifier to identify health-related tweets. For this task we use the (traditional) machine learning classifiers KNN, SVM, Multinomial NB and Logistic Regression. Furthermore, we report experimental results with the deep learning architectures BLSTM and CNN for the same text classification problem. Since word embeddings are more typically used as the input layer in deep networks, in the deep learning experiments we evaluate several state-of-the-art pre-trained word embeddings with the same text pre-processing applied. To achieve these goals, we use two data sets: one for both training and testing, and another for testing the generality of our models only. Our results point to the conclusion that only four out of the 26 pre-processings improve the classification accuracy significantly. For the first data set of Arabic tweets, we found that Mazajak CBOW pre-trained word embeddings as the input to a BLSTM deep network led to the most accurate classifier with F1 score of 89.7%. For the second data set, Mazajak Skip-Gram pre-trained word embeddings as the input to BLSTM led to the most accurate model with F1 score of 75.2% and accuracy of 90.7% compared to F1 score of 90.8% achieved by Mazajak CBOW for the same architecture but with lower accuracy of 70.89%. Our results also show that the performance of the best traditional classifier we trained is comparable to the deep learning methods on the first data set, but significantly worse on the second data set.


2021 ◽  
Vol 99 (Supplement_1) ◽  
pp. 218-219
Author(s):  
Andres Fernando T Russi ◽  
Mike D Tokach ◽  
Jason C Woodworth ◽  
Joel M DeRouchey ◽  
Robert D Goodband ◽  
...  

Abstract The swine industry has been constantly evolving to select animals with improved performance traits and to minimize variation in body weight (BW) in order to meet packer specifications. Therefore, understanding variation presents an opportunity for producers to find strategies that could help reduce, manage, or deal with variation of pigs in a barn. A systematic review and meta-analysis was conducted by collecting data from multiple studies and available data sets in order to develop prediction equations for coefficient of variation (CV) and standard deviation (SD) as a function of BW. Information regarding BW variation from 16 papers was recorded to provide approximately 204 data points. Together, these data included 117,268 individually weighed pigs with a sample size that ranged from 104 to 4,108 pigs. A random-effects model with study used as a random effect was developed. Observations were weighted using sample size as an estimate for precision on the analysis, where larger data sets accounted for increased accuracy in the model. Regression equations were developed using the nlme package of R to determine the relationship between BW and its variation. Polynomial regression analysis was conducted separately for each variation measurement. When CV was reported in the data set, SD was calculated and vice versa. The resulting prediction equations were: CV (%) = 20.04 - 0.135 × BW + 0.00043 × BW^2, R^2 = 0.79; SD = 0.41 + 0.150 × BW - 0.00041 × BW^2, R^2 = 0.95. These equations suggest that there is evidence for a decreasing quadratic relationship between mean CV of a population and BW of pigs whereby the rate of decrease is smaller as mean pig BW increases from birth to market. Conversely, the rate of increase of SD of a population of pigs is smaller as mean pig BW increases from birth to market.
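
The two prediction equations reported above can be evaluated directly, as in the short sketch below. The function names are hypothetical, and the unit of BW is not stated in the abstract; kilograms are assumed here.

```python
def predicted_cv(bw):
    """Predicted CV (%) for mean body weight bw (assumed kg), per the reported quadratic."""
    return 20.04 - 0.135 * bw + 0.00043 * bw ** 2


def predicted_sd(bw):
    """Predicted SD for mean body weight bw (assumed kg), per the reported quadratic."""
    return 0.41 + 0.150 * bw - 0.00041 * bw ** 2


if __name__ == "__main__":
    for bw in (5, 50, 100):
        print(bw, round(predicted_cv(bw), 2), round(predicted_sd(bw), 2))
```

At a mean BW of 100, for example, the equations give a CV of 10.84% and an SD of 11.31, which are broadly consistent with each other since CV is SD divided by the mean.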

