Assessing sampling sufficiency of network metrics using bootstrap

2016
Author(s):  
Grasiela Casas ◽  
Vinicius A.G. Bastazini ◽  
Vanderlei J. Debastiani ◽  
Valério D. Pillar

Abstract Sampling the full diversity of interactions in an ecological community is a highly intensive effort. Recent studies have demonstrated that many network metrics are sensitive to both sampling effort and network size. Here, we develop a statistical framework, based on bootstrap resampling, that aims to assess sampling sufficiency for some of the most widely used metrics in network ecology, namely connectance, nestedness (NODF: nested overlap and decreasing fill) and modularity (using the QuaBiMo algorithm). Our framework can generate confidence intervals for each network metric with increasing sample size (i.e., the number of sampled interaction events, or the number of sampled individuals), which can be used to evaluate sampling sufficiency. The sample is considered sufficient when the confidence limits reach stability or lie within a level of precision acceptable for the aims of the study. We illustrate our framework with data from three quantitative networks of plants and frugivorous birds, varying in size from 16 to 115 species and from 17 to 2,745 interactions. These data sets illustrate that, for the same data set, sampling sufficiency may be reached at different sample sizes depending on the metric of interest. The bootstrap confidence limits reached stability for the two largest networks, but were wide and unstable with increasing sample size for all three metrics estimated from the smallest network. The bootstrap method is useful to empirical ecologists for indicating the minimum number of interactions necessary to reach sampling sufficiency for a specific network metric. It is also useful for comparing the capacity of different network sampling techniques to reach sampling sufficiency. Our method is general enough to be applied to different types of metrics and networks.
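The bootstrap sampling-sufficiency idea described above can be sketched for one metric, connectance (realized links divided by possible links). This is an illustrative toy, not the authors' implementation: the function names, the toy event list, and the resampling scheme are assumptions.

```python
# Illustrative sketch: percentile bootstrap confidence intervals for connectance
# at increasing subsample sizes; sufficiency is suggested when the CI width
# stabilizes. Toy data and function names, not the paper's code.
import numpy as np

def connectance(events):
    """Connectance of the binary network implied by a list of (plant, bird) events."""
    plants = {p for p, b in events}
    birds = {b for p, b in events}
    links = len(set(events))
    return links / (len(plants) * len(birds))

def bootstrap_ci(events, m, n_boot=200, alpha=0.05, rng=None):
    """Percentile CI for connectance when m interaction events are drawn with replacement."""
    rng = rng or np.random.default_rng(0)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(events), size=m)
        stats.append(connectance([events[i] for i in idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Toy network: 3 plants x 3 birds with 4 realized links (14 interaction events)
events = ([("p1", "b1")] * 5 + [("p1", "b2")] * 3
          + [("p2", "b2")] * 2 + [("p3", "b3")] * 4)
for m in (10, 50, 200):
    lo, hi = bootstrap_ci(events, m)
    print(m, round(hi - lo, 3))  # CI width, expected to narrow and stabilize with m
```

In practice one would plot the confidence limits against sample size and look for the plateau the abstract describes.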

2020
Vol 12 (9)
pp. 1664-1678
Author(s):  
Alicia S Arroyo ◽  
Romain Iannes ◽  
Eric Bapteste ◽  
Iñaki Ruiz-Trillo

Abstract The Holozoa clade comprises animals and several unicellular lineages (choanoflagellates, filastereans, and teretosporeans). Understanding their full diversity is essential to address the origins of animals and other evolutionary questions, yet these unicellular lineages remain poorly known. To provide more insight into the real diversity of holozoans and to check for undiscovered diversity, we analyzed 18S rDNA metabarcoding data from the global Tara Oceans expedition. To overcome the low phylogenetic information content of the metabarcoding data set (composed of sequences from the short V9 region of the gene), we used similarity networks combining two data sets: unknown environmental sequences from Tara Oceans and known reference sequences from GenBank. We then calculated network metrics to compare environmental sequences with reference sequences. These metrics reflected the divergence between the two types of sequences and provided an effective way to search for evolutionarily relevant diversity, further validated by phylogenetic placements. Our results showed that a substantial percentage of unicellular holozoan diversity remains hidden. We found novelties in several lineages, especially in Acanthoecida choanoflagellates. We also identified a potential new holozoan group that could not be assigned to any of the described extant clades. Data on geographical distribution showed that, although ubiquitous, each unicellular holozoan lineage exhibits a different distribution pattern. We also identified a positive association between new animal hosts and the ichthyosporean symbiont Creolimax fragrantissima, as well as for other holozoans previously reported as free-living. Overall, our analyses provide a fresh perspective on the diversity and ecology of unicellular holozoans, highlighting the amount of undescribed diversity.
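The similarity-network approach described above can be sketched as follows. Everything here is a simplification under stated assumptions: real pipelines use an aligner (e.g. BLAST) for pairwise identity, whereas this toy compares equal-length strings, and the threshold and scoring function are illustrative.

```python
# Hedged sketch: connect sequences whose pairwise identity exceeds a threshold,
# then score each node by how strongly it attaches to reference sequences.
# Environmental sequences with low attachment are candidates for novel diversity.
def identity(a, b):
    """Fraction of matching positions for equal-length sequences
    (toy stand-in for a real pairwise aligner)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_network(seqs, threshold=0.8):
    """Undirected similarity graph as an adjacency dict {name: set of neighbors}."""
    adj = {name: set() for name in seqs}
    names = list(seqs)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if identity(seqs[u], seqs[v]) >= threshold:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def reference_attachment(adj, refs):
    """Per-node fraction of neighbors that are references; low values flag novelty."""
    return {node: (len(nbrs & refs) / len(nbrs) if nbrs else 0.0)
            for node, nbrs in adj.items()}

# Toy V9-like fragments: env1 matches the references, env2 matches nothing
seqs = {"ref1": "ACGTACGT", "ref2": "ACGTACGA",
        "env1": "ACGTACGT", "env2": "TTTTCCCC"}
adj = build_network(seqs)
scores = reference_attachment(adj, refs={"ref1", "ref2"})
```

Here `env2` ends up isolated (attachment 0.0), mimicking how divergent environmental reads stand out in such networks before phylogenetic placement confirms them.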


2018
Vol 154 (2)
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform was shown varied between 17 and 26, or was not found in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
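The autocorrelation diagnostic in point 2 can be sketched on a synthetic series. The data below are illustrative, not the Silwood Park or Rothamsted trap records: a damped 2-year cycle is simulated and should produce the reported signature of a negative lag-1 and a positive lag-2 autocorrelation.

```python
# Minimal sketch of the lag-k autocorrelation (ACF) check on a synthetic
# damped 2-year cycle; illustrative data, not the wasp trap counts.
import numpy as np

def acf(x, lag):
    """Sample autocorrelation at a given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    xm = x - x.mean()
    return float(np.sum(xm[lag:] * xm[:-lag]) / np.sum(xm * xm))

# Damped waveform: alternating highs and lows shrinking toward the mean
years = np.arange(30)
counts = (100 + 40 * (-1.0) ** years * 0.95 ** years
          + np.random.default_rng(1).normal(0, 2, 30))
print(acf(counts, 1))  # negative: a high year tends to follow a low year
print(acf(counts, 2))  # positive: the 2-year cycle
```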


2021
Vol 99 (Supplement_1)
pp. 218-219
Author(s):  
Andres Fernando T Russi ◽  
Mike D Tokach ◽  
Jason C Woodworth ◽  
Joel M DeRouchey ◽  
Robert D Goodband ◽  
...  

Abstract The swine industry has been constantly evolving to select animals with improved performance traits and to minimize variation in body weight (BW) in order to meet packer specifications. Therefore, understanding variation presents an opportunity for producers to find strategies that could help reduce, manage, or deal with variation of pigs in a barn. A systematic review and meta-analysis was conducted by collecting data from multiple studies and available data sets in order to develop prediction equations for coefficient of variation (CV) and standard deviation (SD) as a function of BW. Information regarding BW variation from 16 papers was recorded to provide approximately 204 data points. Together, these data included 117,268 individually weighed pigs with a sample size that ranged from 104 to 4,108 pigs. A random-effects model with study used as a random effect was developed. Observations were weighted using sample size as an estimate for precision on the analysis, where larger data sets accounted for increased accuracy in the model. Regression equations were developed using the nlme package of R to determine the relationship between BW and its variation. Polynomial regression analysis was conducted separately for each variation measurement. When CV was reported in the data set, SD was calculated and vice versa. The resulting prediction equations were: CV (%) = 20.04 − 0.135 × BW + 0.00043 × BW², R² = 0.79; SD = 0.41 + 0.150 × BW − 0.00041 × BW², R² = 0.95. These equations suggest that there is evidence for a decreasing quadratic relationship between mean CV of a population and BW of pigs whereby the rate of decrease is smaller as mean pig BW increases from birth to market. Conversely, the rate of increase of SD of a population of pigs is smaller as mean pig BW increases from birth to market.
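The two prediction equations reported above can be coded directly for reuse; only the equations themselves come from the abstract, while the function names and the assumption that BW is in kilograms are ours.

```python
# The reported quadratic prediction equations, coded as-is (BW assumed in kg).
def predicted_cv(bw):
    """CV (%) of pig body weight as a quadratic in mean BW (R² = 0.79 reported)."""
    return 20.04 - 0.135 * bw + 0.00043 * bw ** 2

def predicted_sd(bw):
    """SD of pig body weight as a quadratic in mean BW (R² = 0.95 reported)."""
    return 0.41 + 0.150 * bw - 0.00041 * bw ** 2

# Example: CV falls while SD rises as pigs grow from weaning toward market weight
for bw in (10, 60, 120):
    print(bw, round(predicted_cv(bw), 2), round(predicted_sd(bw), 2))
```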


2019
Vol 11 (3)
pp. 168781401983684
Author(s):  
Leilei Cao ◽  
Lulu Cao ◽  
Lei Guo ◽  
Kui Liu ◽  
Xin Ding

Due to high cost, it is difficult to obtain enough samples to implement a full-scale life test on the loader drive axle, and such an extremely small sample size can hardly meet the statistical requirements of traditional reliability analysis methods. In this work, a method combining virtual sample expansion with bootstrap is proposed to evaluate the fatigue reliability of the loader drive axle from an extremely small sample. First, the sample size is expanded by a virtual augmentation method to meet the requirements of the bootstrap method. Then, a modified bootstrap method is used to evaluate the fatigue reliability of the expanded sample. Finally, the feasibility and reliability of the method are verified by comparing the results with those of a semi-empirical estimation method. Moreover, from a practical perspective, the promising results of this study indicate that the proposed method is more efficient than the semi-empirical method. The proposed method provides a new way to evaluate the reliability of costly and complex structures.
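The two-stage idea can be sketched as follows. This is a hedged illustration: the paper's actual virtual-augmentation scheme and modified bootstrap are not specified here, so simple noise perturbation and a percentile bootstrap for the mean stand in for them, and the fatigue-life numbers are invented.

```python
# Hedged sketch: (1) expand an extremely small fatigue-life sample with virtual
# observations (here: noise-perturbed resamples; the paper's scheme may differ),
# then (2) bootstrap the expanded sample for a confidence interval on mean life.
import numpy as np

def virtual_expand(sample, n_virtual, rel_noise=0.05, rng=None):
    """Create virtual observations by perturbing resampled originals."""
    rng = rng or np.random.default_rng(0)
    base = rng.choice(sample, size=n_virtual, replace=True)
    return np.concatenate([sample, base * (1 + rng.normal(0, rel_noise, n_virtual))])

def bootstrap_mean_ci(sample, n_boot=2000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for the mean of the expanded sample."""
    rng = rng or np.random.default_rng(1)
    means = [rng.choice(sample, size=len(sample), replace=True).mean()
             for _ in range(n_boot)]
    return tuple(np.quantile(means, [alpha / 2, 1 - alpha / 2]))

lives = np.array([1.02e6, 1.15e6, 0.98e6, 1.21e6])  # 4 measured fatigue lives (cycles)
expanded = virtual_expand(lives, n_virtual=26)       # 30 observations total
lo, hi = bootstrap_mean_ci(expanded)
```

The expansion step only makes the bootstrap mechanically feasible; the credibility of the result still rests on how faithfully the virtual samples reflect the true life distribution.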


2017
Vol 6 (4)
pp. 113
Author(s):  
Esin Yilmaz Kogar ◽  
Hülya Kelecioglu

The purpose of this research is to estimate the item and ability parameters, and the standard errors of those parameters, obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory (TRT) models in tests containing testlets, as the number of testlets, the number of independent items, and the sample size change, and then to compare the results. The PISA 2012 mathematics test was employed as the data collection instrument, and 36 items were used to constitute six different data sets containing different numbers of testlets and independent items. Subsequently, from these data sets, three different sample sizes of 250, 500 and 1,000 persons were selected randomly. The findings showed that the lowest mean error values were generally those obtained from UIRT, and that TRT yielded a lower mean estimation error than BIF. Under all conditions, models that account for local dependence provided better model-data fit than UIRT; generally there was no meaningful difference between BIF and TRT, and both models can be used for such data sets. When there was a meaningful difference between the two models, BIF generally yielded the better result. In addition, in each sample size and data set, the correlations among the item and ability parameters, and among their errors, were generally high.


Author(s):  
Ricardo Scrosati

This study investigated the synchrony of frond dynamics among patches of the intertidal seaweed Mazzaella parksii (=M. cornucopiae; Rhodophyta: Gigartinales) at a local spatial scale. At Prasiola Point (Pacific coast of Canada), the mean synchrony of the seasonal changes in frond density among seven permanent, 100-cm² quadrats was significant (mean Pearson's r = 0.73, with 0.65–0.81 as 95% confidence limits) between 1993 and 1995. This indicates that seasonal trends for non-monitored patches at a local spatial scale can be predicted relatively well from observations on a limited number of quadrats. Identifying the spatial scales at which seaweed populations covary synchronously will permit minimizing sampling effort while retaining the ability to make valid predictions for non-monitored sites.
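The synchrony measure used above, the mean pairwise Pearson correlation across quadrat time series, can be sketched on toy data. The seasonal signal, noise level, and quadrat count here are illustrative, not the Prasiola Point measurements.

```python
# Minimal sketch of mean pairwise synchrony: average the off-diagonal Pearson
# correlations among quadrat time series. Toy data, not the field measurements.
import numpy as np

def mean_pairwise_r(series):
    """Mean Pearson r over all pairs of time series (rows of `series`)."""
    r = np.corrcoef(series)
    iu = np.triu_indices(r.shape[0], k=1)  # upper triangle: each pair once
    return float(r[iu].mean())

# Three quadrats tracking the same seasonal cycle with independent noise
t = np.arange(24)  # 24 monthly censuses
rng = np.random.default_rng(2)
signal = np.sin(2 * np.pi * t / 12)
quadrats = np.array([signal + rng.normal(0, 0.3, t.size) for _ in range(3)])
print(round(mean_pairwise_r(quadrats), 2))  # high: the quadrats are synchronous
```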


2020
Vol 16 (3)
pp. 1061-1074
Author(s):  
Jörg Franke ◽  
Veronika Valler ◽  
Stefan Brönnimann ◽  
Raphael Neukom ◽  
Fernando Jaume-Santero

Abstract. Differences between paleoclimatic reconstructions are caused by two factors: the method and the input data. While many studies compare methods, here we focus on the consequences of the choice of input data in a state-of-the-art Kalman-filter paleoclimate data assimilation approach. We evaluate reconstruction quality in the 20th century based on three collections of tree-ring records: (1) 54 of the best temperature-sensitive tree-ring chronologies chosen by experts; (2) 415 temperature-sensitive tree-ring records chosen less strictly by regional working groups and statistical screening; (3) 2287 tree-ring series that are not screened for climate sensitivity. The three data sets span the range from small sample size, small spatial coverage and strict screening for temperature sensitivity to large sample size and spatial coverage but no screening. Additionally, we explore a combination of these data sets plus screening methods to improve reconstruction quality. A large, unscreened collection generally leads to poor reconstruction skill. A small expert selection of extratropical Northern Hemisphere records allows for a skillful high-latitude temperature reconstruction but cannot be expected to provide information for other regions or other variables. We achieve the best reconstruction skill across all variables and regions by combining all available input data but rejecting records with insignificant climatic information (regression-model p value > 0.05) and removing duplicate records. It is important to use a tree-ring proxy system model that includes both major growth limitations: temperature and moisture.
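The screening step described above, rejecting records whose regression on climate is insignificant and dropping duplicates, can be sketched as follows. The data are synthetic and the assimilation machinery is not reproduced; only the p-value filter and duplicate removal are illustrated.

```python
# Hedged sketch of record screening: regress each tree-ring series on a
# temperature series, keep it only if the slope is significant (p <= 0.05),
# and drop exact duplicate records. Synthetic data for illustration.
import numpy as np
from scipy import stats

def screen_records(records, temperature, p_max=0.05):
    """Indices of non-duplicate records significantly related to temperature."""
    keep, seen = [], set()
    for i, rec in enumerate(records):
        key = tuple(np.round(rec, 6))  # crude duplicate detection
        if key in seen:
            continue
        seen.add(key)
        slope, intercept, r, p, se = stats.linregress(temperature, rec)
        if p <= p_max:
            keep.append(i)
    return keep

rng = np.random.default_rng(3)
temp = rng.normal(0, 1, 100)
good = 0.8 * temp + rng.normal(0, 0.5, 100)  # temperature-sensitive record
noise = rng.normal(0, 1, 100)                # climate-insensitive record
records = [good, noise, good.copy()]         # index 2 duplicates index 0
kept = screen_records(records, temp)         # index 0 survives; the duplicate is dropped
```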


2018
Author(s):  
Arghavan Bahadorinejad ◽  
Ivan Ivanov ◽  
Johanna W Lampe ◽  
Meredith AJ Hullar ◽  
Robert S Chapkin ◽  
...  

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, when the ratio of sample size to dimensionality is small, the proposed method can vastly outperform the others.

Author summary: Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
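A classifier in the spirit of the OBC can be sketched with a plain Dirichlet-multinomial model; the paper's full Poisson-Dirichlet-Multinomial hierarchy is not reproduced, and the prior, data, and function names are assumptions. Each class gets a Dirichlet posterior over OTU proportions, and a new count vector goes to the class with the larger closed-form marginal likelihood.

```python
# Hedged sketch of a Dirichlet-multinomial Bayesian classifier: per-class
# Dirichlet parameters are prior pseudo-counts plus pooled training counts,
# and classification maximizes the closed-form marginal likelihood.
import math
import numpy as np

def log_dm(x, alpha):
    """Log Dirichlet-multinomial likelihood of count vector x under Dirichlet alpha
    (the multinomial coefficient is omitted: it is identical across classes)."""
    A, n = alpha.sum(), x.sum()
    return (math.lgamma(A) - math.lgamma(A + n)
            + sum(math.lgamma(a + k) - math.lgamma(a) for a, k in zip(alpha, x)))

def fit(class_counts, prior=0.5):
    """Per-class Dirichlet parameters: symmetric prior plus pooled training counts."""
    return {c: prior + np.asarray(rows).sum(axis=0) for c, rows in class_counts.items()}

def classify(x, alphas):
    """Assign x to the class with the largest marginal likelihood."""
    return max(alphas, key=lambda c: log_dm(np.asarray(x), alphas[c]))

# Toy 3-OTU data: class "a" dominated by OTU 0, class "b" by OTU 2
train = {"a": [[40, 5, 5], [35, 8, 7]], "b": [[4, 6, 40], [6, 4, 45]]}
alphas = fit(train)
print(classify([20, 3, 2], alphas))
```

Because the posterior is available in closed form, no sampling or optimization is needed, which is what makes this family of models attractive when samples are scarce relative to the number of OTUs.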


2003
Vol 27 (3)
pp. 160-163
Author(s):  
Jennifer H. Myszewski ◽  
Floyd E. Bridgwater ◽  
Thomas D. Byram

Abstract Two important questions for clonal forestry are: (1) how many ortets must be established to ensure that one or more of the best genotypes in a family will be available for field tests and plantation establishment; and (2) how certain can one be that at least one top genotype will be present in a sample of n ortets. In this study, we calculated the level of confidence (LOC) in having included one or more desirable, rootable genotypes in a random sample of n ortets from a full-sibling family. We also calculated the number of unique ortets required to achieve a given LOC in having included one or more desirable, rootable genotypes in a sample. In general, when the sample size is small, either because the original number of ortets was low or because of poor rootability, the LOC is lower. When rootability is low or when only a small percentage of the possible genotypes is considered desirable, the original number of ortets required to achieve a given LOC is higher. Both LOC and sample size are highly influenced by the target number of desirable genotypes to be captured in a sample of ortets. South. J. Appl. For. 27(3):160–163.
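The level-of-confidence calculation can be sketched with a binomial approximation: if p is the proportion of genotypes considered desirable and r the probability that a sampled ortet roots, then LOC = 1 − (1 − p·r)^n, and the required n follows by inversion. The paper's exact finite-family formulation may differ; the parameter values below are illustrative.

```python
# Hedged sketch of LOC for capturing >= 1 desirable, rootable genotype in a
# sample of n ortets, using a binomial approximation (not necessarily the
# authors' exact finite-family calculation).
import math

def loc(n, p_desirable, rootability):
    """Confidence that at least one desirable, rootable genotype is among n ortets."""
    return 1 - (1 - p_desirable * rootability) ** n

def ortets_needed(target_loc, p_desirable, rootability):
    """Smallest n achieving the target LOC."""
    q = 1 - p_desirable * rootability
    return math.ceil(math.log(1 - target_loc) / math.log(q))

# Example: top 10% of genotypes desirable, 50% rootability
print(round(loc(30, 0.10, 0.5), 3))      # LOC with 30 ortets
print(ortets_needed(0.95, 0.10, 0.5))    # ortets needed for 95% confidence
```

The sketch reproduces the abstract's qualitative conclusions: low rootability or a small desirable fraction shrinks the effective per-ortet probability p·r, driving the required number of ortets up sharply.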

