Assessing sampling sufficiency of network metrics using bootstrap

2016
Author(s):  
Grasiela Casas ◽  
Vinicius A.G. Bastazini ◽  
Vanderlei J. Debastiani ◽  
Valério D. Pillar

Abstract Sampling the full diversity of interactions in an ecological community is a highly intensive effort. Recent studies have demonstrated that many network metrics are sensitive to both sampling effort and network size. Here, we develop a statistical framework, based on bootstrap resampling, that aims to assess sampling sufficiency for some of the most widely used metrics in network ecology, namely connectance, nestedness (NODF: nested overlap and decreasing fill) and modularity (using the QuaBiMo algorithm). Our framework can generate confidence intervals for each network metric with increasing sample size (i.e., the number of sampled interaction events, or the number of sampled individuals), which can be used to evaluate sampling sufficiency. The sample is considered sufficient when the confidence limits reach stability or lie within a level of precision acceptable for the aims of the study. We illustrate our framework with data from three quantitative networks of plants and frugivorous birds, varying in size from 16 to 115 species and from 17 to 2,745 interactions. These data sets illustrate that, for the same data set, sampling sufficiency may be reached at different sample sizes depending on the metric of interest. The bootstrap confidence limits reached stability for the two largest networks, but were wide and unstable with increasing sample size for all three metrics estimated from the smallest network. The bootstrap method is useful to empirical ecologists for indicating the minimum number of interactions necessary to reach sampling sufficiency for a specific network metric. It is also useful for comparing the capacity of different network sampling techniques to reach sampling sufficiency. Our method is general enough to be applied to different types of metrics and networks.
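The bootstrap sampling-sufficiency idea described above can be sketched for one metric, connectance (realized links divided by possible links). This is an illustrative toy, not the authors' implementation: the function names, the toy event list, and the resampling scheme are assumptions.

```python
# Illustrative sketch: percentile bootstrap confidence intervals for connectance
# at increasing subsample sizes; sufficiency is suggested when the CI width
# stabilizes. Toy data and function names, not the paper's code.
import numpy as np

def connectance(events):
    """Connectance of the binary network implied by a list of (plant, bird) events."""
    plants = {p for p, b in events}
    birds = {b for p, b in events}
    links = len(set(events))
    return links / (len(plants) * len(birds))

def bootstrap_ci(events, m, n_boot=200, alpha=0.05, rng=None):
    """Percentile CI for connectance when m interaction events are drawn with replacement."""
    rng = rng or np.random.default_rng(0)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(events), size=m)
        stats.append(connectance([events[i] for i in idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Toy network: 3 plants x 3 birds with 4 realized links (14 interaction events)
events = ([("p1", "b1")] * 5 + [("p1", "b2")] * 3
          + [("p2", "b2")] * 2 + [("p3", "b3")] * 4)
for m in (10, 50, 200):
    lo, hi = bootstrap_ci(events, m)
    print(m, round(hi - lo, 3))  # CI width, expected to narrow and stabilize with m
```

In practice one would plot the confidence limits against sample size and look for the plateau the abstract describes.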

2020
Vol 12 (9)
pp. 1664-1678
Author(s):  
Alicia S Arroyo ◽  
Romain Iannes ◽  
Eric Bapteste ◽  
Iñaki Ruiz-Trillo

Abstract The Holozoa clade comprises animals and several unicellular lineages (choanoflagellates, filastereans, and teretosporeans). Understanding their full diversity is essential to address the origins of animals and other evolutionary questions, yet these unicellular lineages remain poorly known. To provide more insight into the real diversity of holozoans and to check for undiscovered diversity, we analyzed 18S rDNA metabarcoding data from the global Tara Oceans expedition. To overcome the low phylogenetic information content of the metabarcoding data set (composed of sequences from the short V9 region of the gene), we used similarity networks combining two data sets: unknown environmental sequences from Tara Oceans and known reference sequences from GenBank. We then calculated network metrics to compare environmental sequences with reference sequences. These metrics reflected the divergence between the two types of sequences and provided an effective way to search for evolutionarily relevant diversity, further validated by phylogenetic placements. Our results showed that a substantial percentage of unicellular holozoan diversity remains hidden. We found novelties in several lineages, especially in Acanthoecida choanoflagellates. We also identified a potential new holozoan group that could not be assigned to any of the described extant clades. Data on geographical distribution showed that, although ubiquitous, each unicellular holozoan lineage exhibits a different distribution pattern. We also identified a positive association between new animal hosts and the ichthyosporean symbiont Creolimax fragrantissima, as well as for other holozoans previously reported as free-living. Overall, our analyses provide a fresh perspective on the diversity and ecology of unicellular holozoans, highlighting the amount of undescribed diversity.
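The similarity-network approach described above can be sketched as follows. Everything here is a simplification under stated assumptions: real pipelines use an aligner (e.g. BLAST) for pairwise identity, whereas this toy compares equal-length strings, and the threshold and scoring function are illustrative.

```python
# Hedged sketch: connect sequences whose pairwise identity exceeds a threshold,
# then score each node by how strongly it attaches to reference sequences.
# Environmental sequences with low attachment are candidates for novel diversity.
def identity(a, b):
    """Fraction of matching positions for equal-length sequences
    (toy stand-in for a real pairwise aligner)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def build_network(seqs, threshold=0.8):
    """Undirected similarity graph as an adjacency dict {name: set of neighbors}."""
    adj = {name: set() for name in seqs}
    names = list(seqs)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if identity(seqs[u], seqs[v]) >= threshold:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def reference_attachment(adj, refs):
    """Per-node fraction of neighbors that are references; low values flag novelty."""
    return {node: (len(nbrs & refs) / len(nbrs) if nbrs else 0.0)
            for node, nbrs in adj.items()}

# Toy V9-like fragments: env1 matches the references, env2 matches nothing
seqs = {"ref1": "ACGTACGT", "ref2": "ACGTACGA",
        "env1": "ACGTACGT", "env2": "TTTTCCCC"}
adj = build_network(seqs)
scores = reference_attachment(adj, refs={"ref1", "ref2"})
```

Here `env2` ends up isolated (attachment 0.0), mimicking how divergent environmental reads stand out in such networks before phylogenetic placement confirms them.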


2018
Vol 154 (2)
pp. 149-155
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform was shown varied between 17 and 26, or was not found in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
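The autocorrelation diagnostic in point 2 can be sketched on a synthetic series. The data below are illustrative, not the Silwood Park or Rothamsted trap records: a damped 2-year cycle is simulated and should produce the reported signature of a negative lag-1 and a positive lag-2 autocorrelation.

```python
# Minimal sketch of the lag-k autocorrelation (ACF) check on a synthetic
# damped 2-year cycle; illustrative data, not the wasp trap counts.
import numpy as np

def acf(x, lag):
    """Sample autocorrelation at a given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float)
    xm = x - x.mean()
    return float(np.sum(xm[lag:] * xm[:-lag]) / np.sum(xm * xm))

# Damped waveform: alternating highs and lows shrinking toward the mean
years = np.arange(30)
counts = (100 + 40 * (-1.0) ** years * 0.95 ** years
          + np.random.default_rng(1).normal(0, 2, 30))
print(acf(counts, 1))  # negative: a high year tends to follow a low year
print(acf(counts, 2))  # positive: the 2-year cycle
```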


2021
Vol 99 (Supplement_1)
pp. 218-219
Author(s):  
Andres Fernando T Russi ◽  
Mike D Tokach ◽  
Jason C Woodworth ◽  
Joel M DeRouchey ◽  
Robert D Goodband ◽  
...  

Abstract The swine industry has been constantly evolving to select animals with improved performance traits and to minimize variation in body weight (BW) in order to meet packer specifications. Therefore, understanding variation presents an opportunity for producers to find strategies that could help reduce, manage, or deal with variation of pigs in a barn. A systematic review and meta-analysis was conducted by collecting data from multiple studies and available data sets in order to develop prediction equations for coefficient of variation (CV) and standard deviation (SD) as a function of BW. Information regarding BW variation from 16 papers was recorded to provide approximately 204 data points. Together, these data included 117,268 individually weighed pigs with a sample size that ranged from 104 to 4,108 pigs. A random-effects model with study used as a random effect was developed. Observations were weighted using sample size as an estimate for precision on the analysis, where larger data sets accounted for increased accuracy in the model. Regression equations were developed using the nlme package of R to determine the relationship between BW and its variation. Polynomial regression analysis was conducted separately for each variation measurement. When CV was reported in the data set, SD was calculated and vice versa. The resulting prediction equations were: CV (%) = 20.04 − 0.135 × BW + 0.00043 × BW², R² = 0.79; SD = 0.41 + 0.150 × BW − 0.00041 × BW², R² = 0.95. These equations suggest that there is evidence for a decreasing quadratic relationship between mean CV of a population and BW of pigs whereby the rate of decrease is smaller as mean pig BW increases from birth to market. Conversely, the rate of increase of SD of a population of pigs is smaller as mean pig BW increases from birth to market.
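The two prediction equations reported above can be coded directly for reuse; only the equations themselves come from the abstract, while the function names and the assumption that BW is in kilograms are ours.

```python
# The reported quadratic prediction equations, coded as-is (BW assumed in kg).
def predicted_cv(bw):
    """CV (%) of pig body weight as a quadratic in mean BW (R² = 0.79 reported)."""
    return 20.04 - 0.135 * bw + 0.00043 * bw ** 2

def predicted_sd(bw):
    """SD of pig body weight as a quadratic in mean BW (R² = 0.95 reported)."""
    return 0.41 + 0.150 * bw - 0.00041 * bw ** 2

# Example: CV falls while SD rises as pigs grow from weaning toward market weight
for bw in (10, 60, 120):
    print(bw, round(predicted_cv(bw), 2), round(predicted_sd(bw), 2))
```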


2019
Vol 11 (3)
pp. 168781401983684
Author(s):  
Leilei Cao ◽  
Lulu Cao ◽  
Lei Guo ◽  
Kui Liu ◽  
Xin Ding

Due to high cost, it is difficult to obtain enough samples to implement a full-scale life test on the loader drive axle, and such an extremely small sample size can hardly meet the statistical requirements of traditional reliability analysis methods. In this work, a method combining virtual sample expansion with bootstrap is proposed to evaluate the fatigue reliability of the loader drive axle from an extremely small sample. First, the sample size is expanded by a virtual augmentation method to meet the requirements of the bootstrap method. Then, a modified bootstrap method is used to evaluate the fatigue reliability of the expanded sample. Finally, the feasibility and reliability of the method are verified by comparing the results with those of a semi-empirical estimation method. Moreover, from a practical perspective, the promising results of this study indicate that the proposed method is more efficient than the semi-empirical method. The proposed method provides a new way to evaluate the reliability of costly and complex structures.
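The two-stage idea can be sketched as follows. This is a hedged illustration: the paper's actual virtual-augmentation scheme and modified bootstrap are not specified here, so simple noise perturbation and a percentile bootstrap for the mean stand in for them, and the fatigue-life numbers are invented.

```python
# Hedged sketch: (1) expand an extremely small fatigue-life sample with virtual
# observations (here: noise-perturbed resamples; the paper's scheme may differ),
# then (2) bootstrap the expanded sample for a confidence interval on mean life.
import numpy as np

def virtual_expand(sample, n_virtual, rel_noise=0.05, rng=None):
    """Create virtual observations by perturbing resampled originals."""
    rng = rng or np.random.default_rng(0)
    base = rng.choice(sample, size=n_virtual, replace=True)
    return np.concatenate([sample, base * (1 + rng.normal(0, rel_noise, n_virtual))])

def bootstrap_mean_ci(sample, n_boot=2000, alpha=0.05, rng=None):
    """Percentile bootstrap CI for the mean of the expanded sample."""
    rng = rng or np.random.default_rng(1)
    means = [rng.choice(sample, size=len(sample), replace=True).mean()
             for _ in range(n_boot)]
    return tuple(np.quantile(means, [alpha / 2, 1 - alpha / 2]))

lives = np.array([1.02e6, 1.15e6, 0.98e6, 1.21e6])  # 4 measured fatigue lives (cycles)
expanded = virtual_expand(lives, n_virtual=26)       # 30 observations total
lo, hi = bootstrap_mean_ci(expanded)
```

The expansion step only makes the bootstrap mechanically feasible; the credibility of the result still rests on how faithfully the virtual samples reflect the true life distribution.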


2017
Vol 6 (4)
pp. 113
Author(s):  
Esin Yilmaz Kogar ◽  
Hülya Kelecioglu

The purpose of this research is to estimate the item and ability parameters, and the standard errors of those parameters, obtained from Unidimensional Item Response Theory (UIRT), bifactor (BIF) and Testlet Response Theory (TRT) models in tests containing testlets, as the number of testlets, the number of independent items, and the sample size change, and then to compare the results. The PISA 2012 mathematics test was employed as the data collection instrument, and 36 items were used to constitute six different data sets containing different numbers of testlets and independent items. Subsequently, from these data sets, three different sample sizes of 250, 500 and 1,000 persons were selected randomly. The findings showed that the lowest mean error values were generally those obtained from UIRT, and that TRT yielded a lower mean estimation error than BIF. Under all conditions, models that account for local dependence provided better model-data fit than UIRT; generally there was no meaningful difference between BIF and TRT, and both models can be used for such data sets. When there was a meaningful difference between the two models, BIF generally yielded the better result. In addition, in each sample size and data set, the correlations among the item and ability parameters, and among their errors, were generally high.


Author(s):  
Ricardo Scrosati

This study investigated the synchrony of frond dynamics among patches of the intertidal seaweed Mazzaella parksii (=M. cornucopiae; Rhodophyta: Gigartinales) at a local spatial scale. At Prasiola Point (Pacific coast of Canada), the mean synchrony of the seasonal changes in frond density among seven permanent, 100-cm² quadrats was significant (mean Pearson's r = 0.73, with 0.65–0.81 as 95% confidence limits) between 1993 and 1995. This indicates that seasonal trends for non-monitored patches at a local spatial scale can be predicted relatively well from observations on a limited number of quadrats. Identifying the spatial scales at which seaweed populations covary synchronously will permit minimizing sampling effort while retaining the ability to make valid predictions for non-monitored sites.
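The synchrony measure used above, the mean pairwise Pearson correlation across quadrat time series, can be sketched on toy data. The seasonal signal, noise level, and quadrat count here are illustrative, not the Prasiola Point measurements.

```python
# Minimal sketch of mean pairwise synchrony: average the off-diagonal Pearson
# correlations among quadrat time series. Toy data, not the field measurements.
import numpy as np

def mean_pairwise_r(series):
    """Mean Pearson r over all pairs of time series (rows of `series`)."""
    r = np.corrcoef(series)
    iu = np.triu_indices(r.shape[0], k=1)  # upper triangle: each pair once
    return float(r[iu].mean())

# Three quadrats tracking the same seasonal cycle with independent noise
t = np.arange(24)  # 24 monthly censuses
rng = np.random.default_rng(2)
signal = np.sin(2 * np.pi * t / 12)
quadrats = np.array([signal + rng.normal(0, 0.3, t.size) for _ in range(3)])
print(round(mean_pairwise_r(quadrats), 2))  # high: the quadrats are synchronous
```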


2020
Vol 16 (3)
pp. 1061-1074
Author(s):  
Jörg Franke ◽  
Veronika Valler ◽  
Stefan Brönnimann ◽  
Raphael Neukom ◽  
Fernando Jaume-Santero

Abstract. Differences between paleoclimatic reconstructions are caused by two factors: the method and the input data. While many studies compare methods, here we focus on the consequences of the choice of input data in a state-of-the-art Kalman-filter paleoclimate data assimilation approach. We evaluate reconstruction quality in the 20th century based on three collections of tree-ring records: (1) 54 of the best temperature-sensitive tree-ring chronologies chosen by experts; (2) 415 temperature-sensitive tree-ring records chosen less strictly by regional working groups and statistical screening; (3) 2287 tree-ring series that are not screened for climate sensitivity. The three data sets span the range from small sample size, small spatial coverage and strict screening for temperature sensitivity to large sample size and spatial coverage but no screening. Additionally, we explore a combination of these data sets plus screening methods to improve reconstruction quality. A large, unscreened collection generally leads to poor reconstruction skill. A small expert selection of extratropical Northern Hemisphere records allows for a skillful high-latitude temperature reconstruction but cannot be expected to provide information for other regions or other variables. We achieve the best reconstruction skill across all variables and regions by combining all available input data but rejecting records with insignificant climatic information (regression-model p value > 0.05) and removing duplicate records. It is important to use a tree-ring proxy system model that includes both major growth limitations: temperature and moisture.
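The screening step described above, rejecting records whose regression on climate is insignificant and dropping duplicates, can be sketched as follows. The data are synthetic and the assimilation machinery is not reproduced; only the p-value filter and duplicate removal are illustrated.

```python
# Hedged sketch of record screening: regress each tree-ring series on a
# temperature series, keep it only if the slope is significant (p <= 0.05),
# and drop exact duplicate records. Synthetic data for illustration.
import numpy as np
from scipy import stats

def screen_records(records, temperature, p_max=0.05):
    """Indices of non-duplicate records significantly related to temperature."""
    keep, seen = [], set()
    for i, rec in enumerate(records):
        key = tuple(np.round(rec, 6))  # crude duplicate detection
        if key in seen:
            continue
        seen.add(key)
        slope, intercept, r, p, se = stats.linregress(temperature, rec)
        if p <= p_max:
            keep.append(i)
    return keep

rng = np.random.default_rng(3)
temp = rng.normal(0, 1, 100)
good = 0.8 * temp + rng.normal(0, 0.5, 100)  # temperature-sensitive record
noise = rng.normal(0, 1, 100)                # climate-insensitive record
records = [good, noise, good.copy()]         # index 2 duplicates index 0
kept = screen_records(records, temp)         # index 0 survives; the duplicate is dropped
```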


2018
Author(s):  
Arghavan Bahadorinejad ◽  
Ivan Ivanov ◽  
Johanna W Lampe ◽  
Meredith AJ Hullar ◽  
Robert S Chapkin ◽  
...  

Abstract We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, when the ratio of sample size to dimensionality is small, the proposed method can vastly outperform the others.

Author summary: Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
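A classifier in the spirit of the OBC can be sketched with a plain Dirichlet-multinomial model; the paper's full Poisson-Dirichlet-Multinomial hierarchy is not reproduced, and the prior, data, and function names are assumptions. Each class gets a Dirichlet posterior over OTU proportions, and a new count vector goes to the class with the larger closed-form marginal likelihood.

```python
# Hedged sketch of a Dirichlet-multinomial Bayesian classifier: per-class
# Dirichlet parameters are prior pseudo-counts plus pooled training counts,
# and classification maximizes the closed-form marginal likelihood.
import math
import numpy as np

def log_dm(x, alpha):
    """Log Dirichlet-multinomial likelihood of count vector x under Dirichlet alpha
    (the multinomial coefficient is omitted: it is identical across classes)."""
    A, n = alpha.sum(), x.sum()
    return (math.lgamma(A) - math.lgamma(A + n)
            + sum(math.lgamma(a + k) - math.lgamma(a) for a, k in zip(alpha, x)))

def fit(class_counts, prior=0.5):
    """Per-class Dirichlet parameters: symmetric prior plus pooled training counts."""
    return {c: prior + np.asarray(rows).sum(axis=0) for c, rows in class_counts.items()}

def classify(x, alphas):
    """Assign x to the class with the largest marginal likelihood."""
    return max(alphas, key=lambda c: log_dm(np.asarray(x), alphas[c]))

# Toy 3-OTU data: class "a" dominated by OTU 0, class "b" by OTU 2
train = {"a": [[40, 5, 5], [35, 8, 7]], "b": [[4, 6, 40], [6, 4, 45]]}
alphas = fit(train)
print(classify([20, 3, 2], alphas))
```

Because the posterior is available in closed form, no sampling or optimization is needed, which is what makes this family of models attractive when samples are scarce relative to the number of OTUs.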


2003
Vol 27 (3)
pp. 160-163
Author(s):  
Jennifer H. Myszewski ◽  
Floyd E. Bridgwater ◽  
Thomas D. Byram

Abstract Two important questions for clonal forestry are: (1) how many ortets must be established to ensure that one or more of the best genotypes in a family will be available for field tests and plantation establishment; and (2) how certain can one be that at least one top genotype will be present in a sample of n ortets. In this study, we calculated the level of confidence (LOC) in having included one or more desirable, rootable genotypes in a random sample of n ortets from a full-sibling family. We also calculated the number of unique ortets required to achieve a given LOC in having included one or more desirable, rootable genotypes in a sample. In general, when the sample size is small, either because the original number of ortets was low or because of poor rootability, the LOC is lower. When rootability is low or when only a small percentage of the possible genotypes is considered desirable, the original number of ortets required to achieve a given LOC is higher. Both LOC and sample size are highly influenced by the target number of desirable genotypes to be captured in a sample of ortets. South. J. Appl. For. 27(3):160–163.
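The level-of-confidence calculation can be sketched with a binomial approximation: if p is the proportion of genotypes considered desirable and r the probability that a sampled ortet roots, then LOC = 1 − (1 − p·r)^n, and the required n follows by inversion. The paper's exact finite-family formulation may differ; the parameter values below are illustrative.

```python
# Hedged sketch of LOC for capturing >= 1 desirable, rootable genotype in a
# sample of n ortets, using a binomial approximation (not necessarily the
# authors' exact finite-family calculation).
import math

def loc(n, p_desirable, rootability):
    """Confidence that at least one desirable, rootable genotype is among n ortets."""
    return 1 - (1 - p_desirable * rootability) ** n

def ortets_needed(target_loc, p_desirable, rootability):
    """Smallest n achieving the target LOC."""
    q = 1 - p_desirable * rootability
    return math.ceil(math.log(1 - target_loc) / math.log(q))

# Example: top 10% of genotypes desirable, 50% rootability
print(round(loc(30, 0.10, 0.5), 3))      # LOC with 30 ortets
print(ortets_needed(0.95, 0.10, 0.5))    # ortets needed for 95% confidence
```

The sketch reproduces the abstract's qualitative conclusions: low rootability or a small desirable fraction shrinks the effective per-ortet probability p·r, driving the required number of ortets up sharply.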

