Reporting correct p-values in VEGAS analyses

2017 ◽  
Vol 20 (3) ◽  
pp. 257-259 ◽  
Author(s):  
Julian Hecker ◽  
Anna Maaser ◽  
Dmitry Prokopenko ◽  
Heide Loehlein Fier ◽  
Christoph Lange

VEGAS (versatile gene-based association study) is a popular methodological framework to perform gene-based tests based on summary statistics from single-variant analyses. The approach incorporates linkage disequilibrium information from reference panels to account for the correlation of test statistics. The gene-based test can utilize three different types of tests. In 2015, the improved framework VEGAS2, using more detailed reference panels, was published. Both versions provide user-friendly web- and offline-based tools for the analysis. However, the implementation of the popular top-percentage test is erroneous in both versions. The p-values provided by VEGAS2 are deflated, i.e., anti-conservative. Based on real data examples, we demonstrate that this can substantially increase the rate of false-positive findings and can lead to inconsistencies between different test options. We also provide code that allows the user of VEGAS to compute correct p-values.
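A correct computation amounts to evaluating the null distribution of the top-percentage statistic by Monte Carlo simulation from the LD structure. The following is a minimal sketch of this idea, assuming single-variant Z-scores and a reference-panel LD matrix as inputs; it is an illustration of the principle, not the authors' released code:

```python
import numpy as np

def top_percentage_pvalue(z_obs, ld, top_frac=0.1, n_sim=20000, seed=0):
    """Monte Carlo p-value for the top-percentage gene-based statistic.

    z_obs : observed single-variant Z-scores for the gene
    ld    : variant-by-variant LD (correlation) matrix from a reference panel
    The statistic is the sum of the largest ceil(top_frac * m) chi-square
    statistics; its null distribution is obtained by drawing Z-vectors from
    N(0, ld), which preserves the correlation of the test statistics.
    """
    rng = np.random.default_rng(seed)
    m = len(z_obs)
    k = max(1, int(np.ceil(top_frac * m)))
    observed = np.sort(np.asarray(z_obs) ** 2)[-k:].sum()
    # Cholesky factor of a slightly regularised LD matrix for correlated draws
    chol = np.linalg.cholesky(ld + 1e-8 * np.eye(m))
    sims = (chol @ rng.standard_normal((m, n_sim))) ** 2
    null_stats = np.sort(sims, axis=0)[-k:, :].sum(axis=0)
    # add-one correction keeps the Monte Carlo p-value strictly above zero
    return (1 + np.sum(null_stats >= observed)) / (1 + n_sim)
```

The add-one correction in the last line prevents reporting an exact zero, an anti-conservative artifact of plain Monte Carlo counting.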


2020 ◽  
Vol 175 (2) ◽  
pp. 156-167 ◽  
Author(s):  
Kenny Crump ◽  
Edmund Crouch ◽  
Daniel Zelterman ◽  
Casey Crump ◽  
Joseph Haseman

Abstract Glyphosate is a widely used herbicide worldwide. In 2015, the International Agency for Research on Cancer (IARC) reviewed glyphosate cancer bioassays and human studies and declared that the evidence for carcinogenicity of glyphosate is sufficient in experimental animals. We analyzed 10 glyphosate rodent bioassays, including those in which IARC found evidence of carcinogenicity, using a multiresponse permutation procedure that adjusts for the large number of tumors eligible for statistical testing and provides valid false-positive probabilities. The test statistics for these permutation tests are functions of p values from a standard test for dose-response trend applied to each specific type of tumor. We evaluated 3 permutation tests, using as test statistics the smallest p value from a standard statistical test for dose-response trend and the number of such tests for which the p value is less than or equal to .05 or .01. The false-positive probabilities obtained from 2 implementations of these 3 permutation tests are: smallest p value: .26, .17; p values ≤ .05: .08, .12; and p values ≤ .01: .06, .08. In addition, we found more evidence for negative dose-response trends than positive. Thus, we found no strong evidence that glyphosate is an animal carcinogen. The main cause for the discrepancy between IARC’s finding and ours appears to be that IARC did not account for the large number of tumor responses analyzed and the increased likelihood that several of these would show statistical significance simply by chance. This work provides a more comprehensive analysis of the animal carcinogenicity data for this important herbicide than previously available.
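The permutation logic described above, using the smallest trend-test p-value across all tumor types as the test statistic, can be sketched as follows. This is a simplified illustration with a normal-approximation Cochran-Armitage-style trend test; the paper's exact procedure differs in detail:

```python
import math
import numpy as np

def trend_pvalues(dose, tumors):
    """One-sided trend p-values (normal approximation), one per tumor type;
    columns of `tumors` are per-animal 0/1 incidence indicators."""
    dose = np.asarray(dose, float)
    tumors = np.asarray(tumors, float)
    d = dose - dose.mean()
    pbar = tumors.mean(axis=0)
    den = np.sqrt(pbar * (1 - pbar) * (d @ d))
    z = np.divide(d @ tumors, den, out=np.zeros(tumors.shape[1]), where=den > 0)
    # small p-value when incidence rises with dose
    return np.array([0.5 * math.erfc(zi / math.sqrt(2)) for zi in z])

def min_p_false_positive_prob(dose, tumors, n_perm=2000, seed=0):
    """Permutation false-positive probability for the smallest trend p-value,
    adjusting for the number of tumor types tested simultaneously."""
    rng = np.random.default_rng(seed)
    observed = trend_pvalues(dose, tumors).min()
    hits = sum(
        trend_pvalues(rng.permutation(dose), tumors).min() <= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

Permuting the dose labels across animals preserves the correlation between tumor types while breaking any dose-response association, which is what makes the resulting probability valid despite the many endpoints tested.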


2019 ◽  
Author(s):  
Rumen Manolov

The lack of consensus regarding the most appropriate analytical techniques for single-case experimental design data requires justifying the choice of any specific analytical option. The current text mentions some of the arguments, provided by methodologists and statisticians, in favor of several analytical techniques. Additionally, a small-scale literature review is performed in order to explore whether and how applied researchers justify the analytical choices that they make. The review suggests that certain practices are not sufficiently explained. In order to improve the reporting of data-analytical decisions, it is proposed to choose and justify the data-analytical approach prior to gathering the data. As a possible justification for the data analysis plan, we propose using as a basis the expected data pattern (specifically, the expectation about an improving baseline trend and about the immediate or progressive nature of the intervention effect). Although there are multiple alternatives for single-case data analysis, the current text focuses on visual analysis and multilevel models and illustrates an application of these analytical options with real data. User-friendly software is also developed.


Econometrics ◽  
2021 ◽  
Vol 9 (1) ◽  
pp. 10
Author(s):  
Šárka Hudecová ◽  
Marie Hušková ◽  
Simos G. Meintanis

This article considers goodness-of-fit tests for bivariate INAR and bivariate Poisson autoregression models. The test statistics are based on an L2-type distance between two estimators of the probability generating function of the observations: one entirely nonparametric and the other semiparametric, computed under the corresponding null hypothesis. The asymptotic distribution of the proposed test statistics is derived both under the null hypotheses and under alternatives, and consistency is proved. The case of testing bivariate generalized Poisson autoregression and the extension of the methods to dimensions higher than two are also discussed. The finite-sample performance of a parametric bootstrap version of the tests is illustrated via a series of Monte Carlo experiments. The article concludes with applications to real data sets and discussion.
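For intuition, the L2-type distance between a nonparametric and a semiparametric estimator of the probability generating function can be illustrated in a univariate Poisson setting. This is a simplified sketch of the construction only; the article treats the bivariate time-series case:

```python
import numpy as np

def pgf_l2_statistic(x, grid=200):
    """n times the L2 distance on [0, 1] between the empirical probability
    generating function (1/n) * sum_i u**x_i and the PGF of a fitted Poisson
    distribution (a univariate sketch of the construction)."""
    x = np.asarray(x)
    lam = x.mean()
    u = np.linspace(0.0, 1.0, grid)
    g_emp = (u[:, None] ** x[None, :]).mean(axis=1)   # nonparametric estimator
    g_fit = np.exp(lam * (u - 1.0))                   # Poisson PGF, estimated mean
    d2 = (g_emp - g_fit) ** 2
    du = u[1] - u[0]
    integral = du * (d2[0] / 2 + d2[1:-1].sum() + d2[-1] / 2)  # trapezoid rule
    return len(x) * integral

def bootstrap_pvalue(x, n_boot=300, seed=0):
    """Parametric bootstrap p-value under the Poisson null hypothesis."""
    rng = np.random.default_rng(seed)
    t_obs = pgf_l2_statistic(x)
    lam = np.mean(x)
    boots = [pgf_l2_statistic(rng.poisson(lam, len(x))) for _ in range(n_boot)]
    return (1 + sum(t >= t_obs for t in boots)) / (1 + n_boot)
```

The parametric bootstrap mirrors the approach evaluated in the article's Monte Carlo experiments: the null distribution of the statistic is approximated by resampling from the fitted null model.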


2020 ◽  
Vol 36 (12) ◽  
pp. 3913-3915
Author(s):  
Hemi Luan ◽  
Xingen Jiang ◽  
Fenfen Ji ◽  
Zhangzhang Lan ◽  
Zongwei Cai ◽  
...  

Abstract Motivation Liquid chromatography–mass spectrometry-based non-targeted metabolomics is routinely performed to qualitatively and quantitatively analyze a tremendous number of metabolite signals in complex biological samples. However, false-positive peaks in the datasets are commonly detected as metabolite signals by many popular software packages, resulting in unreliable measurements. Results To reduce false-positive calling, we developed an interactive web tool, termed CPVA, for visualization and accurate annotation of the detected peaks in non-targeted metabolomics data. We used a chromatogram-centric strategy to unfold the characteristics of chromatographic peaks through visualization of peak morphology metrics, with additional functions to annotate adducts, isotopes and contaminants. CPVA is a free, user-friendly tool that helps users identify peak background noise and contaminants, reducing false-positive and redundant peak calls and thereby improving the data quality of non-targeted metabolomics studies. Availability and implementation CPVA is freely available at http://cpva.eastus.cloudapp.azure.com. Source code and installation instructions are available on GitHub: https://github.com/13479776/cpva. Supplementary information Supplementary data are available at Bioinformatics online.
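As an illustration of a peak morphology metric, one simple score is the correlation between an observed intensity trace and a moment-matched Gaussian shape; a well-formed chromatographic peak scores near 1, while noise scores lower. This is a hypothetical metric for illustration, not CPVA's actual algorithm:

```python
import numpy as np

def gaussian_similarity(rt, intensity):
    """Score peak morphology as the Pearson correlation between the observed
    intensity trace and a Gaussian whose mean and width are matched to the
    trace's intensity-weighted moments (illustrative metric only)."""
    rt = np.asarray(rt, float)
    y = np.asarray(intensity, float)
    w = y / y.sum()                       # intensity weights
    mu = (w * rt).sum()                   # weighted retention-time centroid
    sigma = np.sqrt((w * (rt - mu) ** 2).sum())
    if sigma == 0:
        return 0.0
    model = np.exp(-0.5 * ((rt - mu) / sigma) ** 2)
    return float(np.corrcoef(y, model)[0, 1])
```

A threshold on such a score is one way a tool can flag background noise or contaminant signals for manual review rather than silently reporting them as metabolites.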


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Bing Song ◽  
August E. Woerner ◽  
John Planz

Abstract Background Multi-locus genotype data are widely used in population genetics and disease studies. In evaluating the utility of multi-locus data, the independence of markers is commonly considered in many genomic assessments. Generally, pairwise non-random associations are tested by linkage disequilibrium; however, the dependence within a panel might be triplet, quartet, or higher-order. Therefore, a compatible and user-friendly software package is needed for testing and assessing the global linkage disequilibrium among mixed genetic data. Results This study describes a software package for testing the mutual independence of mixed genetic datasets. Mutual independence is defined as no non-random associations among all subsets of the tested panel. The new R package “mixIndependR” calculates basic genetic parameters such as allele frequency, genotype frequency, heterozygosity, Hardy–Weinberg equilibrium, and linkage disequilibrium (LD) from population data, regardless of the type of markers, such as single nucleotide polymorphisms, short tandem repeats, insertions and deletions, and any other genetic markers. A novel method of assessing the dependence of mixed genetic panels is developed in this study and functionally analyzed in the software package. The overall independence is tested by comparing the observed distributions of two common summary statistics (the number of heterozygous loci [K] and the number of shared alleles [X]) with their expected distributions under the assumption of mutual independence. Conclusion The package “mixIndependR” is compatible with all categories of genetic markers and detects overall non-random associations. Compared to pairwise disequilibrium, the approach described herein tends to have higher power, especially when the number of markers is large. With this package, more multi-functional or stronger genetic panels can be developed, such as mixed panels with different kinds of markers.
In population genetics, the package “mixIndependR” makes it possible to discover more about population admixture, natural selection, genetic drift, and population demographics, as a more powerful method of detecting LD. Moreover, this new approach can optimize variant selection in disease studies and contribute to panel combination for treatments in multimorbidity. Application of this approach to real data is expected in the future and might bring a leap in the field of genetic technology. Availability The R package mixIndependR is available on the Comprehensive R Archive Network (CRAN) at: https://cran.r-project.org/web/packages/mixIndependR/index.html.
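The K-statistic idea can be illustrated as follows: under mutual independence, the number of heterozygous loci per individual follows a Poisson-binomial distribution, and the observed distribution of K can be compared against it. This is a simplified Python sketch of the approach; the package itself is implemented in R and differs in detail:

```python
import numpy as np

def k_pmf_independent(h):
    """Poisson-binomial PMF of K, the number of heterozygous loci, expected
    under mutual independence; h holds per-locus heterozygosity frequencies.
    Built by convolving one Bernoulli distribution per locus."""
    pmf = np.array([1.0])
    for hj in h:
        pmf = np.convolve(pmf, [1.0 - hj, hj])
    return pmf

def k_independence_pvalue(het, n_sim=500, seed=0):
    """Monte Carlo test comparing the observed distribution of K with its
    expectation under mutual independence. `het` is a 0/1 matrix of
    individuals by loci (1 = heterozygous). A sketch of the idea behind
    mixIndependR, not its exact procedure."""
    rng = np.random.default_rng(seed)
    het = np.asarray(het)
    n, m = het.shape
    h = het.mean(axis=0)
    pmf = k_pmf_independent(h)

    def stat(mat):
        k = mat.sum(axis=1)
        obs = np.bincount(k, minlength=m + 1) / n
        return ((obs - pmf) ** 2).sum()   # squared distance to expectation

    t_obs = stat(het)
    sims = [stat((rng.random((n, m)) < h).astype(int)) for _ in range(n_sim)]
    return (1 + sum(t >= t_obs for t in sims)) / (1 + n_sim)
```

Because K pools information across all loci at once, this kind of test can pick up higher-order dependence that pairwise LD tests miss.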


Author(s):  
Liangli Yang ◽  
Yongmei Su ◽  
Xinjian Zhuo

The outbreak of COVID-19 has had a great impact on the world. Considering that there are different infection delays among different populations, which can be expressed as a distributed delay, and that distributed time-delays are rarely used in fractional-order models to fit real data, we establish two different types of fractional-order (Caputo and Caputo–Fabrizio) COVID-19 models with distributed time-delay. Parameters are estimated by the least-squares method from the reported data of China and 12 other countries. We compare the Caputo and Caputo–Fabrizio models with distributed time-delay and without delay, as well as the integer-order model with distributed delay. The results show that the fractional-order models fit the real data better; moreover, the Caputo model is better for short-term fitting, while the Caputo–Fabrizio model is better for long-term fitting and prediction. Finally, the influence of several parameters is simulated in the Caputo model, which further verifies the importance of taking strict quarantine measures and paying close attention to the incubation-period population.
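To illustrate what a Caputo fractional order adds, here is a scalar toy example: the fractional relaxation equation solved with an implicit Grünwald–Letnikov scheme, where setting α = 1 recovers ordinary exponential decay. This is an illustration of fractional dynamics with memory, not the authors' COVID-19 model:

```python
import numpy as np

def caputo_relaxation(alpha, lam, t_end, h):
    """Solve the Caputo fractional relaxation equation D^alpha y = -lam * y,
    y(0) = 1, on [0, t_end] with step h, using an implicit
    Grunwald-Letnikov discretization (scalar toy example)."""
    n_steps = int(round(t_end / h))
    # Grunwald-Letnikov weights g_k = (-1)^k * C(alpha, k), via recurrence
    g = np.empty(n_steps + 1)
    g[0] = 1.0
    for k in range(1, n_steps + 1):
        g[k] = g[k - 1] * (k - 1 - alpha) / k
    y = np.empty(n_steps + 1)
    y[0] = 1.0
    ha = h ** alpha
    for n in range(1, n_steps + 1):
        # memory term: the fractional derivative weighs the entire history,
        # which is what distinguishes it from an integer-order model
        hist = np.dot(g[1:n + 1], y[n - 1::-1] - y[0])
        y[n] = (y[0] - hist) / (1.0 + lam * ha)
    return y
```

For α = 1 the weights collapse to the implicit Euler scheme for y' = -λy, so the fractional model is a strict generalization of the integer-order one, with α acting as an extra fitting parameter.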


Author(s):  
Lingtao Kong

The exponential distribution has been widely used in engineering, social and biological sciences. In this paper, we propose a new goodness-of-fit test for fuzzy exponentiality using the α-pessimistic value. The test statistic is constructed from Kullback–Leibler information. Using the Monte Carlo method, we obtain the empirical critical points of the test statistic at four different significance levels. To evaluate the performance of the proposed test, we compare it with four commonly used tests through simulations. Experimental studies show that the proposed test has higher power than the other tests in most cases. In particular, for the uniform and linear failure rate alternatives, our method performs best. A real data example is investigated to show the application of our test.
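The Kullback–Leibler construction can be illustrated for crisp (non-fuzzy) data: estimate entropy by Vasicek's spacing method, form the estimated KL divergence to the fitted exponential, and obtain critical points by Monte Carlo simulation. This is an illustrative analogue; the paper's statistic additionally works through α-pessimistic values of fuzzy observations:

```python
import math
import numpy as np

def vasicek_entropy(x, m):
    """Vasicek spacing estimator of differential entropy with window m."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    lo = np.clip(np.arange(n) - m, 0, n - 1)
    hi = np.clip(np.arange(n) + m, 0, n - 1)
    return np.mean(np.log(n * (x[hi] - x[lo]) / (2 * m)))

def kl_exponentiality_statistic(x, m=3):
    """Estimated KL divergence between the data density and the fitted
    exponential: for Exp(rate = 1/mean), the cross-entropy term reduces to
    log(mean) + 1. Large values indicate departure from exponentiality."""
    x = np.asarray(x, float)
    return math.log(x.mean()) + 1.0 - vasicek_entropy(x, m)

def critical_value(n, m=3, alpha=0.05, n_sim=2000, seed=0):
    """Monte Carlo critical point under the exponential null; the null is
    scale-free, so simulating at rate 1 suffices."""
    rng = np.random.default_rng(seed)
    stats = [kl_exponentiality_statistic(rng.exponential(1.0, n), m)
             for _ in range(n_sim)]
    return float(np.quantile(stats, 1 - alpha))
```

Tabulating such critical values over several sample sizes and significance levels is exactly the kind of Monte Carlo step the abstract refers to.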


Materials ◽  
2019 ◽  
Vol 12 (23) ◽  
pp. 4005 ◽  
Author(s):  
Angelats Lobo ◽  
Ginestra

Classic cell culture involves the use of a two-dimensional support, such as a well plate or a Petri dish, that allows the culture of different types of cells. However, this technique does not mimic the natural microenvironment to which the cells are exposed. To solve that, three-dimensional bioprinting techniques were developed, which involve the use of biopolymers and/or synthetic materials together with cells. Because of a lack of consolidated information across data sources, the objective of this review is to sum up the available information on bioprinting and to help researchers with the practical challenges of 3D bioprinters such as the 3D-Bioplotter™. The 3D-Bioplotter™ has been used in the pre-clinical field since 2000; it can print more than one material at a time, thereby increasing the complexity of the manufactured 3D structure. It is also very precise, with maximum flexibility and user-friendly, stable software that allows optimization of the bioprinting process from a technological point of view. Different applications have resulted from research in this field, mainly focused on regenerative medicine, but the lack of information and possible misunderstandings between papers make the reproducibility of the tests difficult. Nowadays, 3D bioprinting is evolving into another technology called 4D bioprinting, which promises to be the next step in the bioprinting field and might enable great applications in the future.

