unreplicated experiments Latest Research Papers

Getting the most out of RNA-seq data analysis

PeerJ ◽

10.7717/peerj.1360 ◽

2015 ◽

Vol 3 ◽

pp. e1360 ◽

Cited By ~ 14

Author(s):

Tsung Fei Khang ◽

Ching Yee Lau

Keyword(s):

Differentially Expressed Genes ◽

Effect Size ◽

Biological Effect ◽

Differentially Expressed Gene ◽

Differentially Expressed ◽

Specific Method ◽

Data Sets ◽

RNA-seq ◽

Unreplicated Experiments ◽

Level Analysis

Background. A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally, or use them for downstream systems biology analysis. Producing a coherent differential gene expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation, such as the replicate size, the hypothesized biological effect size, and the specific method for making differential expression calls, interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to biologists. Results. Using two large public RNA-seq data sets (one representing strong, and another mild, biological effect size), we simulated different replicate size scenarios, and tested the performance of several commonly used methods for calling differentially expressed genes in each of them. We found that, when biological effect size was mild, RNA-seq experiments should focus on experimental validation of differentially expressed gene candidates. Importantly, at least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, when biological effect size was strong, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30% and 50% mean PPV, an increase of more than 30-fold compared to the cases of mild biological effect size. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. At a replicate size of six, we found DESeq2 and edgeR to be reasonable methods for calling differentially expressed genes for systems-level analysis, as their trade-off between PPV and sensitivity was superior to that of the other methods. Conclusion. When biological effect size is weak, systems-level investigation is not possible using RNA-seq data, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of gene candidates with good validation potential when triplicates or more are available. When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for qPCR validation. When triplicates or more are available, GFOLD is a sharp tool for identifying high-confidence differentially expressed genes for targeted qPCR validation; for downstream systems-level analysis, combined results from DESeq2 and edgeR are useful.
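
The abstract ranks methods by positive predictive value (PPV) and sensitivity. As a minimal sketch of how these two quantities relate, the snippet below computes both from a set of called genes and a reference set of truly differentially expressed genes. The function name and the gene identifiers are hypothetical and are not taken from the paper.

```python
def ppv_and_sensitivity(called, truth):
    """Return (PPV, sensitivity) for called genes against a reference set.

    PPV         = true positives / number of called genes
    sensitivity = true positives / number of truly DE genes
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Guard against empty sets so the ratios are never a division by zero.
    ppv = true_positives / len(called) if called else float("nan")
    sensitivity = true_positives / len(truth) if truth else float("nan")
    return ppv, sensitivity


# Hypothetical gene identifiers, for illustration only.
reference_de_genes = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"}
conservative_calls = {"g1", "g2"}  # few calls, all of them correct
liberal_calls = {"g1", "g2", "g3", "g4", "g5", "g6", "x1", "x2", "x3", "x4"}

print(ppv_and_sensitivity(conservative_calls, reference_de_genes))
print(ppv_and_sensitivity(liberal_calls, reference_de_genes))
```

In this toy case the conservative caller has the higher PPV but the lower sensitivity. The abstract draws the same contrast between tools suited to targeted validation and tools suited to systems-level analysis.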

Getting the most out of RNA-seq data analysis

10.7287/peerj.preprints.1198v1 ◽

2015 ◽

Author(s):

Tsung Fei Khang ◽

Ching Yee Lau

Keyword(s):

Positive Predictive Value ◽

Differentially Expressed Genes ◽

Predictive Value ◽

Biological Effect ◽

Biological Response ◽

Differentially Expressed ◽

Data Sets ◽

RNA-seq ◽

Unreplicated Experiments ◽

Level Analysis

Background: A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally or use them for downstream systems biology analysis. Producing a coherent differential expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation, such as the replicate size, the hypothesized biological effect, and the specific method for making differential expression calls, interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to the biologist. Results: Using two large public RNA-seq data sets (one representing strong, and another mild, biological response), we simulated different replicate size scenarios and tested the performance of several commonly used methods for calling differentially expressed genes in each of them. Our results suggest that if the biological response of interest in the different phenotype classes is expected to be mild, then RNA-seq experiments should focus on validation of differentially expressed gene candidates. At least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, for strong biological response, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30% and 50% mean PPV, an increase of more than 30-fold compared to the case of mild biological response. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. We found DESeq2 to be the most reasonable method to call differentially expressed genes for systems-level analysis, as it showed the best trade-off between PPV and sensitivity (mean PPV and mean sensitivity ∼ 65% at a replicate size of six). Conclusion: When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for validation work. Having triplicates or more enables DESeq2 to detect sufficiently large numbers of reliable gene candidates for downstream systems-level analysis. When biological effect size is weak, systems-level investigation is not possible, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of candidates with good validation potential when triplicates or more are available.

Getting the most out of RNA-seq data analysis

10.7287/peerj.preprints.1198 ◽

2015 ◽

Author(s):

Tsung Fei Khang ◽

Ching Yee Lau

Keyword(s):

Positive Predictive Value ◽

Differentially Expressed Genes ◽

Predictive Value ◽

Biological Effect ◽

Biological Response ◽

Differentially Expressed ◽

Data Sets ◽

RNA-seq ◽

Unreplicated Experiments ◽

Level Analysis

Background: A common research goal in transcriptome projects is to find genes that are differentially expressed in different phenotype classes. Biologists might wish to validate such gene candidates experimentally or use them for downstream systems biology analysis. Producing a coherent differential expression analysis from RNA-seq count data requires an understanding of how numerous sources of variation, such as the replicate size, the hypothesized biological effect, and the specific method for making differential expression calls, interact. We believe an explicit demonstration of such interactions in real RNA-seq data sets is of practical interest to the biologist. Results: Using two large public RNA-seq data sets (one representing strong, and another mild, biological response), we simulated different replicate size scenarios and tested the performance of several commonly used methods for calling differentially expressed genes in each of them. Our results suggest that if the biological response of interest in the different phenotype classes is expected to be mild, then RNA-seq experiments should focus on validation of differentially expressed gene candidates. At least triplicates must be used, and the differentially expressed genes should be called using methods with high positive predictive value (PPV), such as NOISeq or GFOLD. In contrast, for strong biological response, differentially expressed genes mined from unreplicated experiments using NOISeq, ASC and GFOLD had between 30% and 50% mean PPV, an increase of more than 30-fold compared to the case of mild biological response. Among methods with good PPV performance, having triplicates or more substantially improved mean PPV to over 90% for GFOLD, 60% for DESeq2, 50% for NOISeq, and 30% for edgeR. We found DESeq2 to be the most reasonable method to call differentially expressed genes for systems-level analysis, as it showed the best trade-off between PPV and sensitivity (mean PPV and mean sensitivity ∼ 65% at a replicate size of six). Conclusion: When biological effect size is strong, NOISeq and GFOLD are effective tools for detecting differentially expressed genes in unreplicated RNA-seq experiments for validation work. Having triplicates or more enables DESeq2 to detect sufficiently large numbers of reliable gene candidates for downstream systems-level analysis. When biological effect size is weak, systems-level investigation is not possible, and no meaningful result can be obtained in unreplicated experiments. Nonetheless, NOISeq or GFOLD may yield limited numbers of candidates with good validation potential when triplicates or more are available.

Lenth's Method for the Analysis of Unreplicated Experiments

Wiley StatsRef: Statistics Reference Online ◽

10.1002/9781118445112.stat04086 ◽

2014 ◽

Author(s):

Russell V. Lenth

Keyword(s):

Unreplicated Experiments

A Bayesian Analysis of Very Small Unreplicated Experiments

Quality and Reliability Engineering International ◽

10.1002/qre.1578 ◽

2013 ◽

Vol 30 (3) ◽

pp. 413-426 ◽

Cited By ~ 4

Author(s):

Víctor Aguirre-Torres ◽

Román de la Vara

Keyword(s):

Bayesian Analysis ◽

Unreplicated Experiments

Check plots in field breeding experiments

Biometrical Letters ◽

10.2478/bile-2013-0024 ◽

2013 ◽

Vol 50 (2) ◽

pp. 137-149 ◽

Cited By ~ 1

Author(s):

Stanisław Mejza ◽

Iwona Mejza

Keyword(s):

Soil Fertility ◽

Response Surface ◽

Soil Structure ◽

Experimental Situation ◽

Spring Barley ◽

Main Tool ◽

Breeding Program ◽

Genotype Effect ◽

Unreplicated Experiments ◽

Theoretical Considerations

Summary: This paper deals with the problems of selection in the early stages of a breeding program. During the improvement process, it is not possible to use an experimental design that replicates all the treatments, because of the large number of genotypes involved, the small amount of seed and the low availability of resources. Hence unreplicated designs are used. To control the real or potential heterogeneity of experimental units, control (check) plots are arranged in the trial. There are many methods of using the information resulting from check plots. Each of the commonly applied adjustment methods for unreplicated experiments is appropriate for some specific structure of soil fertility. Their disadvantage is that, both before and after the experiment, we usually do not know what kind of soil structure is present in the experiment. Hence we cannot say which of the existing methods is appropriate for a given experimental situation. The method of inference presented below avoids this disadvantage. It is always appropriate, because a trend of soil variability is identified and estimated. In the paper, the main tool used to explore this information is based on response surface methodology. To begin with, we try to identify a response surface characterizing the experimental environments. We assume that the observed yield (or another trait) results directly from two components, one due to soil fertility and the other due to the genotype effect. This means that the difference between the observed yield and the forecast can be treated as the estimate of a genotype effect. The obtained response surface is then used to adjust the observations for genotypes. Finally, the adjusted data are used for inferences concerning the next stage of the breeding program. The theoretical considerations are illustrated with an example involving yields of spring barley.
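
The adjustment idea in this summary (fit a fertility surface to the check plots, then treat observed yield minus forecast as the genotype effect) can be sketched as follows. This is only an illustration under strong assumptions: the surface is a first-order plane in row and column position, fitted by ordinary least squares, whereas the summary does not state the form of the response surface the authors use. All function names, plot positions and yields are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_plane(check_plots):
    """Least-squares fit of yield = b0 + b1*row + b2*col to the check plots."""
    design = [[1.0, row, col] for row, col, _ in check_plots]
    yields = [y for _, _, y in check_plots]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(3)] for i in range(3)]
    xty = [sum(d[i] * y for d, y in zip(design, yields)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def genotype_effects(test_plots, beta):
    """Estimate each genotype effect as observed yield minus the forecast."""
    effects = {}
    for name, row, col, observed in test_plots:
        forecast = beta[0] + beta[1] * row + beta[2] * col
        effects[name] = observed - forecast
    return effects


# Hypothetical check plots: (row, col, yield of the check variety).
check_plots = [(0, 0, 5.0), (0, 4, 4.6), (4, 0, 5.8), (4, 4, 5.4), (2, 2, 5.2)]
# Hypothetical unreplicated test plots: (genotype, row, col, observed yield).
test_plots = [("A", 1, 1, 5.6), ("B", 3, 2, 5.0)]

beta = fit_plane(check_plots)
effects = genotype_effects(test_plots, beta)
print(beta)
print(effects)
```

A richer surface, for example one with quadratic terms, would add columns to the design matrix but leave the final subtraction step unchanged.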

A method for detecting hidden additivity in two-factor unreplicated experiments

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2013.05.002 ◽

2013 ◽

Vol 67 ◽

pp. 95-104 ◽

Cited By ~ 3

Author(s):

Christopher T. Franck ◽

Dahlia M. Nielsen ◽

Jason A. Osborne

Keyword(s):

Unreplicated Experiments

Three Five-Factor, Two-Level Unreplicated Experiments

Wiley Series in Probability and Statistics - Applications of Statistics to Industrial Experimentation ◽

10.1002/9780470316467.ch7 ◽

2011 ◽

pp. 127-150

Keyword(s):

Unreplicated Experiments

Lenth's Method for the Analysis of Unreplicated Experiments

Encyclopedia of Statistics in Quality and Reliability ◽

10.1002/9780470061572.eqr014 ◽

2008 ◽

Cited By ~ 2

Author(s):

Russell V. Lenth

Keyword(s):

Unreplicated Experiments

New and Traditional Methods for the Analysis of Unreplicated Experiments

Crop Science ◽

10.2135/cropsci2006.04.0273 ◽

2006 ◽

Vol 46 (6) ◽

pp. 2476-2481 ◽

Cited By ~ 17

Author(s):

Roger W. Payne

Keyword(s):

Traditional Methods ◽

Unreplicated Experiments
