Not just methods: User expertise explains the variability of outcomes of genome-wide studies

Mapping Intimacies ◽

10.1101/055046 ◽

2016 ◽

Cited By ~ 1

Author(s):

Katie E. Lotterhos ◽

Olivier François ◽

Michael G.B. Blum

Keyword(s):

Summer School ◽

Statistical Power ◽

A Priori ◽

Simulated Data ◽

Genome Scan ◽

Genome Wide ◽

The Difference ◽

Difficult Challenge ◽

Genomic Regions ◽

Candidate Loci

Genome scan approaches promise to map genomic regions involved in adaptation of individuals to their environment. Outcomes of genome scans have been shown to depend on several factors including the underlying demography, the adaptive scenario, and the software or method used. We took advantage of a pedagogical experiment carried out during a summer school to explore the effect of an unexplored source of variability: the degree of user expertise. Participants were asked to analyze three simulated data challenges with methods presented during the summer school. In addition to submitting lists of candidate loci, participants evaluated a priori their level of expertise. We measured the quality of each genome scan analysis by computing a score that depends on false discovery rate and statistical power. In an easy and a difficult challenge, less advanced participants obtained scores similar to those of advanced ones, demonstrating that participants with little background in genome scan methods were able to learn how to use complex software after short introductory tutorials. However, in a challenge of intermediate difficulty, advanced participants obtained better scores. To explain the difference, we introduce a probabilistic model that shows that a larger variation in scores is expected for SNPs of intermediate difficulty of detection. We conclude that practitioners should develop their statistical and computational expertise to follow the development of complex methods. To encourage training, we release the website of the summer school where users can submit lists of candidate loci, which will be scored and compared to the scores obtained by previous users.
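The abstract says only that the score "depends on false discovery rate and statistical power"; it does not give the formula. As a minimal sketch, the snippet below computes both quantities for a submitted list of candidate loci and combines them with a harmonic mean. The combination rule, the function name `genome_scan_score`, and the example loci are assumptions made for illustration, not the scoring used by the authors.

```python
def genome_scan_score(submitted, causal):
    """Return (fdr, power, score) for a submitted list of candidate loci.

    The harmonic mean of (1 - FDR) and power is an illustrative choice;
    the source does not state how the two quantities are combined.
    """
    submitted, causal = set(submitted), set(causal)
    true_pos = len(submitted & causal)
    # FDR: fraction of submitted loci that are not truly causal.
    fdr = 1 - true_pos / len(submitted) if submitted else 0.0
    # Power: fraction of causal loci that were recovered.
    power = true_pos / len(causal) if causal else 0.0
    precision = 1 - fdr
    if precision + power == 0:
        return fdr, power, 0.0
    score = 2 * precision * power / (precision + power)
    return fdr, power, score


# Hypothetical example: 4 submitted loci, 5 truly causal loci, 2 in common.
fdr, power, score = genome_scan_score([1, 2, 3, 4], [2, 3, 5, 6, 7])
print(f"FDR={fdr:.2f} power={power:.2f} score={score:.3f}")
```

A score of this shape penalizes both long lists full of false positives and short lists that miss most causal loci, which is the trade-off the abstract describes.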


Genome-Wide Analyses Identifies Known and New Markers Responsible of Chicken Plumage Color

Animals ◽

10.3390/ani10030493 ◽

2020 ◽

Vol 10 (3) ◽

pp. 493

Author(s):

Salvatore Mastrangelo ◽

Filippo Cendron ◽

Gianluca Sottile ◽

Giovanni Niero ◽

Baldassare Portolano ◽

...

Keyword(s):

Genome Wide Association Study ◽

Snp Array ◽

Fixation Index ◽

Phenotypic Traits ◽

Plumage Color ◽

Phenotypic Marker ◽

Genome Wide ◽

A Genome ◽

The Difference ◽

Genomic Regions

Through the development of high-throughput genotyping arrays, molecular markers and genes related to phenotypic traits have been identified in livestock species. In poultry, plumage color is an important qualitative trait that can be used as a phenotypic marker for breed identification. To assess sources of genetic variation related to plumage color in the Polverara chicken breed (black vs. white), we carried out a genome-wide association study (GWAS) and a genome-wide fixation index (FST) scan to uncover the genomic regions involved. A total of 37 animals (17 white and 20 black) were genotyped with the Affymetrix 600 K Chicken single nucleotide polymorphism (SNP) Array. The combination of results from GWAS and FST revealed a total of 40 significant markers distributed on GGA 01, 03, 08, 12 and 21, and located within or near known genes. In addition to the well-known TYR, other candidate genes have been identified in this study, such as GRM5, RAB38 and NOTCH2. All these genes could explain the difference between the two Polverara breeds. Therefore, this study provides the basis for further investigation of the genetic mechanisms involved in plumage color in chicken.


On the Difference between a-priori and observed statistical power — A comment on “statistical power and sample size calculations: A primer for pediatric surgeons”

Journal of Pediatric Surgery ◽

10.1016/j.jpedsurg.2019.08.054 ◽

2020 ◽

Vol 55 (1) ◽

pp. 203-205

Author(s):

Arne Schröder ◽

Christina Oetzmann von Sochaczewski

Keyword(s):

Sample Size ◽

Statistical Power ◽

A Priori ◽

Sample Size Calculations ◽

The Difference


Quantifying annual patterns in the frequency of mammalian births: do goodness-of-fit tests provide adequate inferences?

Australian Journal of Zoology ◽

10.1071/zo12056 ◽

2012 ◽

Vol 60 (6) ◽

pp. 381 ◽

Cited By ~ 3

Author(s):

Evan Watkins ◽

Julian Di Stefano

Keyword(s):

Sample Size ◽

Effect Size ◽

Statistical Power ◽

Goodness Of Fit ◽

A Priori ◽

Simulated Data ◽

Maximum Effect ◽

Chi Square ◽

Goodness Of Fit Tests ◽

Power Studies

Hypotheses relating to the annual frequency distribution of mammalian births are commonly tested using a goodness-of-fit procedure. Several interacting factors influence the statistical power of these tests, but no power studies have been conducted using scenarios derived from biological hypotheses. Corresponding to theories relating reproductive output to seasonal resource fluctuation, we simulated data reflecting a winter reduction in birth frequency to test the effect of four factors (sample size, maximum effect size, the temporal pattern of response and the number of categories used for analysis) on the power of three goodness-of-fit procedures – the G and Chi-square tests and Watson’s U² test. Analyses resulting in high power all had a large maximum effect size (60%) and were associated with a sample size of 200 on most occasions. The G-test was the most powerful when data were analysed using two temporal categories (winter and other) while Watson’s U² test achieved the highest power when 12 monthly categories were used. Overall, the power of most modelled scenarios was low. Consequently, we recommend using power analysis as a research planning tool, and have provided a spreadsheet enabling a priori power calculations for the three tests considered.
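The a priori power calculation recommended above can be approximated by simulation. The sketch below covers only the Chi-square test with two temporal categories (winter and other); it does not reproduce the G-test, Watson's U² test, or the authors' spreadsheet. The three-month winter (null proportion 0.25), the way the effect size rescales the winter birth rate, and all function names are assumptions made for illustration.

```python
import math
import random


def chi2_p_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def simulate_power(n, effect, n_sim=2000, alpha=0.05, seed=0):
    """Estimate the power of a two-category chi-square goodness-of-fit test.

    n      : number of births sampled
    effect : proportional reduction of the winter birth rate (0.6 = 60%)
    """
    rng = random.Random(seed)
    null_p = 0.25  # assumed: winter spans 3 of 12 months
    # Assumed alternative: the winter rate is scaled by (1 - effect), then renormalized.
    winter_weight = null_p * (1 - effect)
    alt_p = winter_weight / (winter_weight + (1 - null_p))
    expected = (n * null_p, n * (1 - null_p))
    rejections = 0
    for _ in range(n_sim):
        winter = sum(rng.random() < alt_p for _ in range(n))
        observed = (winter, n - winter)
        stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        if chi2_p_df1(stat) < alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    for n in (25, 50, 100, 200):
        print(n, simulate_power(n, effect=0.6, n_sim=500))
```

Running the loop over several sample sizes gives a rough power curve, which is the kind of planning information the abstract argues for collecting before a study begins.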


SNP and Haplotype Regional Heritability Mapping (SNHap-RHM): Joint Mapping of Common and Rare Variation Affecting Complex Traits

Frontiers in Genetics ◽

10.3389/fgene.2021.791712 ◽

2022 ◽

Vol 12 ◽

Author(s):

Richard F. Oppong ◽

Thibaud Boutin ◽

Archie Campbell ◽

Andrew M. McIntosh ◽

David Porteous ◽

...

Keyword(s):

Major Depressive Disorder ◽

Depressive Disorder ◽

Complex Traits ◽

Simulated Data ◽

Health Study ◽

Major Depressive ◽

Genome Wide ◽

A Genome ◽

Joint Mapping ◽

Genomic Regions

We describe a genome-wide analytical approach, SNP and Haplotype Regional Heritability Mapping (SNHap-RHM), that provides regional estimates of the heritability across locally defined regions in the genome. This approach utilises relationship matrices that are based on sharing of SNP and haplotype alleles at local haplotype blocks delimited by recombination boundaries in the genome. We implemented the approach on simulated data and show that the haplotype-based regional GRMs capture variation that is complementary to that captured by SNP-based regional GRMs, thus justifying the fitting of the two GRMs jointly in a single analysis (SNHap-RHM). SNHap-RHM captures regions in the genome contributing to the phenotypic variation that existing genome-wide analysis methods may fail to capture. We further demonstrate that there are real benefits to be gained from this approach by applying it to real data from about 20,000 individuals from the Generation Scotland: Scottish Family Health Study. We analysed height and major depressive disorder (MDD). We identified seven genomic regions that are genome-wide significant for height, and three regions significant at a suggestive threshold (p-value < 1 × 10⁻⁵) for MDD. These significant regions have genes mapped to within 400 kb of them. The genes mapped for height have been reported to be associated with height in humans. Similarly, those mapped for MDD have been reported to be associated with major depressive disorder and other psychiatric phenotypes. The results show that SNHap-RHM presents an exciting new opportunity to analyse complex traits by allowing the joint mapping of novel genomic regions tagged by either SNPs or haplotypes, potentially leading to the recovery of some of the “missing” heritability.


Barcoded reciprocal hemizygosity analysis via sequencing illuminates the complex genetic basis of yeast thermotolerance

G3 Genes|Genome|Genetics ◽

10.1093/g3journal/jkab412 ◽

2021 ◽

Author(s):

Melanie B Abrams ◽

Julie N Chuong ◽

Faisal AlZaben ◽

Claire A Dubin ◽

Jeffrey M Skerker ◽

...

Keyword(s):

Genetic Basis ◽

Genetic Dissection ◽

Proof Of Concept ◽

Mutagenesis Screen ◽

Genetics Research ◽

Multiple Loci ◽

Genome Wide ◽

The Difference ◽

Candidate Loci ◽

Reciprocal Hemizygosity Analysis

Decades of successes in statistical genetics have revealed the molecular underpinnings of traits as they vary across individuals of a given species. But standard methods in the field can’t be applied to divergences between reproductively isolated taxa. Genome-wide reciprocal hemizygosity mapping (RH-seq), a mutagenesis screen in an inter-species hybrid background, holds promise as a method to accelerate the progress of interspecies genetics research. Here we describe an improvement to RH-seq in which mutants harbor barcodes for cheap and straightforward sequencing after selection in a condition of interest. As a proof of concept for the new tool, we carried out genetic dissection of the difference in thermotolerance between two reproductively isolated budding yeast species. Experimental screening identified dozens of candidate loci at which variation between the species contributed to the thermotolerance trait. Hits were enriched for mitosis genes and other housekeeping factors, and among them were multiple loci with robust sequence signatures of positive selection. Together, these results shed new light on the mechanisms by which evolution solved the problems of cell survival and division at high temperature in the yeast clade, and they illustrate the power of the barcoded RH-seq approach.


Transcriptome-wide association and prediction for carotenoids and tocochromanols in fresh sweet corn kernels

10.1101/2021.09.24.461734 ◽

2021 ◽

Author(s):

Jenna Hershberger ◽

Ryokei Tanaka ◽

Joshua C. Wood ◽

Nicholas Kaczmar ◽

Di Wu ◽

...

Keyword(s):

Statistical Power ◽

Prediction Models ◽

Sweet Corn ◽

Association Studies ◽

A Priori ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Causal Genes ◽

Corn Kernels

Sweet corn is consistently one of the most highly consumed vegetables in the U.S., providing a valuable opportunity to increase nutrient intake through biofortification. Significant variation for carotenoid (provitamin A, lutein, zeaxanthin) and tocochromanol (vitamin E, antioxidants) levels is present in temperate sweet corn germplasm, yet previous genome-wide association studies (GWAS) of these traits have been limited by low statistical power and mapping resolution. Here, we employed a high-quality transcriptomic dataset collected from fresh sweet corn kernels to conduct transcriptome-wide association studies (TWAS) and transcriptome prediction studies for 39 carotenoid and tocochromanol traits. In agreement with previous GWAS findings, TWAS detected significant associations for four causal genes, β-carotene hydroxylase (crtRB1), lycopene epsilon cyclase (lcyE), γ-tocopherol methyltransferase (vte4), and homogentisate geranylgeranyltransferase (hggt1) on a transcriptome-wide level. Pathway-level analysis revealed additional associations for deoxy-xylulose synthase2 (dxs2), diphosphocytidyl methyl erythritol synthase2 (dmes2), cytidine methyl kinase1 (cmk1), and geranylgeranyl hydrogenase1 (ggh1), of which, dmes2, cmk1, and ggh1 have not previously been identified through maize association studies. Evaluation of prediction models incorporating genome-wide markers and transcriptome-wide abundances revealed a trait-dependent benefit to the inclusion of both genomic and transcriptomic data over solely genomic data, but both transcriptome- and genome-wide datasets outperformed a priori candidate gene-targeted prediction models for most traits. Altogether, this study represents an important step towards understanding the role of regulatory variation in the accumulation of vitamins in fresh sweet corn kernels.


Reconstruction of interfaces of periodic multilayers from X-ray reflectivity using a free-form approach

Journal of Applied Crystallography ◽

10.1107/s160057671601044x ◽

2016 ◽

Vol 49 (4) ◽

pp. 1300-1307 ◽

Cited By ~ 12

Author(s):

Andrey Zameshin ◽

Igor A. Makhotkin ◽

Sergey N. Yakunin ◽

Robbert W. E. van de Kruijs ◽

Andrey E. Yakshin ◽

...

Keyword(s):

Optical Constant ◽

A Priori ◽

Simulated Data ◽

Grazing Incidence ◽

Free Form ◽

Interface Composition ◽

X Ray ◽

The Difference ◽

Interface Parameters ◽

Form Approach

Grazing-incidence X-ray reflectivity (GIXRR) is a widely used analysis method for thin films and multilayer structures. However, conventional so-called model-based approaches of structural reconstruction from GIXRR data lack flexibility when dealing with very thin structures (down to the nanometre scale), because a priori assumptions have to be made about the interface composition and structure. This makes it very difficult to extract reliable information about such structures. In this work, a custom free-form approach is presented, which solves this task without the need for a priori assumptions on layer or interface parameters. As a proof of principle, an optical constant profile reconstruction and GIXRR curve matching for simulated data are demonstrated. The developed approach is used to analyse the structures of multilayer LaN/B Bragg reflectors designed for the extreme UV range. The performed analysis allowed the difference in optical constant profiles of these structures produced with different processes to be revealed. The uncertainties of structural reconstruction are also discussed.


A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination

10.1101/140798 ◽

2017 ◽

Cited By ~ 4

Author(s):

Caitlin Collins ◽

Xavier Didelot

Keyword(s):

Population Structure ◽

Statistical Power ◽

Sequence Data ◽

Association Studies ◽

Strong Support ◽

Simulated Data ◽

Invasive Disease ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Genome-Wide Association Studies (GWAS) in microbial organisms have the potential to vastly improve the way we understand, manage, and treat infectious diseases. Yet, GWAS methods established thus far remain insufficiently able to capitalise on the growing wealth of bacterial and viral genetic sequence data. Facing clonal population structure and homologous recombination, existing GWAS methods struggle to achieve both the precision necessary to reject spurious findings and the power required to detect associations in microbes. In this paper, we introduce a novel phylogenetic approach that has been tailor-made for microbial GWAS, which is applicable to organisms ranging from purely clonal to frequently recombining, and to both binary and continuous phenotypes. Our approach is robust to the confounding effects of both population structure and recombination, while maintaining high statistical power to detect associations. Thorough testing via application to simulated data provides strong support for the power and specificity of our approach and demonstrates the advantages offered over alternative cluster-based and dimension-reduction methods. Two applications to Neisseria meningitidis illustrate the versatility and potential of our method, confirming previously-identified penicillin resistance loci and resulting in the identification of both well-characterised and novel drivers of invasive disease. Our method is implemented as an open-source R package called treeWAS which is freely available at https://github.com/caitiecollins/treeWAS.


Multiple Testing and Statistical Power With Modified Bonferroni Procedures

Journal of Educational and Behavioral Statistics ◽

10.3102/10769986022004389 ◽

1997 ◽

Vol 22 (4) ◽

pp. 389-406 ◽

Cited By ~ 34

Author(s):

Stephen Olejnik ◽

Jianmin Li ◽

Suchada Supattathum ◽

Carl J. Huberty

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Type I Error ◽

Average Power ◽

Simulated Data ◽

Type I ◽

Correlation Matrices ◽

Bonferroni Procedure ◽

Power Difference ◽

The Difference

The difference in statistical power between the original Bonferroni and five modified Bonferroni procedures that control the overall Type I error rate is examined in the context of a correlation matrix where multiple null hypotheses, H0: ρij = 0 for all i ≠ j, are tested. Using 50 real correlation matrices reported in educational and psychological journals, a difference in the number of hypotheses rejected of less than 4% was observed among the procedures. When simulated data were used, very small differences were found among the six procedures in detecting at least one true relationship, but in detecting all true relationships the power of the modified Bonferroni procedures exceeded that of the original Bonferroni procedure by at least .18 and by as much as .55 when all null hypotheses were false. The power difference decreased as the number of true relationships decreased. Power differences obtained for the average power were of a much smaller magnitude but still favored the modified Bonferroni procedures. For the five modified Bonferroni procedures, power differences less than .05 were typically observed; the Holm procedure had the lowest power, and the Rom procedure had the highest.
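To make the comparison concrete, the sketch below implements the original Bonferroni procedure and one of the modified procedures named above, the Holm step-down procedure. The other modified procedures examined in the paper (Rom's among them) are not reproduced here, and the p-values in the example are invented for illustration.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m, using one threshold for every test."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def holm(pvalues, alpha=0.05):
    """Holm step-down procedure.

    Sort the p-values in ascending order and compare the k-th smallest
    (k = 0, 1, ...) with alpha / (m - k); stop at the first failure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values for four correlation tests.
pvals = [0.01, 0.02, 0.03, 0.005]
print("Bonferroni:", bonferroni(pvals))
print("Holm:      ", holm(pvals))
```

With these p-values Bonferroni rejects two hypotheses while Holm rejects all four. The thresholds relax as hypotheses are rejected, so Holm rejects at least as many hypotheses as Bonferroni, and the gap is widest when many null hypotheses are false, as the abstract reports.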


ReMo-SNPs: a new software tool for identification of polymorphisms in regions and motifs genome-wide

Genetics Research ◽

10.1017/s0016672315000051 ◽

2015 ◽

Vol 97 ◽

Author(s):

LISETTE GRAAE ◽

SILVIA PADDOCK ◽

ANDREA CARMINE BELIN

Keyword(s):

Association Study ◽

Association Studies ◽

Genetic Diseases ◽

A Priori ◽

Software Tool ◽

Data Sets ◽

Genome Wide Association Studies ◽

Association Analyses ◽

Genome Wide ◽

Genomic Regions

Studies of complex genetic diseases have revealed many risk factors of small effect, but the combined amount of heritability explained is still low. Genome-wide association studies are often underpowered to identify true effects because of the very large number of parallel tests. There is, therefore, a great need to generate data sets that are enriched for those markers that have an increased a priori chance of being functional, such as markers in genomic regions involved in gene regulation. ReMo-SNPs is a computational program developed to aid researchers in the process of selecting functional SNPs for association analyses in user-specified regions and/or motifs genome-wide. The useful feature of automatic selection of genotyped markers in the user-provided material makes the output data ready to be used in a following association study. In this article we describe the program and its functions. We also validate the program by including an example study on three different transcription factors and results from an association study on two psychiatric phenotypes. The flexibility of the ReMo-SNPs program enables the user to study any region or sequence of interest, without limitation to transcription factor binding regions and motifs. The program is freely available at: http://www.neuro.ki.se/ReMo-SNPs/
