false discovery rates
Recently Published Documents


F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 441
Author(s):  
Megan H. Murray ◽  
Jeffrey D. Blume

False discovery rates (FDR) are an essential component of statistical inference, representing the propensity for an observed result to be mistaken. FDR estimates should accompany observed results to help the user contextualize the relevance and potential impact of findings. This paper introduces a new user-friendly R package for estimating FDRs and computing adjusted p-values for FDR control. The roles of these two quantities are often confused in practice and some software packages even report the adjusted p-values as the estimated FDRs. A key contribution of this package is that it distinguishes between these two quantities while also offering a broad array of refined algorithms for estimating them. For example, included are newly augmented methods for estimating the null proportion of findings, an important part of the FDR estimation procedure. The package is broad, encompassing a variety of adjustment methods for FDR estimation and FDR control, and includes plotting functions for easy display of results. Through extensive illustrations, we strongly encourage wider reporting of false discovery rates for observed findings.
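
To make the "adjusted p-values for FDR control" side of that distinction concrete, here is a minimal sketch of the standard Benjamini-Hochberg adjustment in plain Python. It is an illustration only: the function name `bh_adjust` and the sample p-values are hypothetical, and it does not reproduce the R package's interface or its FDR estimation methods.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, for illustration only.
pvals = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(pvals)
```

Rejecting every hypothesis whose adjusted value is at or below a chosen level controls the FDR at that level under the usual Benjamini-Hochberg assumptions. As the abstract stresses, these adjusted p-values are a different quantity from an estimated FDR for an individual finding.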


Author(s):  
Martin A. Hoffmann ◽  
Louis-Félix Nothias ◽  
Marcus Ludwig ◽  
Markus Fleischauer ◽  
Emily C. Gentry ◽  
...  

Abstract Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but, typically, only a small fraction of spectra can be matched. Previous in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. Here we introduce the COSMIC workflow that combines in silico structure database generation and annotation with a confidence score consisting of kernel density P value estimation and a support vector machine with enforced directionality of features. On diverse datasets, COSMIC annotates a substantial number of hits at low false discovery rates and outperforms spectral library search. To demonstrate that COSMIC can annotate structures never reported before, we annotated 12 natural bile acids. The annotation of nine structures was confirmed by manual evaluation and two structures using synthetic standards. In human samples, we annotated and manually validated 315 molecular structures currently absent from the Human Metabolome Database. Application of COSMIC to data from 17,400 metabolomics experiments led to 1,715 high-confidence structural annotations that were absent from spectral libraries.


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Sangjeong Lee ◽  
Heejin Park ◽  
Hyunwoo Kim

Abstract Background: The target-decoy strategy effectively estimates the false-discovery rate (FDR) by creating a decoy database with a size identical to that of the target database. Decoy databases are created by various methods, such as the reverse, pseudo-reverse, shuffle, pseudo-shuffle, and de Bruijn methods. The FDR is sometimes over- or under-estimated depending on which decoy database is used because the ratios of redundant peptides in the target databases are different, that is, the numbers of unique (non-redundant) peptides in the target and decoy databases differ. Results: We used two protein databases (the UniProt Saccharomyces cerevisiae protein database and the UniProt human protein database) to compare the FDRs of various decoy databases. When the ratio of redundant peptides in the target database is low, the FDR is not overestimated by any decoy construction method. However, if the ratio of redundant peptides in the target database is high, the FDR is overestimated when the (pseudo) shuffle decoy database is used. Additionally, human and S. cerevisiae six-frame translation databases, which are large databases, showed outcomes similar to those from the UniProt human protein database. Conclusion: The FDR must be estimated using the correction factor proposed by Elias and Gygi or that by Kim et al. when (pseudo) shuffle decoy databases are used.
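
The basic target-decoy estimate described above can be sketched as follows, assuming separate lists of target and decoy match scores from databases of equal size. The function name, scores, and threshold are hypothetical, and the sketch deliberately omits the correction factors mentioned in the conclusion, since the abstract does not state them.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring at or above threshold.

    Assumes the decoy database has the same size as the target database,
    so decoy matches above the threshold stand in for false target matches.
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # No accepted target matches, so nothing to be false.
    return min(1.0, n_decoy / n_target)


# Hypothetical match scores, for illustration only.
targets = [9.1, 8.4, 7.7, 6.2, 5.0]
decoys = [6.5, 4.1, 3.0]
fdr = target_decoy_fdr(targets, decoys, threshold=6.0)
```

Here four target matches and one decoy match pass the threshold, giving an estimate of 0.25. The abstract's point is that this ratio is biased when the numbers of unique peptides in the target and decoy databases differ.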


2021 ◽  
Author(s):  
Jiayi Zhang ◽  
Gang Wu ◽  
Hailong Zhu ◽  
Fengyuan Yang ◽  
Shuman Yang ◽  
...  

Abstract Background: Epidemiologic studies on the association between carnitine and breast cancer development are scarce. This study examined the association between circulating carnitine levels and breast cancer in females. Methods: This 1:1 age-matched case-control study identified 991 female breast cancer cases and 991 female controls without breast cancer. All cases and controls were confirmed with a pathological test. We measured 16 types of whole blood carnitine levels, such as free carnitine (C0) and octadecanoylcarnitine (C18), using targeted metabolomic technology. Results: The mean ages of cases and controls were 50.0 years (SD: 8.7 years) and 49.5 years (SD: 8.7 years), respectively. After adjusting for covariates, each SD increase in malonylcarnitine (C3DC; OR 0.91; 95% CI 0.83-1.00), decenoylcarnitine (C10:1; OR 0.87; 95% CI 0.79-0.96) and decadienoylcarnitine (C10:2; OR 0.90; 95% CI 0.82-0.99) level was associated with decreased odds of breast cancer. However, higher butyrylcarnitine (C4) levels were associated with increased risk of breast cancer (OR 1.12; 95% CI 1.02-1.23). We observed no association between the other carnitines and breast cancer. The false discovery rates for C3DC, C4, C10:1 and C10:2 were 0.172, 0.120, 0.064 and 0.139, respectively. Conclusions: Higher levels of C3DC, C10:1, and C10:2 were protective factors for breast cancer, whereas increased C4 levels were a risk factor for breast cancer.


Author(s):  
Yumei Li ◽  
Xinzhou Ge ◽  
Fanglue Peng ◽  
Wei Li ◽  
Jingyi Jessica Li

Abstract We report a surprising phenomenon about identifying differentially expressed genes (DEGs) from population-level RNA-seq data: two popular bioinformatics methods, DESeq2 and edgeR, have unexpectedly high false discovery rates (FDRs). Via permutation analysis on an immunotherapy RNA-seq dataset, we observed that DESeq2 and edgeR identified even more DEGs after samples’ condition labels were randomly permuted. Motivated by this, we evaluated six DEG identification methods (DESeq2, edgeR, limma-voom, NOISeq, dearseq, and the Wilcoxon rank-sum test) on population-level RNA-seq datasets. We found that the three popular parametric methods (DESeq2, edgeR, and limma-voom) and the new non-parametric method dearseq often failed to control the FDR. In particular, the actual FDRs of DESeq2 and edgeR sometimes exceeded 20% when the target FDR threshold was only 5%. Although NOISeq, a non-parametric method used by GTEx, controlled the FDR better than the other four methods did, its power was much lower than that of the Wilcoxon rank-sum test, a classic non-parametric test that consistently controlled the FDR and achieved good power in our evaluation. Based on these results, for population-level RNA-seq studies, we recommend the Wilcoxon rank-sum test.
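
The label-permutation check described above can be sketched in plain Python as follows. This is an assumption-laden illustration, not the authors' pipeline: the data are simulated with no true differences, the rank-sum p-value uses the normal approximation without a tie correction in the variance, calls are counted at a raw p-value cutoff rather than after FDR adjustment, and all function names are hypothetical.

```python
import math
import random


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values and give each the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def count_calls(expr, labels, alpha=0.05):
    """Count genes whose rank-sum p-value falls below alpha."""
    calls = 0
    for row in expr:
        a = [v for v, lab in zip(row, labels) if lab == 0]
        b = [v for v, lab in zip(row, labels) if lab == 1]
        if rank_sum_pvalue(a, b) < alpha:
            calls += 1
    return calls


rng = random.Random(0)
labels = [0] * 20 + [1] * 20
# Simulated null data: 200 genes, none differing between conditions.
expr = [[rng.gauss(0, 1) for _ in labels] for _ in range(200)]
permuted = labels[:]
rng.shuffle(permuted)
observed_calls = count_calls(expr, labels)
permuted_calls = count_calls(expr, permuted)
```

With a well-behaved test, the number of calls on permuted labels should be small and close to what chance alone predicts. A method that reports as many or more calls after permutation, as the abstract describes for DESeq2 and edgeR, is not controlling false discoveries on that dataset.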


2021 ◽  
Author(s):  
Sule Yilmaz ◽  
Florian Busch ◽  
Nagarjuna Nagaraj ◽  
Juergen Cox

Cross-linking combined with mass spectrometry (XL-MS) provides a wealth of information about the 3D structure of proteins and their interactions. We introduce MaxLynx, a novel computational proteomics workflow for XL-MS integrated into the MaxQuant environment. It is applicable to non-cleavable and MS-cleavable cross linkers. For both we have generalized the Andromeda peptide database search engine to efficiently identify cross-linked peptides. For non-cleavable peptides, we implemented a novel di-peptide Andromeda score, which is the basis for a computationally efficient N-squared search engine. Additionally, partial scores summarize the evidence for the two constituents of the di-peptide individually. A posterior error probability based on total and partial scores is used to control false discovery rates. For MS-cleavable cross linkers a scoring of signature peaks is combined with the conventional Andromeda score on the cleavage products. The MaxQuant 3D-peak detection was improved to ensure more accurate determination of the monoisotopic peak of isotope patterns for heavy molecules, which cross-linked peptides typically are. A wide selection of filtering parameters can replace manual filtering of identifications, which is often necessary when using other pipelines. On benchmark datasets of synthetic peptides, MaxLynx outperforms all other tested software on data for both types of cross linkers as well as on a proteome-wide dataset of cross-linked D. melanogaster cell lysate. The workflow also supports ion-mobility enhanced MS data. MaxLynx runs on Windows and Linux, contains an interactive viewer for displaying annotated cross-linked spectra and is freely available at https://www.maxquant.org/.


2021 ◽  
Vol 9 (8) ◽  
pp. 1560
Author(s):  
Ikuko Yuyama ◽  
Naoto Ugawa ◽  
Tetsuo Hashimoto

To detect the change during coral–dinoflagellate endosymbiosis establishment, we compared transcriptome data derived from free-living and symbiotic Durusdinium, a coral symbiont genus. We detected differentially expressed genes (DEGs) using two statistical methods (edgeR using raw read data and the Student’s t-test using bootstrap resampling read data) and detected 1214 DEGs between the symbiotic and free-living states, which we subjected to gene ontology (GO) analysis. Based on the representative GO terms and 50 DEGs with low false discovery rates, changes in Durusdinium during endosymbiosis were predicted. The expression of genes related to heat-shock proteins and microtubule-related proteins tended to decrease, and those of photosynthesis genes tended to increase. In addition, a phylogenetic analysis of dapdiamide A (antibiotics) synthase, which was upregulated among the 50 DEGs, confirmed that two genera in the Symbiodiniaceae family, Durusdinium and Symbiodinium, retain dapdiamide A synthase. This antibiotic synthase-related gene may contribute to the high stress tolerance documented in Durusdinium species, and its increased expression during endosymbiosis suggests increased antibacterial activity within the symbiotic complex.


2021 ◽  
Author(s):  
Pengyu Ni ◽  
Zhengchang Su

Predicting cis-regulatory modules (CRMs) in a genome and predicting their functional states in various cell/tissue types of the organism are two related challenging computational tasks. Most current methods attempt to achieve both simultaneously using epigenetic data. Though conceptually attractive, they suffer from high false discovery rates and limited applications. To fill the gaps, we proposed a two-step strategy to first predict a map of CRMs in the genome, and then predict functional states of the CRMs in various cell/tissue types of the organism. We have recently developed an algorithm for accurately predicting CRMs in a genome by integrating numerous transcription factor ChIP-seq datasets. Here, we showed that only three or four epigenetic marks data in a cell/tissue type were sufficient for a machine-learning model to accurately predict functional states of all CRMs. Our predictions are substantially more accurate than the best achieved so far. Interestingly, a model trained on different cell/tissue types in a mammal can accurately predict functional states of CRMs in different cell/tissue types of the mammal as well as in various cell/tissue types of a different mammal. Therefore, the epigenetic code that defines functional states of CRMs in various cell/tissue types is universal at least in mammals. Moreover, we found that from tens to hundreds of thousands of CRMs were active in a human and mouse cell/tissue type, and up to 99.98% of them were reutilized in different cell/tissue types, while as few as 0.02% of them were unique to a cell/tissue type that might define the cell/tissue type.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
James A Watson ◽  
Carolyne M Ndila ◽  
Sophie Uyoga ◽  
Alexander Macharia ◽  
Gideon Nyutu ◽  
...  

Severe falciparum malaria has substantially affected human evolution. Genetic association studies of patients with clinically defined severe malaria and matched population controls have helped characterise human genetic susceptibility to severe malaria, but phenotypic imprecision compromises discovered associations. In areas of high malaria transmission the diagnosis of severe malaria in young children and, in particular, the distinction from bacterial sepsis, is imprecise. We developed a probabilistic diagnostic model of severe malaria using platelet and white count data. Under this model we re-analysed clinical and genetic data from 2,220 Kenyan children with clinically defined severe malaria and 3,940 population controls, adjusting for phenotype mis-labelling. Our model, validated by the distribution of sickle trait, estimated that approximately one third of cases did not have severe malaria. We propose a data-tilting approach for case-control studies with phenotype mis-labelling and show that this reduces false discovery rates and improves statistical power in genome-wide association studies.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Swantje Lenz ◽  
Ludwig R. Sinn ◽  
Francis J. O’Reilly ◽  
Lutz Fischer ◽  
Fritz Wegner ◽  
...  

Abstract Protein-protein interactions govern most cellular pathways and processes, and multiple technologies have emerged to systematically map them. Assessing the error of interaction networks has been a challenge. Crosslinking mass spectrometry is currently widening its scope from structural analyses of purified multi-protein complexes towards systems-wide analyses of protein-protein interactions (PPIs). Using a carefully controlled large-scale analysis of Escherichia coli cell lysate, we demonstrate that false-discovery rates (FDR) for PPIs identified by crosslinking mass spectrometry can be reliably estimated. We present an interaction network comprising 590 PPIs at 1% decoy-based PPI-FDR. The structural information included in this network localises the binding site of the hitherto uncharacterised protein YacL to near the DNA exit tunnel on the RNA polymerase.

