Quantifying Suspiciousness within correlated data sets

2020 ◽  
Vol 496 (4) ◽  
pp. 4647-4653 ◽  
Author(s):  
Pablo Lemos ◽  
Fabian Köhlinger ◽  
Will Handley ◽  
Benjamin Joachimi ◽  
Lorne Whiteway ◽  
...  

ABSTRACT We propose a principled Bayesian method for quantifying tension between correlated data sets with wide uninformative parameter priors. This is achieved by extending the Suspiciousness statistic, which is insensitive to priors. Our method uses global summary statistics, and as such it can be used as a diagnostic for internal consistency. We show how our approach can be combined with methods that use parameter space and data space to identify the existing internal discrepancies. As an example, we use it to test the internal consistency of the KiDS-450 data in four photometric redshift bins, and to recover controlled internal discrepancies in simulated KiDS data. We propose this as a diagnostic of internal consistency for present and future cosmological surveys, and as a tension metric for data sets that have non-negligible correlation, such as the Large Synoptic Survey Telescope (LSST) and Euclid.
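The Suspiciousness statistic referenced here follows Handley & Lemos (2019): ln S = ln R − ln I, where R is the Bayes ratio and I the information ratio built from posterior-to-prior Kullback-Leibler divergences. As a rough illustration only (not the authors' code; the function and all numbers below are invented), a minimal sketch:

```python
# Minimal sketch of the Suspiciousness tension statistic (after Handley &
# Lemos 2019), assuming log-evidences ln Z and posterior-to-prior KL
# divergences D are available, e.g. from nested-sampling runs on data sets
# A, B, and their combination AB. All numbers below are invented.
from scipy.stats import chi2

def suspiciousness(logZ_A, logZ_B, logZ_AB, D_A, D_B, D_AB, d):
    """Return ln S and a tension probability for d shared parameters."""
    logR = logZ_AB - logZ_A - logZ_B   # Bayes ratio (prior dependent)
    logI = D_A + D_B - D_AB            # information ratio (prior dependent)
    logS = logR - logI                 # suspiciousness (prior insensitive)
    # Under agreement, d - 2 ln S is approximately chi^2-distributed with d dof.
    p = chi2.sf(d - 2.0 * logS, df=d)
    return logS, p

logS, p = suspiciousness(-100.0, -120.0, -221.5, 3.1, 2.8, 5.2, d=5)
print(f"ln S = {logS:.2f}, tension probability p = {p:.3f}")
```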

1993 ◽  
Vol 17 ◽  
pp. 131-136 ◽  
Author(s):  
Kenneth C. Jezek ◽  
Carolyn J. Merry ◽  
Don J. Cavalieri

Spaceborne data are becoming sufficiently extensive spatially and sufficiently lengthy over time to provide important gauges of global change. There is a potentially long record of microwave brightness temperature from NASA's Scanning Multichannel Microwave Radiometer (SMMR), followed by the Navy's Special Sensor Microwave Imager (SSM/I). Thus it is natural to combine data from successive satellite programs into a single, long record. To do this, we compare brightness temperature data collected during the brief overlap period (7 July-20 August 1987) of SMMR and SSM/I. Only data collected over the Antarctic ice sheet are used, to limit spatial and temporal complications associated with the open ocean and sea ice. Linear regressions computed from scatter plots of complementary pairs of channels from each sensor reveal highly correlated data sets, but also important relative calibration differences between the two instruments. The resulting calibration scheme was applied to a set of average monthly brightness temperatures for a sector of East Antarctica.
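As an illustration of this kind of cross-sensor calibration, a minimal sketch with synthetic numbers (the regression coefficients and noise level are invented, not the paper's values; real inputs would be coincident gridded brightness temperatures from the 1987 overlap period):

```python
# Synthetic sketch of cross-sensor calibration by linear regression, in the
# spirit of the SMMR/SSM/I comparison; all coefficients are invented.
import numpy as np

rng = np.random.default_rng(0)
tb_smmr = rng.uniform(160.0, 230.0, size=500)                    # SMMR channel (K)
tb_ssmi = 1.02 * tb_smmr - 3.5 + rng.normal(0.0, 1.5, size=500)  # SSM/I channel (K)

# Regress SSM/I on SMMR to estimate the relative calibration:
slope, intercept = np.polyfit(tb_smmr, tb_ssmi, deg=1)
r = np.corrcoef(tb_smmr, tb_ssmi)[0, 1]
print(f"SSM/I ~ {slope:.3f} * SMMR + {intercept:.2f} K  (r = {r:.3f})")

# Map the SMMR record onto the SSM/I scale to build one long time series:
tb_smmr_adjusted = slope * tb_smmr + intercept
```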


mSystems ◽  
2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Gongchao Jing ◽  
Lu Liu ◽  
Zengbin Wang ◽  
Yufeng Zhang ◽  
Li Qian ◽  
...  

ABSTRACT Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms). IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of microbiome big data.
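For intuition about search-by-similarity, here is a toy sketch. MSE 2's actual Meta-Storms kernel uses a phylogeny-aware similarity with optimized indexing; the plain cosine similarity below is only a stand-in for the general idea, and all names and sizes are invented:

```python
# Toy sketch of search-by-similarity over taxonomic profiles; cosine
# similarity stands in for MSE 2's phylogeny-aware Meta-Storms kernel.
import numpy as np

def top_k_matches(query, database, k=3):
    """Indices and scores of the k database profiles most similar to query.

    query: 1-D relative-abundance vector; database: one row per sample.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                       # cosine similarity to each sample
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]

rng = np.random.default_rng(1)
database = rng.random((10_000, 200))      # 10k samples x 200 taxa (synthetic)
query = rng.random(200)
idx, scores = top_k_matches(query, database)
print(idx, scores)
```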


2020 ◽  
Author(s):  
Martin Renoult ◽  
James Annan ◽  
Julia Hargreaves ◽  
Navjit Sagoo ◽  
Clare Flynn ◽  
...  

In this study we introduce a Bayesian framework, which is flexible and explicit about the prior assumptions, for using model ensembles and observations together to constrain future climate change. The emergent constraint approach has seen broad application in recent years, including studies constraining the equilibrium climate sensitivity (ECS) using the Last Glacial Maximum (LGM) and the mid-Pliocene Warm Period (mPWP). Most of these studies were based on Ordinary Least Squares (OLS) fits between a variable of the climate state, such as tropical temperature, and climate sensitivity. Using our Bayesian method, and considering the LGM and mPWP separately, we obtain values of ECS of 2.7 K (1.1 - 4.8, 5 - 95 percentiles) using the PMIP2, PMIP3 and PMIP4 data sets for the LGM, and 2.4 K (0.4 - 5.0) with the PlioMIP1 and PlioMIP2 data sets for the mPWP. Restricting the ensembles to include only the most recent version of each model, we obtain 2.7 K (1.1 - 4.3) using the LGM and 2.4 K (0.4 - 5.1) using the mPWP. An advantage of the Bayesian framework is that it is possible to combine the two periods, assuming they are independent, whereby we obtain a slightly tighter constraint of 2.6 K (1.1 - 3.9). We have explored the sensitivity to our assumptions in the method, including the treatment of structural uncertainty and the choice of models, and this leads to a 95% probability of climate sensitivity mostly below 5 K and never exceeding 6 K. The approach is compared with others based on OLS, a Kalman filter method, and an alternative Bayesian method. An interesting implication of this work is that OLS-based emergent constraints on ECS generate tighter uncertainty estimates, in particular at the lower end, suggesting that the lower bound is raised by construction when the correlation is weak. Although some fundamental challenges related to the use of emergent constraints remain, this paper provides a step towards a better foundation for their potential use in future probabilistic estimates of climate sensitivity.
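For readers unfamiliar with emergent constraints, a toy sketch of the basic calculation follows. It is not the paper's Bayesian method (which is explicit about priors and structural uncertainty): all numbers are synthetic, and a simple pairs bootstrap stands in for the posterior over regression parameters.

```python
# Toy emergent-constraint calculation with synthetic numbers: a pairs
# bootstrap approximates regression uncertainty, and Gaussian observational
# error plus residual scatter are propagated into the ECS prediction.
import numpy as np

rng = np.random.default_rng(2)
x_mod = rng.normal(-2.5, 1.0, size=15)                # ensemble predictor (K)
ecs_mod = 1.2 - 0.8 * x_mod + rng.normal(0, 0.5, 15)  # ensemble ECS (K)
x_obs, x_obs_err = -2.2, 0.4                          # synthetic "observation"

samples = []
for _ in range(20_000):
    i = rng.integers(0, len(x_mod), len(x_mod))       # resample members
    slope, intercept = np.polyfit(x_mod[i], ecs_mod[i], 1)
    resid = ecs_mod[i] - (intercept + slope * x_mod[i])
    samples.append(intercept + slope * rng.normal(x_obs, x_obs_err)
                   + rng.normal(0.0, resid.std()))

lo, med, hi = np.percentile(samples, [5, 50, 95])
print(f"ECS = {med:.1f} K ({lo:.1f} - {hi:.1f}, 5 - 95 percentiles)")
```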


2020 ◽  
Vol 633 ◽  
pp. L10 ◽  
Author(s):  
Tilman Tröster ◽  
Ariel. G. Sánchez ◽  
Marika Asgari ◽  
Chris Blake ◽  
Martín Crocce ◽  
...  

We reanalyse the anisotropic galaxy clustering measurement from the Baryon Oscillation Spectroscopic Survey (BOSS), demonstrating that using the full shape information provides cosmological constraints that are comparable to other low-redshift probes. We find Ωm = 0.317 (+0.015/−0.019), σ8 = 0.710 ± 0.049, and h = 0.704 ± 0.024 for flat ΛCDM cosmologies, using uninformative priors on Ωch², 100θMC, ln(10¹⁰As), and ns, and a prior on Ωbh² that is much wider than current constraints. We quantify the agreement between the Planck 2018 constraints from the cosmic microwave background and BOSS, finding the two data sets to be consistent within a flat ΛCDM cosmology using the Bayes factor as well as the prior-insensitive suspiciousness statistic. Combining two low-redshift probes, we jointly analyse the clustering of BOSS galaxies with weak lensing measurements from the Kilo-Degree Survey (KV450). The combination of BOSS and KV450 improves the measurement by up to 45%, constraining σ8 = 0.702 ± 0.029 and S8 = σ8 √(Ωm/0.3) = 0.728 ± 0.026. Over the full 5D parameter space, the odds in favour of a single cosmology describing galaxy clustering, lensing, and the cosmic microwave background are 7 ± 2. The suspiciousness statistic signals a 2.1 ± 0.3σ tension between the combined low-redshift probes and measurements from the cosmic microwave background.
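The "nσ" figure quoted here follows the usual convention of converting a tension probability into a Gaussian-equivalent significance; a minimal sketch (the numeric input is illustrative):

```python
# Standard two-tailed conversion between a tension probability p and a
# Gaussian-equivalent "n sigma" significance, as used for figures like
# the 2.1 sigma quoted above.
from scipy.stats import norm

def p_to_sigma(p):
    """Two-tailed Gaussian-equivalent significance of probability p."""
    return norm.isf(p / 2.0)

def sigma_to_p(n_sigma):
    return 2.0 * norm.sf(n_sigma)

print(f"{p_to_sigma(0.036):.2f} sigma")   # ~2.1
print(f"p = {sigma_to_p(2.1):.3f}")       # ~0.036
```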


1986 ◽  
Vol 108 (3) ◽  
pp. 219-226 ◽  
Author(s):  
B. D. Notohardjono ◽  
D. S. Ermer

This paper discusses the development of control charts for correlated and contaminated data. For illustration, the charts were applied to a set of maximum principal-stress data at two locations on a blast furnace shell. The Dynamic Data System (DDS) approach was used to model the correlated data, which contained several types of discrepancies. After the standard DDS models were found, control charts for the averages and variances of the model residuals were constructed for two data sets. For more effective analysis, two methods for calculating the control limits for both charts are given. With this approach, a dynamic process change, such as an increase in the production rate or the wearing out of the sacrificial lining, can be detected and separated from data-collection errors caused by instrument malfunctions. Furthermore, the tap-hole opening timing is identified from the DDS model parameters, helping to verify the time series model.
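The core idea, charting the residuals of a time series model rather than the autocorrelated raw data, can be sketched as follows. A generic ARMA fit from statsmodels stands in for the paper's DDS modelling, and the series is synthetic:

```python
# Sketch of residual-based control charts for autocorrelated data: fit a
# time series model, then chart the approximately independent residuals.
# ARMA(1,1) stands in for DDS; the "stress" series is synthetic AR(1).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal(0.0, 1.0)

resid = ARIMA(x, order=(1, 0, 1)).fit().resid

# Shewhart-style chart on the residuals: flag points beyond +/- 3 sigma.
mu, sigma = resid.mean(), resid.std()
out = np.flatnonzero(np.abs(resid - mu) > 3.0 * sigma)
print(f"limits: {mu - 3*sigma:.2f} .. {mu + 3*sigma:.2f}; "
      f"{out.size} out-of-control points")
```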


1987 ◽  
Vol 63 (5) ◽  
pp. 347-350 ◽  
Author(s):  
Stephen J. Titus

Boxplots are a useful enhancement to traditional summary statistics such as the mean and variance. Based on the median and other percentiles of the data distribution, they provide more information in a graphic format that is convenient for interpreting the nature of one or several data sets. Use of boxplots is illustrated with three common types of forestry data: 1) tree diameter distributions, 2) tree volume function residuals, and 3) forest inventory summaries.
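A minimal sketch of the first of these uses, with invented diameter data for three stands:

```python
# Boxplots of synthetic tree-diameter distributions for three stands.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
stands = [rng.gamma(shape=4.0, scale=5.0, size=120) for _ in range(3)]

fig, ax = plt.subplots()
ax.boxplot(stands, labels=["Stand A", "Stand B", "Stand C"])
ax.set_ylabel("Diameter at breast height (cm)")
ax.set_title("Tree diameter distributions")
plt.show()
```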


Geophysics ◽  
2000 ◽  
Vol 65 (3) ◽  
pp. 791-803 ◽  
Author(s):  
Weerachai Siripunvaraporn ◽  
Gary Egbert

There are currently three types of algorithms in use for regularized 2-D inversion of magnetotelluric (MT) data. All seek to minimize some functional which penalizes data misfit and model structure. With the most straightforward approach (exemplified by OCCAM), the minimization is accomplished using some variant on a linearized Gauss-Newton approach. A second approach is to use a descent method [e.g., nonlinear conjugate gradients (NLCG)] to avoid the expense of constructing large matrices (e.g., the sensitivity matrix). Finally, approximate methods [e.g., rapid relaxation inversion (RRI)] have been developed which use cheaply computed approximations to the sensitivity matrix to search for a minimum of the penalty functional. Approximate approaches can be very fast, but in practice often fail to converge without significant expert user intervention. On the other hand, the more straightforward methods can be prohibitively expensive to use for even moderate-size data sets. Here, we present a new and much more efficient variant on the OCCAM scheme. By expressing the solution as a linear combination of rows of the sensitivity matrix smoothed by the model covariance (the “representers”), we transform the linearized inverse problem from the M-dimensional model space to the N-dimensional data space. This method is referred to as DASOCC, the data space OCCAM’s inversion. Since generally N ≪ M, this transformation by itself can result in significant computational savings. More importantly, the data-space formulation suggests a simple approximate method for constructing the inverse solution. Since MT data are smooth and “redundant,” a subset of the representers is typically sufficient to form the model without significant loss of detail. Computations required for constructing sensitivities and the size of matrices to be inverted can be significantly reduced by this approximation. We refer to this inversion as REBOCC, the reduced basis OCCAM’s inversion. Numerical experiments on synthetic and real data sets with REBOCC, DASOCC, NLCG, RRI, and OCCAM show that REBOCC is faster than both DASOCC and NLCG, which are comparable in speed. All of these methods are significantly faster than OCCAM, but are not competitive with RRI. However, even with a simple synthetic data set, we could not always get RRI to converge to a reasonable solution. The basic idea behind REBOCC should be more broadly applicable, in particular to 3-D MT inversion.
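The data-space transformation at the heart of DASOCC/REBOCC can be sketched numerically. Assuming a linearized step with sensitivity matrix J, model covariance C_m, and data covariance C_d (the dimensions, covariances, and trade-off parameter below are illustrative, not the paper's implementation):

```python
# Numerical sketch of the data-space transformation behind DASOCC/REBOCC:
# express the model as m = C_m J^T beta and solve an N x N system in data
# space instead of an M x M system in model space.
import numpy as np

rng = np.random.default_rng(5)
M, N = 2000, 100                       # model and data dimensions, N << M
J = rng.normal(size=(N, M))            # sensitivity (Jacobian) matrix
C_m = np.eye(M)                        # model covariance (smoothing operator)
C_d = np.eye(N)                        # data covariance
d = rng.normal(size=N)                 # data (or residual) vector
lam = 1.0                              # regularization trade-off

# Data-space normal equations: (J C_m J^T + lam * C_d) beta = d
A = J @ C_m @ J.T + lam * C_d          # N x N instead of M x M
beta = np.linalg.solve(A, d)
m = C_m @ J.T @ beta                   # model as a combination of representers

# REBOCC's further saving: keep only a subset of representers (rows of J).
keep = np.arange(0, N, 2)              # e.g. every other datum
A_sub = J[keep] @ C_m @ J[keep].T + lam * C_d[np.ix_(keep, keep)]
m_sub = C_m @ J[keep].T @ np.linalg.solve(A_sub, d[keep])
print(m.shape, m_sub.shape)
```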

