Quantifying Suspiciousness within correlated data sets

2020 ◽  
Vol 496 (4) ◽  
pp. 4647-4653 ◽  
Author(s):  
Pablo Lemos ◽  
Fabian Köhlinger ◽  
Will Handley ◽  
Benjamin Joachimi ◽  
Lorne Whiteway ◽  
...  

ABSTRACT We propose a principled Bayesian method for quantifying tension between correlated data sets with wide uninformative parameter priors. This is achieved by extending the Suspiciousness statistic, which is insensitive to priors. Our method uses global summary statistics, and as such it can be used as a diagnostic for internal consistency. We show how our approach can be combined with methods that use parameter space and data space to identify the existing internal discrepancies. As an example, we use it to test the internal consistency of the KiDS-450 data in four photometric redshift bins, and to recover controlled internal discrepancies in simulated KiDS data. We propose this as a diagnostic of internal consistency for present and future cosmological surveys, and as a tension metric for data sets that have non-negligible correlation, such as the Large Synoptic Survey Telescope (LSST) and Euclid.
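The Suspiciousness statistic referenced here follows Handley & Lemos (2019): ln S = ln R − ln I, where R is the Bayes ratio and I the information ratio built from posterior-to-prior Kullback-Leibler divergences. As a rough illustration only (not the authors' code; the function and all numbers below are invented), a minimal sketch:

```python
# Minimal sketch of the Suspiciousness tension statistic (after Handley &
# Lemos 2019), assuming log-evidences ln Z and posterior-to-prior KL
# divergences D are available, e.g. from nested-sampling runs on data sets
# A, B, and their combination AB. All numbers below are invented.
from scipy.stats import chi2

def suspiciousness(logZ_A, logZ_B, logZ_AB, D_A, D_B, D_AB, d):
    """Return ln S and a tension probability for d shared parameters."""
    logR = logZ_AB - logZ_A - logZ_B   # Bayes ratio (prior dependent)
    logI = D_A + D_B - D_AB            # information ratio (prior dependent)
    logS = logR - logI                 # suspiciousness (prior insensitive)
    # Under agreement, d - 2 ln S is approximately chi^2-distributed with d dof.
    p = chi2.sf(d - 2.0 * logS, df=d)
    return logS, p

logS, p = suspiciousness(-100.0, -120.0, -221.5, 3.1, 2.8, 5.2, d=5)
print(f"ln S = {logS:.2f}, tension probability p = {p:.3f}")
```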

1993 ◽  
Vol 17 ◽  
pp. 131-136 ◽  
Author(s):  
Kenneth C. Jezek ◽  
Carolyn J. Merry ◽  
Don J. Cavalieri

Spaceborne data are becoming sufficiently extensive spatially and sufficiently lengthy over time to provide important gauges of global change. There is a potentially long record of microwave brightness temperature from NASA's Scanning Multichannel Microwave Radiometer (SMMR), followed by the Navy's Special Sensor Microwave Imager (SSM/I). Thus it is natural to combine data from successive satellite programs into a single, long record. To do this, we compare brightness temperature data collected during the brief overlap period (7 July-20 August 1987) of SMMR and SSM/I. Only data collected over the Antarctic ice sheet are used, to limit spatial and temporal complications associated with the open ocean and sea ice. Linear regressions computed from scatter plots of complementary pairs of channels from each sensor reveal highly correlated data sets, but also important relative calibration differences between the two instruments. The resulting calibration scheme was applied to a set of average monthly brightness temperatures for a sector of East Antarctica.
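As an illustration of this kind of cross-sensor calibration, a minimal sketch with synthetic numbers (the regression coefficients and noise level are invented, not the paper's values; real inputs would be coincident gridded brightness temperatures from the 1987 overlap period):

```python
# Synthetic sketch of cross-sensor calibration by linear regression, in the
# spirit of the SMMR/SSM/I comparison; all coefficients are invented.
import numpy as np

rng = np.random.default_rng(0)
tb_smmr = rng.uniform(160.0, 230.0, size=500)                    # SMMR channel (K)
tb_ssmi = 1.02 * tb_smmr - 3.5 + rng.normal(0.0, 1.5, size=500)  # SSM/I channel (K)

# Regress SSM/I on SMMR to estimate the relative calibration:
slope, intercept = np.polyfit(tb_smmr, tb_ssmi, deg=1)
r = np.corrcoef(tb_smmr, tb_ssmi)[0, 1]
print(f"SSM/I ~ {slope:.3f} * SMMR + {intercept:.2f} K  (r = {r:.3f})")

# Map the SMMR record onto the SSM/I scale to build one long time series:
tb_smmr_adjusted = slope * tb_smmr + intercept
```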


mSystems ◽  
2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Gongchao Jing ◽  
Lu Liu ◽  
Zengbin Wang ◽  
Yufeng Zhang ◽  
Li Qian ◽  
...  

ABSTRACT Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms). IMPORTANCE A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of microbiome big data.
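For intuition about search-by-similarity, here is a toy sketch. MSE 2's actual Meta-Storms kernel uses a phylogeny-aware similarity with optimized indexing; the plain cosine similarity below is only a stand-in for the general idea, and all names and sizes are invented:

```python
# Toy sketch of search-by-similarity over taxonomic profiles; cosine
# similarity stands in for MSE 2's phylogeny-aware Meta-Storms kernel.
import numpy as np

def top_k_matches(query, database, k=3):
    """Indices and scores of the k database profiles most similar to query.

    query: 1-D relative-abundance vector; database: one row per sample.
    """
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    scores = db @ q                       # cosine similarity to each sample
    order = np.argsort(scores)[::-1][:k]
    return order, scores[order]

rng = np.random.default_rng(1)
database = rng.random((10_000, 200))      # 10k samples x 200 taxa (synthetic)
query = rng.random(200)
idx, scores = top_k_matches(query, database)
print(idx, scores)
```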


2020 ◽  
Author(s):  
Martin Renoult ◽  
James Annan ◽  
Julia Hargreaves ◽  
Navjit Sagoo ◽  
Clare Flynn ◽  
...  

In this study we introduce a Bayesian framework, which is flexible and explicit about the prior assumptions, for using model ensembles and observations together to constrain future climate change. The emergent constraint approach has seen broad application in recent years, including studies constraining the equilibrium climate sensitivity (ECS) using the Last Glacial Maximum (LGM) and the mid-Pliocene Warm Period (mPWP). Most of these studies were based on Ordinary Least Squares (OLS) fits between a variable of the climate state, such as tropical temperature, and climate sensitivity. Using our Bayesian method, and considering the LGM and mPWP separately, we obtain values of ECS of 2.7 K (1.1 - 4.8, 5 - 95 percentiles) using the PMIP2, PMIP3 and PMIP4 data sets for the LGM, and 2.4 K (0.4 - 5.0) with the PlioMIP1 and PlioMIP2 data sets for the mPWP. Restricting the ensembles to include only the most recent version of each model, we obtain 2.7 K (1.1 - 4.3) using the LGM and 2.4 K (0.4 - 5.1) using the mPWP. An advantage of the Bayesian framework is that it is possible to combine the two periods, assuming they are independent, whereby we obtain a slightly tighter constraint of 2.6 K (1.1 - 3.9). We have explored the sensitivity to our assumptions in the method, including the treatment of structural uncertainty and the choice of models, and this leads to a 95% probability of climate sensitivity mostly below 5 K and never exceeding 6 K. The approach is compared with others based on OLS, a Kalman filter method, and an alternative Bayesian method. An interesting implication of this work is that OLS-based emergent constraints on ECS generate tighter uncertainty estimates, in particular at the lower end, suggesting that the lower bound is raised by construction when the correlation is weak. Although some fundamental challenges related to the use of emergent constraints remain, this paper provides a step towards a better foundation for their potential use in future probabilistic estimates of climate sensitivity.
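For readers unfamiliar with emergent constraints, a toy sketch of the basic calculation follows. It is not the paper's Bayesian method (which is explicit about priors and structural uncertainty): all numbers are synthetic, and a simple pairs bootstrap stands in for the posterior over regression parameters.

```python
# Toy emergent-constraint calculation with synthetic numbers: a pairs
# bootstrap approximates regression uncertainty, and Gaussian observational
# error plus residual scatter are propagated into the ECS prediction.
import numpy as np

rng = np.random.default_rng(2)
x_mod = rng.normal(-2.5, 1.0, size=15)                # ensemble predictor (K)
ecs_mod = 1.2 - 0.8 * x_mod + rng.normal(0, 0.5, 15)  # ensemble ECS (K)
x_obs, x_obs_err = -2.2, 0.4                          # synthetic "observation"

samples = []
for _ in range(20_000):
    i = rng.integers(0, len(x_mod), len(x_mod))       # resample members
    slope, intercept = np.polyfit(x_mod[i], ecs_mod[i], 1)
    resid = ecs_mod[i] - (intercept + slope * x_mod[i])
    samples.append(intercept + slope * rng.normal(x_obs, x_obs_err)
                   + rng.normal(0.0, resid.std()))

lo, med, hi = np.percentile(samples, [5, 50, 95])
print(f"ECS = {med:.1f} K ({lo:.1f} - {hi:.1f}, 5 - 95 percentiles)")
```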


2020 ◽  
Vol 633 ◽  
pp. L10 ◽  
Author(s):  
Tilman Tröster ◽  
Ariel. G. Sánchez ◽  
Marika Asgari ◽  
Chris Blake ◽  
Martín Crocce ◽  
...  

We reanalyse the anisotropic galaxy clustering measurement from the Baryon Oscillation Spectroscopic Survey (BOSS), demonstrating that using the full shape information provides cosmological constraints that are comparable to other low-redshift probes. We find Ωm = 0.317 (+0.015/−0.019), σ8 = 0.710 ± 0.049, and h = 0.704 ± 0.024 for flat ΛCDM cosmologies, using uninformative priors on Ωch², 100θMC, ln(10¹⁰As), and ns, and a prior on Ωbh² that is much wider than current constraints. We quantify the agreement between the Planck 2018 constraints from the cosmic microwave background and BOSS, finding the two data sets to be consistent within a flat ΛCDM cosmology using the Bayes factor as well as the prior-insensitive suspiciousness statistic. Combining two low-redshift probes, we jointly analyse the clustering of BOSS galaxies with weak lensing measurements from the Kilo-Degree Survey (KV450). The combination of BOSS and KV450 improves the measurement by up to 45%, constraining σ8 = 0.702 ± 0.029 and S8 = σ8 √(Ωm/0.3) = 0.728 ± 0.026. Over the full 5D parameter space, the odds in favour of a single cosmology describing galaxy clustering, lensing, and the cosmic microwave background are 7 ± 2. The suspiciousness statistic signals a 2.1 ± 0.3σ tension between the combined low-redshift probes and measurements from the cosmic microwave background.
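The "nσ" figure quoted here follows the usual convention of converting a tension probability into a Gaussian-equivalent significance; a minimal sketch (the numeric input is illustrative):

```python
# Standard two-tailed conversion between a tension probability p and a
# Gaussian-equivalent "n sigma" significance, as used for figures like
# the 2.1 sigma quoted above.
from scipy.stats import norm

def p_to_sigma(p):
    """Two-tailed Gaussian-equivalent significance of probability p."""
    return norm.isf(p / 2.0)

def sigma_to_p(n_sigma):
    return 2.0 * norm.sf(n_sigma)

print(f"{p_to_sigma(0.036):.2f} sigma")   # ~2.1
print(f"p = {sigma_to_p(2.1):.3f}")       # ~0.036
```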


1986 ◽  
Vol 108 (3) ◽  
pp. 219-226 ◽  
Author(s):  
B. D. Notohardjono ◽  
D. S. Ermer

This paper discusses the development of control charts for correlated and contaminated data. For illustration, the charts were applied to a set of maximum principal-stress data at two locations on a blast furnace shell. The Dynamic Data System (DDS) approach was used to model the correlated data, which contained several types of discrepancies. After the standard DDS models were found, control charts for the averages and variances of the model residuals were constructed for two data sets. For more effective analysis, two methods for calculating the control limits for both charts are given. With this approach, a dynamic process change, such as an increase in the production rate or the wearing out of the sacrificial lining, can be detected and separated from data-collection errors caused by instrument malfunctions. Furthermore, the tap-hole opening timing is identified from the DDS model parameters, helping to verify the time series model.
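The core idea, charting the residuals of a time series model rather than the autocorrelated raw data, can be sketched as follows. A generic ARMA fit from statsmodels stands in for the paper's DDS modelling, and the series is synthetic:

```python
# Sketch of residual-based control charts for autocorrelated data: fit a
# time series model, then chart the approximately independent residuals.
# ARMA(1,1) stands in for DDS; the "stress" series is synthetic AR(1).
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal(0.0, 1.0)

resid = ARIMA(x, order=(1, 0, 1)).fit().resid

# Shewhart-style chart on the residuals: flag points beyond +/- 3 sigma.
mu, sigma = resid.mean(), resid.std()
out = np.flatnonzero(np.abs(resid - mu) > 3.0 * sigma)
print(f"limits: {mu - 3*sigma:.2f} .. {mu + 3*sigma:.2f}; "
      f"{out.size} out-of-control points")
```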


1987 ◽  
Vol 63 (5) ◽  
pp. 347-350 ◽  
Author(s):  
Stephen J. Titus

Boxplots are a useful enhancement to traditional summary statistics such as the mean and variance. Based on the median and other percentiles of the data distribution, they provide more information in a graphic format that is convenient for interpreting the nature of one or several data sets. Use of boxplots is illustrated with three common types of forestry data: 1) tree diameter distributions, 2) tree volume function residuals, and 3) forest inventory summaries.
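A minimal sketch of the first of these uses, with invented diameter data for three stands:

```python
# Boxplots of synthetic tree-diameter distributions for three stands.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
stands = [rng.gamma(shape=4.0, scale=5.0, size=120) for _ in range(3)]

fig, ax = plt.subplots()
ax.boxplot(stands, labels=["Stand A", "Stand B", "Stand C"])
ax.set_ylabel("Diameter at breast height (cm)")
ax.set_title("Tree diameter distributions")
plt.show()
```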


Geophysics ◽  
2000 ◽  
Vol 65 (3) ◽  
pp. 791-803 ◽  
Author(s):  
Weerachai Siripunvaraporn ◽  
Gary Egbert

There are currently three types of algorithms in use for regularized 2-D inversion of magnetotelluric (MT) data. All seek to minimize some functional which penalizes data misfit and model structure. With the most straightforward approach (exemplified by OCCAM), the minimization is accomplished using some variant on a linearized Gauss-Newton approach. A second approach is to use a descent method [e.g., nonlinear conjugate gradients (NLCG)] to avoid the expense of constructing large matrices (e.g., the sensitivity matrix). Finally, approximate methods [e.g., rapid relaxation inversion (RRI)] have been developed which use cheaply computed approximations to the sensitivity matrix to search for a minimum of the penalty functional. Approximate approaches can be very fast, but in practice often fail to converge without significant expert user intervention. On the other hand, the more straightforward methods can be prohibitively expensive to use for even moderate-size data sets. Here, we present a new and much more efficient variant on the OCCAM scheme. By expressing the solution as a linear combination of rows of the sensitivity matrix smoothed by the model covariance (the “representers”), we transform the linearized inverse problem from the M-dimensional model space to the N-dimensional data space. This method is referred to as DASOCC, the data space OCCAM’s inversion. Since generally N ≪ M, this transformation by itself can result in significant computational savings. More importantly, the data-space formulation suggests a simple approximate method for constructing the inverse solution. Since MT data are smooth and “redundant,” a subset of the representers is typically sufficient to form the model without significant loss of detail. Computations required for constructing sensitivities and the size of matrices to be inverted can be significantly reduced by this approximation. We refer to this inversion as REBOCC, the reduced basis OCCAM’s inversion. Numerical experiments on synthetic and real data sets with REBOCC, DASOCC, NLCG, RRI, and OCCAM show that REBOCC is faster than both DASOCC and NLCG, which are comparable in speed. All of these methods are significantly faster than OCCAM, but are not competitive with RRI. However, even with a simple synthetic data set, we could not always get RRI to converge to a reasonable solution. The basic idea behind REBOCC should be more broadly applicable, in particular to 3-D MT inversion.
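The data-space transformation at the heart of DASOCC/REBOCC can be sketched numerically. Assuming a linearized step with sensitivity matrix J, model covariance C_m, and data covariance C_d (the dimensions, covariances, and trade-off parameter below are illustrative, not the paper's implementation):

```python
# Numerical sketch of the data-space transformation behind DASOCC/REBOCC:
# express the model as m = C_m J^T beta and solve an N x N system in data
# space instead of an M x M system in model space.
import numpy as np

rng = np.random.default_rng(5)
M, N = 2000, 100                       # model and data dimensions, N << M
J = rng.normal(size=(N, M))            # sensitivity (Jacobian) matrix
C_m = np.eye(M)                        # model covariance (smoothing operator)
C_d = np.eye(N)                        # data covariance
d = rng.normal(size=N)                 # data (or residual) vector
lam = 1.0                              # regularization trade-off

# Data-space normal equations: (J C_m J^T + lam * C_d) beta = d
A = J @ C_m @ J.T + lam * C_d          # N x N instead of M x M
beta = np.linalg.solve(A, d)
m = C_m @ J.T @ beta                   # model as a combination of representers

# REBOCC's further saving: keep only a subset of representers (rows of J).
keep = np.arange(0, N, 2)              # e.g. every other datum
A_sub = J[keep] @ C_m @ J[keep].T + lam * C_d[np.ix_(keep, keep)]
m_sub = C_m @ J[keep].T @ np.linalg.solve(A_sub, d[keep])
print(m.shape, m_sub.shape)
```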

