A multi-decade record of high-quality fCO2 data in version 3 of the Surface Ocean CO2 Atlas (SOCAT)

2016 ◽  
Vol 8 (2) ◽  
pp. 383-413 ◽  
Author(s):  
Dorothee C. E. Bakker ◽  
Benjamin Pfeil ◽  
Camilla S. Landa ◽  
Nicolas Metzl ◽  
Kevin M. O'Brien ◽  
...  

Abstract. The Surface Ocean CO2 Atlas (SOCAT) is a synthesis of quality-controlled fCO2 (fugacity of carbon dioxide) values for the global surface oceans and coastal seas with regular updates. Version 3 of SOCAT has 14.7 million fCO2 values from 3646 data sets covering the years 1957 to 2014. This latest version has an additional 4.6 million fCO2 values relative to version 2 and extends the record from 2011 to 2014. Version 3 also significantly increases the data availability for 2005 to 2013. SOCAT has an average of approximately 1.2 million surface water fCO2 values per year for the years 2006 to 2012. The quality and documentation of the data have improved. A new feature is the data set quality control (QC) flag of E for data from alternative sensors and platforms. The accuracy of surface water fCO2 has been defined for all data set QC flags. Automated range checking has been carried out for all data sets during their upload into SOCAT. The upgrade of the interactive Data Set Viewer (previously known as the Cruise Data Viewer) allows better interrogation of the SOCAT data collection and rapid creation of high-quality figures for scientific presentations. Automated data upload has been launched for version 4 and will enable more frequent SOCAT releases in the future. High-profile scientific applications of SOCAT include quantification of the ocean sink for atmospheric carbon dioxide and its long-term variation, detection of ocean acidification, as well as evaluation of coupled-climate and ocean-only biogeochemical models. Users of SOCAT data products are urged to acknowledge the contribution of data providers, as stated in the SOCAT Fair Data Use Statement. This ESSD (Earth System Science Data) "living data" publication documents the methods and data sets used for the assembly of this new version of the SOCAT data collection and compares these with those used for earlier versions of the data collection (Pfeil et al., 2013; Sabine et al., 2013; Bakker et al., 2014).
Individual data set files, included in the synthesis product, can be downloaded here: doi:10.1594/PANGAEA.849770. The gridded products are available here: doi:10.3334/CDIAC/OTG.SOCAT_V3_GRID.
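The gridded products cited above reduce the individual fCO2 observations to averages on a regular latitude-longitude grid. The following is a hedged illustration of that idea only; the function, cell size, and toy values are invented for the sketch and are not the SOCAT gridding recipe, which is documented by the project:

```python
from collections import defaultdict

def grid_mean(observations, cell_deg=1.0):
    """Average point observations onto a regular lat/lon grid.

    observations: iterable of (lat, lon, value) tuples.
    Returns a dict mapping (lat_index, lon_index) -> mean value,
    where indices are floor(coordinate / cell_deg).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in observations:
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Toy fCO2-like values (uatm); the first two fall in the same 1-degree cell.
obs = [(50.2, -4.3, 360.0), (50.7, -4.9, 380.0), (12.1, 45.5, 400.0)]
gridded = grid_mean(obs)
# The two observations near 50 N, 5 W average to 370.0 uatm.
```

A real gridded product additionally tracks sampling counts, uncertainties, and several temporal resolutions rather than a single mean per cell.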


2021 ◽  
Vol 4 (1) ◽  
pp. 251524592092800
Author(s):  
Erin M. Buchanan ◽  
Sarah E. Crain ◽  
Ari L. Cunningham ◽  
Hannah R. Johnson ◽  
Hannah Stash ◽  
...  

As researchers embrace open and transparent data sharing, they will need to provide information about their data that effectively helps others understand their data sets’ contents. Without proper documentation, data stored in online repositories such as OSF will often be rendered unfindable and unreadable by other researchers and indexing search engines. Data dictionaries and codebooks provide a wealth of information about variables, data collection, and other important facets of a data set. This information, called metadata, provides key insights into how the data might be further used in research and facilitates search-engine indexing to reach a broader audience of interested parties. This Tutorial first explains terminology and standards relevant to data dictionaries and codebooks. Accompanying information on OSF presents a guided workflow of the entire process from source data (e.g., survey answers on Qualtrics) to an openly shared data set accompanied by a data dictionary or codebook that follows an agreed-upon standard. Finally, we discuss freely available Web applications to assist this process of ensuring that psychology data are findable, accessible, interoperable, and reusable.
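To make the notion of a data dictionary concrete, the sketch below derives a minimal machine-readable codebook from tabular records. All variable names and descriptions here are invented for illustration; a real codebook should follow an agreed-upon metadata standard, as the Tutorial recommends:

```python
import json

# A toy data set: survey-like records (invented for illustration).
rows = [
    {"participant_id": 1, "age": 34, "condition": "control"},
    {"participant_id": 2, "age": 28, "condition": "treatment"},
]

# Hand-written variable descriptions, as a researcher would supply them.
descriptions = {
    "participant_id": "Unique identifier assigned at enrollment",
    "age": "Self-reported age in years",
    "condition": "Experimental condition (control/treatment)",
}

def build_codebook(rows, descriptions):
    """Derive a simple data dictionary: name, inferred type, example, description."""
    codebook = []
    for name in rows[0]:
        values = [r[name] for r in rows]
        codebook.append({
            "variable": name,
            "type": type(values[0]).__name__,
            "example": values[0],
            "description": descriptions.get(name, ""),
        })
    return codebook

# Serialize alongside the data so repositories and search engines can index it.
print(json.dumps(build_codebook(rows, descriptions), indent=2))
```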


2017 ◽  
Vol 9 (1) ◽  
pp. 211-220 ◽  
Author(s):  
Amelie Driemel ◽  
Eberhard Fahrbach ◽  
Gerd Rohardt ◽  
Agnieszka Beszczynska-Möller ◽  
Antje Boetius ◽  
...  

Abstract. Measuring temperature and salinity profiles in the world's oceans is crucial to understanding ocean dynamics and its influence on the heat budget, the water cycle, the marine environment and on our climate. Since 1983 the German research vessel and icebreaker Polarstern has been the platform of numerous CTD (conductivity, temperature, depth instrument) deployments in the Arctic and the Antarctic. We report on a unique data collection spanning 33 years of polar CTD data. In total 131 data sets (1 data set per cruise leg) containing data from 10 063 CTD casts are now freely available at doi:10.1594/PANGAEA.860066. During this long period five CTD types with different characteristics and accuracies have been used. Therefore the instruments and processing procedures (sensor calibration, data validation, etc.) are described in detail. This compilation is special not only with regard to the quantity but also the quality of the data – the latter indicated for each data set using defined quality codes. The complete data collection includes a number of repeated sections for which the quality code can be used to investigate and evaluate long-term changes. Beginning with 2010, the salinity measurements presented here are of the highest quality possible in this field owing to the introduction of the OPTIMARE Precision Salinometer.


Author(s):  
Avinash Navlani ◽  
V. B. Gupta

In the last couple of decades, clustering has become a crucial research problem in the data mining community. Clustering refers to the partitioning of data objects, such as records and documents, into groups or clusters of similar characteristics. Because clustering is unsupervised learning, there is no unique solution for all problems. Complex data sets often require explanation through multiple clusterings, yet traditional clustering approaches generate a single clustering. A data set may contain more than one pattern, and each pattern can be interesting from a different perspective. Alternative clustering aims to find all distinct groupings of a data set such that each grouping is of high quality and distinct from the others. This chapter gives an overall view of alternative clustering: its various approaches, related work, comparisons with easily confused terms such as subspace, multi-view, and ensemble clustering, applications, issues, and challenges.
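A toy example makes the premise of alternative clustering concrete: the same data can admit more than one internally coherent grouping. The helper below is a deliberately simple stand-in (a threshold on one coordinate), invented for illustration and not one of the surveyed algorithms:

```python
def cluster_by_axis(points, axis, threshold):
    """Partition 2-D points into two clusters by thresholding one coordinate.

    A minimal stand-in for a clustering algorithm, used only to show that
    distinct, equally valid groupings of the same data can coexist.
    """
    return [frozenset(p for p in points if p[axis] < threshold),
            frozenset(p for p in points if p[axis] >= threshold)]

# Corners of a rectangle: separated both horizontally and vertically.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]

by_x = cluster_by_axis(points, axis=0, threshold=5)    # left vs right pairs
by_y = cluster_by_axis(points, axis=1, threshold=0.5)  # bottom vs top pairs

# Both partitions cover the data, but they group the points differently;
# alternative clustering methods search for such dissimilar solutions.
```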


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 879 ◽  
Author(s):  
Uwe Köckemann ◽  
Marjan Alirezaie ◽  
Jennifer Renoux ◽  
Nicolas Tsiftes ◽  
Mobyen Uddin Ahmed ◽  
...  

As research in smart homes and activity recognition increases, it is ever more important to have benchmark systems and data upon which researchers can compare methods. While synthetic data can be useful for certain method developments, real data sets that are open and shared are equally important. This paper presents the E-care@home system, its installation in a real home setting, and a series of data sets that were collected using the E-care@home system. Our first contribution, the E-care@home system, is a collection of software modules for data collection, labeling, and various reasoning tasks such as activity recognition, person counting, and configuration planning. It supports a heterogeneous set of sensors that can be extended easily and connects collected sensor data to higher-level Artificial Intelligence (AI) reasoning modules. Our second contribution is a series of open data sets which can be used to recognize activities of daily living. In addition to these data sets, we describe the technical infrastructure that we have developed to collect the data and the physical environment. Each data set is annotated with ground-truth information, making it relevant for researchers interested in benchmarking different algorithms for activity recognition.


Geophysics ◽  
1993 ◽  
Vol 58 (9) ◽  
pp. 1281-1296 ◽  
Author(s):  
V. J. S. Grauch

The magnetic data set compiled for the Decade of North American Geology (DNAG) project presents an important digital data base that can be used to examine the North American crust. The data represent a patchwork from many individual airborne and marine magnetic surveys. However, the portion of data for the conterminous U.S. has problems that limit the resolution and use of the data. Now that the data are available in digital form, it is important to describe the data limitations more specifically than before. The primary problem is caused by datum shifts between individual survey boundaries. In the western U.S., the DNAG data are generally shifted less than 100 nT. In the eastern U.S., the DNAG data may be shifted by as much as 300 nT and contain regionally shifted areas with wavelengths on the order of 800 to 1400 km. The worst case is the artificial low centered over Kentucky and Tennessee produced by a series of datum shifts. A second significant problem is lack of anomaly resolution that arises primarily from using survey data that is too widely spaced compared to the flight heights above magnetic sources. Unfortunately, these are the only data available for much of the U.S. Another problem is produced by the lack of common observation surface between individual pieces of the U.S. DNAG data. The height disparities introduce variations in spatial frequency content that are unrelated to the magnetization of rocks. The spectral effects of datum shifts and the variation of spatial frequency content due to height disparities were estimated for the DNAG data for the conterminous U.S. As a general guideline for digital filtering, the most reliable features in the U.S. DNAG data have wavelengths roughly between 170 and 500 km, or anomaly half‐widths between 85 and 250 km. High‐quality, large‐region magnetic data sets have become increasingly important to meet exploration and scientific objectives. 
The acquisition of a new national magnetic data set with higher quality at a greater range of wavelengths is clearly in order. The best approach is to refly much of the U.S. with common specifications and reduction procedures. At the very least, magnetic data sets should be remerged digitally using available or newly flown long‐distance flight‐line data to adjust survey levels. In any case, national coordination is required to produce a consistent, high‐quality national magnetic map.


2012 ◽  
Vol 5 (2) ◽  
pp. 735-780 ◽  
Author(s):  
B. Pfeil ◽  
A. Olsen ◽  
D. C. E. Bakker ◽  
S. Hankin ◽  
H. Koyuk ◽  
...  

Abstract. A well documented, publicly available, global data set of surface ocean carbon dioxide (CO2) parameters has been called for by international groups for nearly two decades. The Surface Ocean CO2 Atlas (SOCAT) project was initiated by the international marine carbon science community in 2007 with the aim of providing a comprehensive, publicly available, regularly updated, global data set of marine surface CO2, which had been subject to quality control (QC). Many additional CO2 data, not yet made public via the Carbon Dioxide Information Analysis Center (CDIAC), were retrieved from data originators, public websites and other data centres. All data were put in a uniform format following a strict protocol. Quality control was carried out according to clearly defined criteria. Regional specialists performed the quality control, using state-of-the-art web-based tools, specially developed for accomplishing this global team effort. SOCAT version 1.5 was made public in September 2011 and holds 6.3 million quality controlled surface CO2 data points from the global oceans and coastal seas, spanning four decades (1968–2007). Three types of data products are available: individual cruise files, a merged complete data set and gridded products. With the rapid expansion of marine CO2 data collection and the importance of quantifying net global oceanic CO2 uptake and its changes, sustained data synthesis and data access are priorities.


2019 ◽  
Vol 16 (3) ◽  
pp. 705-731
Author(s):  
Haoze Lv ◽  
Zhaobin Liu ◽  
Zhonglian Hu ◽  
Lihai Nie ◽  
Weijiang Liu ◽  
...  

With the advent of the big data era, data releasing has become a hot topic in the database community, and data privacy is drawing increasing attention from users. Among the privacy protection models that have been proposed, the differential privacy model is widely utilized because of its many advantages over other models. However, for the private release of multi-dimensional data sets, existing algorithms usually publish data with low availability, because the noise in the released data grows rapidly as the number of dimensions increases. In view of this issue, we propose algorithms based on regular and irregular marginal tables of frequent item sets to protect privacy and promote availability. The main idea is to reduce the dimension of the data set and to achieve differential privacy protection with Laplace noise. First, we propose a marginal table cover algorithm based on frequent items that considers the effectiveness of query cover combinations, and obtain a regular marginal table cover set of smaller size but higher data availability. Then, a differential privacy model with irregular marginal tables is proposed for application scenarios with low data availability and high cover rate. Next, we derive an approximate optimal marginal table cover algorithm to obtain the query cover set that satisfies the multi-level query policy constraint. Thus, a balance between privacy protection and data availability is achieved. Finally, extensive experiments on synthetic and real databases demonstrate that the proposed method performs better than state-of-the-art methods in most cases.
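The Laplace-noise step mentioned above is the standard Laplace mechanism. The sketch below illustrates that generic primitive only, not the authors' marginal-table algorithms; the function names and toy table are invented, and a count query is assumed to have sensitivity 1 (one individual changes any cell count by at most 1):

```python
import math
import random

def laplace_sample(scale, rng):
    """Draw one Laplace(0, scale) variate by inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def privatize_marginal(counts, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace(sensitivity / epsilon) noise to each cell of a marginal table."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [c + laplace_sample(scale, rng) for c in counts]

# A toy one-way marginal table of item-set frequencies.
marginal = [120, 45, 30, 5]
noisy = privatize_marginal(marginal, epsilon=1.0, seed=42)
# Each noisy cell deviates from the true count by Laplace(0, 1) noise.
```

Note that when several marginal tables are released, the privacy budget ε must be split among them, since differential privacy guarantees compose additively across queries.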


Data ◽  
2019 ◽  
Vol 4 (1) ◽  
pp. 26 ◽  
Author(s):  
Collin Gros ◽  
Jeremy Straub

Facial recognition, as well as other types of human recognition, has found uses in identification, security, and the study of behavior, among other areas. Because of the high cost of data collection for training purposes, logistical challenges, and other impediments, mirroring images has frequently been used to increase the size of data sets. However, while these larger data sets have been shown to be beneficial, their benefit relative to collecting additional similar data has not been assessed. This paper presents a data set collected and prepared for this and related research purposes. The data set includes both non-occluded and occluded data for mirroring assessment.


2022 ◽  
Vol 163 (2) ◽  
pp. 62
Author(s):  
E. Spalding ◽  
K. M. Morzinski ◽  
P. Hinz ◽  
J. Males ◽  
M. Meyer ◽  
...  

Abstract The Large Binocular Telescope (LBT) has two 8.4 m primary mirrors that produce beams that can be combined coherently in a “Fizeau” interferometric mode. In principle, the Fizeau point-spread function (PSF) enables the probing of structure at a resolution up to three times better than that of the adaptive-optics-corrected PSF of a single 8.4 m telescope. In this work, we examined the nearby star Altair (5.13 pc, type A7V, hundreds of Myr to ≈1.4 Gyr) in the Fizeau mode with the LBT at Brα (4.05 μm) and carried out angular differential imaging to search for companions. This work presents the first filled-aperture LBT Fizeau science data set to benefit from a correcting mirror that provides active phase control. In the analysis of the λ/D angular regime, the sensitivity of the data set is down to ≈0.5 M⊙ at 1″ for a 1.0 Gyr system. This sensitivity remains limited by the small amount of integration time, which is in turn limited by the instability of the Fizeau PSF. However, in the Fizeau fringe regime we attain sensitivities of Δm ≈ 5 at 0.″2 and put constraints on companions of 1.3 M⊙ down to an inner angle of ≈0.″15, closer than any previously published direct imaging of Altair. This analysis is a pathfinder for future data sets of this type, and represents some of the first steps to unlocking the potential of the first Extremely Large Telescope. Fizeau observations will be able to reach dimmer targets with upgrades to the instrument, in particular the phase detector.

