Experimental Directory Structure (Exdir): An alternative to HDF5 without introducing a new file format

2018
Author(s): Svenn-Arne Dragly, Milad Hobbi Mobarhan, Mikkel Lepperød, Simen Tennøe, Marianne Fyhn, ...

ABSTRACT: Natural sciences generate an increasing amount of data in a wide range of formats developed by different research groups and commercial companies. At the same time, there is a growing desire to share data along with publications in order to enable reproducible research. Open formats have publicly available specifications, which facilitate data sharing and reproducible research. Hierarchical Data Format 5 (HDF5) is a popular open format widely used in neuroscience, often as a foundation for other, more specialized formats. However, drawbacks related to HDF5's complex specification have prompted discussion of an improved replacement. We propose a novel alternative, the Experimental Directory Structure (Exdir), an open standard for data storage in experimental pipelines which amends drawbacks associated with HDF5 while retaining its advantages. HDF5 stores data and metadata in a hierarchy within a complex binary file which, among other things, is not human-readable, is not optimal for version control systems, and lacks support for storing raw data. Exdir, on the other hand, uses file system directories to represent the hierarchy, with metadata stored in human-readable YAML files, datasets stored in binary NumPy files, and raw data stored directly in subdirectories. Furthermore, storing data in multiple files makes it easier for version control systems to track. Exdir is not a file format in itself, but a standard for organizing files in a directory structure. Exdir uses the same abstractions as HDF5 and is compatible with the HDF5 Abstract Data Model. Several research groups are already using data stored in a directory hierarchy as an alternative to HDF5, but no common standard exists in the field. This complicates and limits the opportunity for data sharing and for the development of common tools for reading, writing, and analyzing data. Exdir facilitates improved data storage, data sharing, reproducible research, and novel insight from interdisciplinary collaboration. With the publication of Exdir, we invite the scientific community to join the development to create an open standard that will serve as many needs as possible and that will serve as a foundation for open access to and exchange of data.

SIGNIFICANCE STATEMENT: Experimental Directory Structure (Exdir) is a proposal to standardize a storage solution that has become an increasingly popular alternative to Hierarchical Data Format 5 (HDF5), namely to use directories to define a hierarchy, store data in binary files, and store metadata in text files. While this strategy is deployed locally by several research groups, no common standard exists. We envision the establishment of such a standard and present Exdir to the community as a starting point.
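
The layout the abstract describes (directories for the hierarchy, text files for metadata, binary NumPy files for datasets) can be illustrated with a minimal sketch. This is not the Exdir library or its specification: the file names `attributes.yaml` and `data.npy`, the helper functions, and the example values are all assumptions made for illustration, and the metadata writer only handles a flat mapping of simple scalars.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_flat_yaml(path, mapping):
    """Write a flat mapping of simple scalars as 'key: value' lines (a small YAML subset)."""
    lines = [f"{key}: {value}" for key, value in mapping.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def create_group(parent, name, attributes):
    """Represent a group as a directory with a human-readable metadata file (hypothetical name)."""
    group = parent / name
    group.mkdir(parents=True)
    write_flat_yaml(group / "attributes.yaml", attributes)
    return group


def create_dataset(group, name, array):
    """Represent a dataset as a directory holding one binary NumPy file (hypothetical name)."""
    dataset = group / name
    dataset.mkdir()
    np.save(dataset / "data.npy", array)
    return dataset


# Build a tiny hierarchy in a temporary directory; all values are made up.
root = Path(tempfile.mkdtemp()) / "experiment.exdir"
session = create_group(root, "session_01", {"subject": "rat_42", "sampling_rate_hz": 30000})
voltage = create_dataset(session, "voltage", np.arange(6, dtype=np.float64).reshape(2, 3))

# Read the dataset back and list the resulting tree relative to the root.
loaded = np.load(voltage / "data.npy")
tree = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```

Because every group, attribute file, and dataset is an ordinary directory or file, a version control system can track each one separately, which is the property the abstract contrasts with a single binary HDF5 file.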

Author(s): Sinta Berliana Sipayung, Krismianto, Risyanto

The Terra and Aqua satellites carry multiple sensors, including the MODIS instrument, which is operated to detect phenomena on land, at sea, and in the atmosphere. Little atmospheric data has been extracted for the Indonesian region in particular, because the MODIS product is still in raw form (level-0). Extracting level-2 data from level-0 requires the International MODIS/AIRS Processing Package (IMAPP) software, which produces several atmospheric parameter products, including MOD 04 - Aerosol, MOD 05 - Total Precipitable Water (Water Vapor), MOD 06 - Cloud, MOD 07 - Atmospheric Profiles, MOD 08 - Gridded Atmospheric, and MOD 35, as swaths in Hierarchical Data Format-4 (HDF4). This paper discusses only the MOD07/MYD07 level-2 atmospheric profiles and related parameters, such as atmospheric temperature at the 780 hPa level and water vapor at the 700 hPa level. The study aims to analyze the extraction of Atmospheric Profiles data (level-2 temperature and moisture at 1 km resolution) from daily HDF4 swaths into daily and monthly gridded data in .dat format, and its application for December 2014 and January, July, and August 2015, particularly over Indonesia. Comparing the extracted swath data with the gridded Terra/Aqua MODIS data gives an average R-squared of 0.72 for atmospheric temperature and 0.74 for water vapor, with RMSE values of 0.88 and 0.29 for temperature and water vapor, respectively.
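
The swath-to-grid step and the two comparison statistics can be sketched as follows. The abstract does not state the gridding method or how R-squared was computed, so this sketch assumes simple averaging of swath samples within each grid cell and takes R-squared as the squared Pearson correlation. The function names, coordinates, and values are hypothetical and do not come from the study.

```python
import numpy as np


def grid_swath(lat, lon, values, lat_edges, lon_edges):
    """Average swath samples that fall in each cell of a regular lat/lon grid."""
    bins = [lat_edges, lon_edges]
    total, _, _ = np.histogram2d(lat, lon, bins=bins, weights=values)
    count, _, _ = np.histogram2d(lat, lon, bins=bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        # Cells with no samples become NaN instead of raising a warning.
        return np.where(count > 0, total / count, np.nan)


def rmse(a, b):
    """Root-mean-square error between two equally shaped arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


def r_squared(a, b):
    """Squared Pearson correlation (an assumed definition of R-squared)."""
    return float(np.corrcoef(a, b)[0, 1] ** 2)


# Four made-up swath samples (temperature in kelvin) on a 2 x 2 grid.
lat = np.array([-5.5, -5.4, -4.5, -4.6])
lon = np.array([105.2, 105.3, 106.5, 106.6])
values = np.array([280.0, 282.0, 284.0, 286.0])
lat_edges = np.array([-6.0, -5.0, -4.0])
lon_edges = np.array([105.0, 106.0, 107.0])

grid = grid_swath(lat, lon, values, lat_edges, lon_edges)

# Compare only the cells that received swath samples against a made-up reference grid.
reference = np.array([[281.5, 0.0], [0.0, 284.0]])
valid = ~np.isnan(grid)
error = rmse(grid[valid], reference[valid])
fit = r_squared(grid[valid], reference[valid])
```

Monthly grids could be produced the same way by pooling all swath samples for the month before averaging, although the source does not say whether the authors averaged samples or daily grids.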

