Knowledge discovery in large model datasets in the marine environment: the THREDDS Data Server example

2012 ◽  
Vol 3 (1) ◽  
pp. 41 ◽  
Author(s):  
A. Bergamasco ◽  
A. Benetazzo ◽  
S. Carniel ◽  
F.M. Falcieri ◽  
T. Minuzzo ◽  
...  

In order to monitor, describe and understand the marine environment, many research institutions are involved in the acquisition and distribution of ocean data, both from observations and models. Scientists at these institutions spend too much time looking for, accessing, and reformatting data: they need better tools and procedures to make their science more efficient. The U.S. Integrated Ocean Observing System (US-IOOS) is working on making large amounts of distributed data usable in an easy and efficient way. It is essentially a network of scientists, technicians and technologies designed to acquire, collect and disseminate observational and modelled data from investigations of coastal and oceanic marine regions to researchers, stakeholders and policy makers. To be successful, this effort requires standard data protocols, web services and standards-based tools. Starting from the US-IOOS approach, which is being adopted throughout much of the oceanographic and meteorological sectors, we describe here the CNR-ISMAR Venice experience in setting up a national Italian IOOS framework using the THREDDS (THematic Real-time Environmental Distributed Data Services) Data Server (TDS), a middleware designed to fill the gap between data providers and data users. The TDS provides services that allow data users to find the data sets pertaining to their scientific needs and to access, visualize and use them easily, without downloading files to the local workspace. To achieve this, data providers must make their data available in a standard form that the TDS understands, with sufficient metadata to allow the data to be read and searched in a standard way. The core idea is to use a Common Data Model (CDM), a unified conceptual model that describes the different datatypes within each dataset. More specifically, Unidata (www.unidata.ucar.edu) has developed CDM specifications for many of the different kinds of data used by the scientific community, such as grids, profiles, time series and swath data. These datatypes are aligned with the NetCDF Climate and Forecast (CF) Metadata Conventions and with the Climate Science Modelling Language (CSML); CF-compliant NetCDF files and GRIB files can be read directly with no modification, while non-compliant files can be modified to meet the appropriate metadata requirements. Once datasets are standardized in the CDM, the TDS makes them available through a series of web services such as OPeNDAP or the Open Geospatial Consortium Web Coverage Service (WCS), allowing data users to easily obtain small subsets of large datasets and to quickly visualize their content with tools such as GODIVA2 or the Integrated Data Viewer (IDV). In addition, an ISO metadata service available through the TDS can be harvested by catalogue broker services (e.g. GI-cat) to enable distributed search across federated data servers. Examples of TDS datasets can be accessed at the CNR-ISMAR Venice site: http://tds.ve.ismar.cnr.it:8080/thredds/catalog.html.
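As a minimal sketch of the data-user side of this workflow, the following Python snippet opens a remote TDS dataset over OPeNDAP and reads only a small subset across the network (assuming the netCDF4 library is built with OPeNDAP support, which is the common case). The URL and variable name are hypothetical placeholders, not an actual CNR-ISMAR endpoint.

```python
# Minimal sketch: subsetting a remote THREDDS dataset over OPeNDAP.
# The URL and variable name are hypothetical placeholders; substitute a
# real OPeNDAP endpoint taken from a TDS catalog page.
from netCDF4 import Dataset

OPENDAP_URL = "http://tds.example.org/thredds/dodsC/model/adriatic_temp.nc"  # placeholder

ds = Dataset(OPENDAP_URL)                       # opens lazily; no file download
sst = ds.variables["sea_water_temperature"]     # hypothetical CF-named variable
print(sst.dimensions, sst.shape)

# Only this small slice is transferred over the network:
subset = sst[0, 0:10, 0:10]                     # first time step, 10x10 window
print(subset.mean())
ds.close()
```

This is exactly the access pattern the abstract describes: the user never downloads the full file, and the CF metadata makes the variable discoverable by standard names.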

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yance Feng ◽  
Lei M. Li

Abstract
Background: Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable.
Results: We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. The pairwise intermediates are then integrated based on a linear model that adjusts for the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt robust least trimmed squares regression in the pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. Our assessment of normalization quality emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by single-cell cell-cycle data. MUREN is implemented as an R package. The code, under the GPL-3 license, is available on GitHub (github.com/hippo-yf/MUREN) and on conda (anaconda.org/hippo-yf/r-muren).
Conclusions: MUREN performs RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.
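The multi-reference pairwise idea can be pictured with a short numpy sketch: for each reference sample, a robust location estimate of the per-gene log-ratios gives a scaling factor, and the factors are then combined across references. This is a conceptual sketch only; it substitutes a trimmed mean of log-ratios for MUREN's actual least trimmed squares regression, and all names below are illustrative.

```python
# Conceptual sketch of pairwise normalization against multiple references.
# NOTE: a trimmed mean of log-ratios stands in for MUREN's robust least
# trimmed squares regression; this is not the package's algorithm.
import numpy as np
from scipy.stats import trim_mean

def pairwise_scale_factors(counts, ref_idx, trim=0.2):
    """counts: genes x samples matrix of raw counts."""
    logc = np.log2(counts + 1.0)
    factors = np.zeros((len(ref_idx), counts.shape[1]))
    for i, r in enumerate(ref_idx):
        # per-gene log-ratio of each sample against reference r
        ratios = logc - logc[:, [r]]
        # robust per-sample location estimate (trimmed mean over genes)
        factors[i] = trim_mean(ratios, proportiontocut=trim, axis=0)
    # integrate across references: the median downweights any single
    # outlier reference sample
    return np.median(factors, axis=0)

rng = np.random.default_rng(0)
counts = rng.poisson(50, size=(1000, 6)).astype(float)
counts[:, 3] *= 2.0                           # simulate a library-size effect
scale = pairwise_scale_factors(counts, ref_idx=[0, 1, 2])
normalized = np.log2(counts + 1.0) - scale    # shift each sample's log counts
```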


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2737
Author(s):  
Leandro Ordonez-Ante ◽  
Gregory Van Seghbroeck ◽  
Tim Wauters ◽  
Bruno Volckaert ◽  
Filip De Turck

Citizen engagement is one of the key factors for smart city initiatives to remain sustainable over time. This in turn entails providing citizens and other relevant stakeholders with up-to-date data and tools that enable them to derive insights that add value to their day-to-day life. The massive volume of data constantly produced in these smart city environments makes satisfying this requirement particularly challenging. This paper introduces Explora, a generic framework for serving the interactive low-latency requests typical of visual exploratory applications on spatiotemporal data. Explora leverages stream processing to derive, at ingestion time, synopsis data structures that concisely capture the spatial and temporal trends and dynamics of the sensed variables, and that serve as compacted data sets for providing fast (approximate) answers to visual queries on smart city data. The experimental evaluation, conducted on proof-of-concept implementations of Explora based on traditional database and distributed data processing setups, shows a decrease of up to two orders of magnitude in query latency compared with queries running on the raw base data, at the cost of less than 10% in query accuracy and a 30% data footprint. The implementation of the framework on real smart city data, along with the obtained experimental results, demonstrates the feasibility of the proposed approach.
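To make the synopsis idea concrete, here is a minimal Python sketch of an ingestion-time synopsis: per (grid cell, time bucket) running count/sum aggregates that answer approximate average queries without scanning the raw readings. The grid resolution, bucket width and function names are illustrative assumptions, not Explora's actual design.

```python
# Minimal sketch of an ingestion-time synopsis for spatiotemporal data:
# per (grid cell, time bucket) count/sum aggregates answering approximate
# average queries. Cell size and bucket width are illustrative choices.
from collections import defaultdict

CELL_DEG = 0.01          # grid cell size in degrees (illustrative)
BUCKET_S = 3600          # 1-hour time buckets (illustrative)

synopsis = defaultdict(lambda: [0, 0.0])   # key -> [count, sum]

def ingest(lat, lon, ts, value):
    """Update the synopsis as each sensor reading streams in."""
    key = (int(lat / CELL_DEG), int(lon / CELL_DEG), int(ts // BUCKET_S))
    agg = synopsis[key]
    agg[0] += 1
    agg[1] += value

def approx_avg(cell_keys, t0, t1):
    """Approximate average over a set of cells and a time range."""
    n, s = 0, 0.0
    for (cx, cy, b), (cnt, total) in synopsis.items():
        if (cx, cy) in cell_keys and t0 // BUCKET_S <= b <= t1 // BUCKET_S:
            n += cnt
            s += total
    return s / n if n else None
```

The accuracy/footprint trade-off reported above comes from exactly this kind of design choice: coarser cells and buckets shrink the synopsis but blur the answers.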


Author(s):  
Dr. Manish L Jivtode

Web services are applications that allow communication between devices over the internet independently of the underlying technology. The devices use standardized eXtensible Markup Language (XML) for information exchange. A client invokes a web service by sending an XML message and then receives an XML response message. A number of communication protocols for web services use the XML format, such as Web Services Flow Language (WSFL) and the Blocks Extensible Exchange Protocol (BEEP). Simple Object Access Protocol (SOAP) and Representational State Transfer (REST) are widely used options for accessing web services. The two are not directly comparable: SOAP is a communications protocol, while REST is a set of architectural principles for data transmission. In this paper, data sizes of 1KB, 2KB, 4KB, 8KB and 16KB were tested for both audio and video, and results were obtained for the CRUD methods. The encryption and decryption timings, in milliseconds/seconds, were recorded by programming extensibility points of a WCF REST web service in the Azure cloud.
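In the spirit of the measurements described above, the following Python sketch times CRUD calls against a REST service. The endpoint, resource id and payload are hypothetical placeholders; the paper's actual tests were run against a WCF REST service hosted in Azure.

```python
# Sketch: timing CRUD round trips against a REST service.
# The base URL, resource id and payload are hypothetical placeholders.
import time
import requests

BASE = "https://example.azurewebsites.net/api/media"   # placeholder URL

def timed(method, url, **kwargs):
    t0 = time.perf_counter()
    resp = requests.request(method, url, **kwargs)
    ms = (time.perf_counter() - t0) * 1000.0
    print(f"{method:6s} {url} -> {resp.status_code} in {ms:.1f} ms")
    return resp

payload = {"name": "clip.mp4", "data": "...payload bytes..."}  # placeholder
timed("POST", BASE, json=payload)        # Create
item_url = f"{BASE}/1"                   # hypothetical resource id
timed("GET", item_url)                   # Read
timed("PUT", item_url, json=payload)     # Update
timed("DELETE", item_url)                # Delete
```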


2021 ◽  
Author(s):  
Jan Michalek ◽  
Kuvvet Atakan ◽  
Christian Rønnevik ◽  
Helga Indrøy ◽  
Lars Ottemøller ◽  
...  

The European Plate Observing System (EPOS) is a European project for building a pan-European infrastructure for accessing solid Earth science data, now governed by EPOS ERIC (European Research Infrastructure Consortium). The EPOS-Norway project (EPOS-N; RCN Infrastructure Programme, project no. 245763) is a Norwegian project funded by the Research Council of Norway. The aim of the Norwegian EPOS e-infrastructure is to integrate data from the seismological and geodetic networks, as well as data from the geological and geophysical data repositories. Among the six EPOS-N project partners, four institutions provide data: the University of Bergen (UIB), the Norwegian Mapping Authority (NMA), the Geological Survey of Norway (NGU) and NORSAR.

In this contribution, we present the EPOS-Norway Portal as an online, open-access, interactive tool allowing visual analysis of multidimensional data. It supports maps and 2D plots with linked visualizations. Access is currently provided to more than 300 datasets (18 web services, 288 map layers and 14 static datasets) from four subdomains of Earth science in Norway, and new datasets are planned for future integration. The EPOS-N Portal can access remote datasets via web services such as FDSNWS for seismological data and OGC services (e.g. WMS) for geological and geophysical data. Standalone datasets are available through preloaded data files. Users can also add another WMS server or upload their own dataset for visualization and comparison with other datasets. The portal provides a unique way, the first of its kind in Norway, to explore various geoscientific datasets in one common interface. A key aspect is the quick, simultaneous visual inspection of data from various disciplines and the testing of scientific or geohazard-related hypotheses. One example is the spatio-temporal correlation of earthquakes (from 1980 to the present) with critical infrastructure (e.g. pipelines), geological structures, submarine landslides or unstable slopes.

The EPOS-N Portal is implemented by adapting Enlighten-web, a server-client program developed by NORCE. Enlighten-web facilitates interactive visual analysis of large multidimensional data sets and supports interactive mapping of millions of points. The Enlighten-web client runs inside a web browser. An important element of the Enlighten-web functionality is brushing and linking, which is useful for exploring complex data sets to discover correlations and interesting properties hidden in the data. The views are linked to each other, so that highlighting a subset in one view automatically leads to the corresponding subsets being highlighted in all other linked views.
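As an illustration of the kind of web-service request the portal issues, the following Python sketch performs an FDSN event query using standard fdsnws-event 1.x parameters. The base URL is a placeholder, not the node that actually serves the EPOS-N data, and the bounding box is only a rough approximation of Norway.

```python
# Sketch: an FDSN event-service query (fdsnws-event 1.x standard parameters).
# The base URL is a placeholder; substitute the FDSN node serving the data.
import requests

BASE = "https://eida.example.org"        # placeholder FDSN node

params = {
    "starttime": "1980-01-01",
    "endtime": "2021-01-01",
    "minlatitude": 57.0, "maxlatitude": 72.0,    # rough Norway bounding box
    "minlongitude": 4.0, "maxlongitude": 32.0,
    "minmagnitude": 3.0,
    "format": "text",
}
resp = requests.get(f"{BASE}/fdsnws/event/1/query", params=params)
resp.raise_for_status()
for line in resp.text.splitlines()[1:6]:     # skip header, show a few rows
    print(line)
```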


2021 ◽  
Author(s):  
Benjamin Moreno-Torres ◽  
Christoph Völker ◽  
Sabine Kruschwitz

Non-destructive testing (NDT) data in civil engineering is regularly used for scientific analysis. However, there is as yet no uniform representation of the data, so analyses across distributed data sets from different test objects are in most cases too difficult to carry out.

To overcome this, we present an approach for integrated management of distributed data sets based on Semantic Web technologies. The cornerstone of this approach is an ontology, a semantic knowledge representation of our domain. This NDT-CE ontology is then populated with the data sources. Using the properties and relationships between concepts that the ontology contains, we make these data sets meaningful to machines as well. Furthermore, the ontology can be used as a central interface for database access. Non-domain data sources can be integrated by linking them with the NDT ontology, making them directly available for generic use in terms of digitization. Based on extensive literature research, we outline the resulting possibilities for NDT in civil engineering, such as computer-aided sorting and analysis of measurement data, and the recognition and explanation of correlations.

A common knowledge representation and common data access allow the scientific exploitation of existing data sources with data-based methods (such as image recognition, measurement uncertainty calculations, factor analysis or material characterization) and simplify bidirectional knowledge and data transfer between engineers and NDT specialists.
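The ontology-as-interface idea can be sketched in a few lines with rdflib: populate a graph with a measurement and query it through SPARQL. The namespace, class and property names below are hypothetical illustrations, not the actual NDT-CE ontology vocabulary.

```python
# Minimal sketch of ontology-backed data access using rdflib.
# Namespace, class and property names are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

NDT = Namespace("http://example.org/ndt-ce#")    # placeholder namespace

g = Graph()
g.bind("ndt", NDT)

# Populate the graph with one measurement from a (hypothetical) data source.
g.add((NDT.m1, RDF.type, NDT.UltrasonicMeasurement))
g.add((NDT.m1, NDT.onTestObject, NDT.bridgeDeck42))
g.add((NDT.m1, NDT.pulseVelocity, Literal(4120.0, datatype=XSD.double)))

# A generic SPARQL query then serves as the central database interface.
q = """
SELECT ?m ?v WHERE {
  ?m a ndt:UltrasonicMeasurement ;
     ndt:pulseVelocity ?v .
}
"""
for m, v in g.query(q, initNs={"ndt": NDT}):
    print(m, float(v))
```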


Author(s):  
Ondrej Habala ◽  
Martin Šeleng ◽  
Viet Tran ◽  
Branislav Šimo ◽  
Ladislav Hluchý

The project Advanced Data Mining and Integration Research for Europe (ADMIRE) is designing new methods and tools for convenient mining and integration of large, distributed data sets. One prospective application domain for such methods and tools is environmental science, which often uses data sets from different vendors and where data mining is becoming increasingly popular as more computing power becomes available. The authors present a set of experimental environmental scenarios and the application of ADMIRE technology in these scenarios. The scenarios aim to predict meteorological and hydrological phenomena that currently cannot be, or are not, predicted, by applying data mining to distributed data sets from several providers in Slovakia. The scenarios were designed by environmental experts, and apart from serving as testing grounds for the ADMIRE technology, their results are of particular interest to the experts who designed them.


Author(s):  
Amir Basirat ◽  
Asad I. Khan ◽  
Heinz W. Schmidt

One of the main challenges for large-scale computer clouds dealing with massive real-time data is coping with the rate at which unprocessed data accumulates. Transforming big data into valuable information requires a fundamental rethink of the way future data management models will need to be developed on the Internet. Unlike existing relational schemes, pattern-matching approaches can analyze data in ways similar to how our brain links information. Such interactions, when implemented in voluminous data clouds, can assist in finding overarching relations in complex and highly distributed data sets. In this chapter, a different perspective on data recognition is considered. Rather than looking at conventional approaches such as statistical computations and deterministic learning schemes, this chapter focuses on a distributed processing approach for scalable data recognition and processing.

