Big Data Discovery

Big Data ◽  
2014 ◽  
pp. 198-219
Keyword(s):  
Big Data ◽  
Entropy ◽  
2019 ◽  
Vol 21 (3) ◽  
pp. 219 ◽  
Author(s):  
Elyas Sabeti ◽  
Anders Høst-Madsen

The aim of using atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give further insight into the data. In order to find such “interesting” parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data; the current paper extends it to real-valued data using minimum description length (MDL). We develop the information-theoretic methodology for a number of “universal” signal processing models, and finally apply them to recorded hydrophone data and heart rate variability (HRV) signals.
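As a rough illustration of the codelength criterion described above, the following Python sketch flags a real-valued segment as atypical when a universal code describes it in fewer bits than the code for typical data does. It is a simplified sketch under stated assumptions, not the authors' method: the typical data are assumed to follow a known Gaussian N(mu0, var0), the universal code is approximated by a two-part MDL code with a (k/2) log2 n parameter cost, and the helper names and the one-bit flag cost are hypothetical choices.

```python
import math
from statistics import fmean

def gaussian_codelength_bits(x, mu, var):
    """Codelength (bits) of samples x under N(mu, var), computed from the density.
    The quantization term is omitted because it is identical for both codes being
    compared, so the value can be negative; only differences matter."""
    n = len(x)
    sse = sum((xi - mu) ** 2 for xi in x)
    nats = 0.5 * n * math.log(2 * math.pi * var) + sse / (2 * var)
    return nats / math.log(2)

def two_part_mdl_bits(x):
    """Approximate universal codelength (assumption): Gaussian with maximum-likelihood
    mean and variance, plus (k/2) * log2(n) bits for the k = 2 parameters."""
    n = len(x)
    mu = fmean(x)
    var = max(sum((xi - mu) ** 2 for xi in x) / n, 1e-12)  # avoid log(0)
    k = 2
    return gaussian_codelength_bits(x, mu, var) + 0.5 * k * math.log2(n)

def is_atypical(x, mu0=0.0, var0=1.0, flag_bits=1.0):
    """Atypical if coding x 'in itself' (plus a hypothetical flag cost)
    takes fewer bits than coding it with the typical-data code."""
    typical_bits = gaussian_codelength_bits(x, mu0, var0)
    atypical_bits = two_part_mdl_bits(x) + flag_bits
    return atypical_bits < typical_bits

print(is_atypical([5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.0, 5.1]))    # True
print(is_atypical([0.3, -1.2, 0.8, -0.4, 1.1, -0.6, 0.2, -0.2]))  # False
```

The first segment sits far from the assumed typical mean, so describing it with its own fitted parameters is much cheaper; the second looks like ordinary N(0, 1) data, so paying for extra parameters does not pay off.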


2016 ◽  
Vol 39 (1) ◽  
pp. 5-19 ◽  
Author(s):  
Hyeoneui Kim ◽  
Imho Jang ◽  
Jimmy Quach ◽  
Alex Richardson ◽  
Jaemin Kim ◽  
...  

As a first step toward pursuing the vision of “Big Data science in nursing,” we described the characteristics of nursing research data reported in 194 published nursing studies. We also explored how completely the Version 1 metadata specification of the biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) represents these metadata. The metadata items of the nursing studies were all related to one or more of the bioCADDIE metadata entities. However, the values of many metadata items of the nursing studies were not sufficiently represented through the bioCADDIE metadata. This was partly due to differences in the scope of the content that the bioCADDIE metadata are designed to represent. The 194 nursing studies reported a total of 1,181 unique data items, the majority of which take non-numeric values. This indicates the importance of data standardization to enable integrative analyses of these data in support of Big Data science in nursing.


2021 ◽  
pp. 89-112
Author(s):  
Bilal Abu-Salih ◽  
Pornpit Wongthongtham ◽  
Dengya Zhu ◽  
Kit Yan Chan ◽  
Amit Rudra

Author(s):  
Kimmo Elo

The chapter discusses how the exponential growth of both digitised and born-digital research materials has brought new challenges regarding research data creation, data management and data discovery. It argues that the Digital Humanities community should pay more attention to metadata creation, as valid, standardised and well-structured metadata describing similar content in identical terms help scholars to better discover relevant materials. This is especially important since a great majority of digital sources are made available via web-based portals offering search engines or other ways to query the collections. This shift from a human-to-human interface towards a human-to-computer interface replaces the ‘silent knowledge’ of archivists with computer algorithms. Since most algorithms rely on available metadata, the structural power of the actors responsible for metadata creation should be taken seriously. If scholars cannot rely on getting reliable results when running searches in online collections, the digital leap proclaimed by proponents of digital humanities might end in a belly flop.


2017 ◽  
Author(s):  
Susanna-Assunta Sansone ◽  
Alejandra Gonzalez-Beltran ◽  
Philippe Rocca-Serra ◽  
George Alter ◽  
Jeffrey S Grethe ◽  
...  

Today's science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)'s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed's goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as a Schema.org annotated serialization to be used beyond DataMed, for example, in projects like DataCite.


2017 ◽  
Vol 25 (1) ◽  
pp. 13-16 ◽  
Author(s):  
Alejandra N Gonzalez-Beltran ◽  
John Campbell ◽  
Patrick Dunn ◽  
Diana Guijarro ◽  
Sanda Ionescu ◽  
...  

The DAta Tag Suite (DATS) is a model supporting dataset description, indexing, and discovery. It is available as an annotated serialization with schema.org, a vocabulary used by major search engines, thus making the datasets discoverable on the web. DATS underlies DataMed, the National Institutes of Health Big Data to Knowledge Data Discovery Index prototype, which aims to provide a “PubMed for datasets.” The experience gained while indexing a heterogeneous range of >60 repositories in DataMed helped in evaluating DATS’s entities, attributes, and scope. In this work, 3 additional exemplary and diverse data sources were mapped to DATS by their representatives or experts, offering a deep scan of DATS fitness against a new set of existing data. The procedure, including feedback from users and implementers, resulted in DATS implementation guidelines and best practices, and identification of a path for evolving and optimizing the model. Finally, the work exposed additional needs when defining datasets for indexing, especially in the context of clinical and observational information.
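The schema.org annotation mentioned above is what lets web search engines pick up dataset descriptions. The Python sketch below builds a minimal schema.org `Dataset` record as JSON-LD. It uses only standard schema.org properties and is not the DATS model itself; the function name and all example values (title, identifier URL, creator) are hypothetical.

```python
import json

def dataset_jsonld(name, description, identifier, keywords, creators):
    """Return a minimal schema.org Dataset description as a JSON-LD dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "keywords": keywords,
        # Each creator is represented as a schema.org Person.
        "creator": [{"@type": "Person", "name": c} for c in creators],
    }

if __name__ == "__main__":
    record = dataset_jsonld(
        name="Example heart rate recordings",  # hypothetical
        description="Hypothetical dataset used only to illustrate the markup.",
        identifier="https://example.org/datasets/0001",  # hypothetical
        keywords=["heart rate", "time series"],
        creators=["Jane Doe"],  # hypothetical
    )
    # Typically embedded in a web page inside <script type="application/ld+json">.
    print(json.dumps(record, indent=2))
```

A richer description would add properties such as `license` or `distribution`; how DATS maps its own entities onto schema.org is defined by the DATS specification, not by this sketch.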
