Big Data Discovery | ScienceGate

The aim of using atypicality is to extract small, rare, unusual and interesting pieces out of big data. This complements statistics about typical data to give insight into the data. To find such “interesting” parts of data, universal approaches are required, since it is not known in advance what we are looking for. We therefore base the atypicality criterion on codelength. In a prior paper we developed the methodology for discrete-valued data, and the current paper extends this to real-valued data using minimum description length (MDL). We develop the information-theoretic methodology for a number of “universal” signal processing models, and finally apply them to recorded hydrophone data and heart rate variability (HRV) signals.
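The codelength criterion can be illustrated with a minimal sketch. It is not the paper's method: it assumes a fixed Gaussian "typical" model, a Gaussian fitted to the segment as the alternative, and the common MDL approximation of (k/2) log2 n bits for k fitted parameters. The function names and the `start_penalty_bits` parameter are hypothetical.

```python
import math


def gaussian_codelength_bits(xs, mean, var):
    """Ideal codelength of xs under N(mean, var), in bits.

    The constant due to quantizing real values is omitted; it is the same
    for both models being compared, so it cancels in the difference.
    """
    n = len(xs)
    squared_error = sum((x - mean) ** 2 for x in xs)
    nats = 0.5 * n * math.log(2 * math.pi * var) + squared_error / (2 * var)
    return nats / math.log(2)


def atypicality_gain(segment, typical_mean=0.0, typical_var=1.0,
                     start_penalty_bits=0.0):
    """Bits saved by coding the segment with its own fitted model.

    Assumption: the atypical code pays (k/2) * log2(n) bits for k = 2
    fitted parameters, plus an optional penalty for signalling the segment.
    """
    n = len(segment)
    mean = sum(segment) / n
    # Floor the variance so a constant segment does not cause log(0).
    var = max(sum((x - mean) ** 2 for x in segment) / n, 1e-12)
    typical_bits = gaussian_codelength_bits(segment, typical_mean, typical_var)
    atypical_bits = (gaussian_codelength_bits(segment, mean, var)
                     + (2 / 2) * math.log2(n)
                     + start_penalty_bits)
    return typical_bits - atypical_bits


def is_atypical(segment, **kwargs):
    """A segment is flagged when its own model gives a shorter description."""
    return atypicality_gain(segment, **kwargs) > 0


shifted = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 4.9, 5.0]
ordinary = [0.3, -1.2, 0.8, -0.4, 1.1, -0.7, 0.2, -0.1]
print(is_atypical(shifted), is_atypical(ordinary))
```

In this sketch the tightly clustered segment around 5 is flagged, while the segment resembling the typical model is not, because the parameter cost outweighs the small fitting gain.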

Achieving secure big data collection based on trust evaluation and true data discovery

Computers & Security ◽

10.1016/j.cose.2020.101937 ◽

2020 ◽

Vol 96 ◽

pp. 101937

Author(s):

Denglong Lv ◽

Shibing Zhu

Keyword(s):

Big Data ◽

Data Collection ◽

Trust Evaluation ◽

Data Discovery

Linked Thesauri Quality Assessment and Documentation for Big Data Discovery

2017 International Conference on High Performance Computing & Simulation (HPCS) ◽

10.1109/hpcs.2017.16 ◽

2017 ◽

Author(s):

Riccardo Albertoni ◽

Monica De Martino ◽

Alfonso Quarati

Keyword(s):

Big Data ◽

Quality Assessment ◽

Data Discovery

Explorative Analyses of Nursing Research Data

Western Journal of Nursing Research ◽

10.1177/0193945916673815 ◽

2016 ◽

Vol 39 (1) ◽

pp. 5-19 ◽

Cited By ~ 2

Author(s):

Hyeoneui Kim ◽

Imho Jang ◽

Jimmy Quach ◽

Alex Richardson ◽

Jaemin Kim ◽

...

Keyword(s):

Big Data ◽

Data Science ◽

Research Data ◽

Nursing Research ◽

Data Discovery ◽

Data Standardization ◽

Healthcare Data ◽

Unique Data ◽

Integrative Analyses

As a first step of pursuing the vision of “Big Data science in nursing,” we described the characteristics of nursing research data reported in 194 published nursing studies. We also explored how completely the Version 1 metadata specification of biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) represents these metadata. The metadata items of the nursing studies were all related to one or more of the bioCADDIE metadata entities. However, values of many metadata items of the nursing studies were not sufficiently represented through the bioCADDIE metadata. This was partly due to the differences in the scope of the content that the bioCADDIE metadata are designed to represent. The 194 nursing studies reported a total of 1,181 unique data items, the majority of which take non-numeric values. This indicates the importance of data standardization to enable the integrative analyses of these data to support Big Data science in nursing.

Semantic Data Discovery from Social Big Data

Social Big Data Analytics ◽

10.1007/978-981-33-6652-7_4 ◽

2021 ◽

pp. 89-112

Author(s):

Bilal Abu-Salih ◽

Pornpit Wongthongtham ◽

Dengya Zhu ◽

Kit Yan Chan ◽

Amit Rudra

Keyword(s):

Big Data ◽

Data Discovery ◽

Semantic Data ◽

Social Big Data

Big Data, Bad Metadata: A Methodological Note on the Importance of Good Metadata in the Age of Digital History

Digital Histories: Emergent Approaches within the New Digital History ◽

10.33134/hup-5-6 ◽

2020 ◽

pp. 103-111

Author(s):

Kimmo Elo

Keyword(s):

Big Data ◽

Digital Humanities ◽

Great Majority ◽

Research Data ◽

Computer Algorithms ◽

Data Discovery ◽

Digital History ◽

Web Based ◽

Human Interface ◽

New Challenges

The chapter discusses how the exponential growth of both digitised and born-digital research materials has brought about new challenges regarding research data creation, data management and data discovery. It argues that the Digital Humanities community should pay more attention to metadata creation, as valid, standardised and well-structured metadata describing similar contents in identical terms help scholars to better discover relevant materials. This is especially important since a great majority of digital sources are made available via web-based portals offering search engines or other possibilities to query the collections. This development from a human-to-human interface towards a human-to-computer interface replaces the ‘silent knowledge’ of archivists with computer algorithms. Since most algorithms rely on available metadata, the structural power of actors responsible for metadata creation should be taken seriously. If scholars cannot rely on getting reliable results when running searches in online collections, the digital leap manifested by proponents of digital humanities might end with a belly flop.

DATS: the data tag suite to enable discoverability of datasets

10.1101/103143 ◽

2017 ◽

Author(s):

Susanna-Assunta Sansone ◽

Alejandra Gonzalez-Beltran ◽

Philippe Rocca-Serra ◽

George Alter ◽

Jeffrey S Grethe ◽

...

Keyword(s):

Big Data ◽

Scientific Literature ◽

Journal Article ◽

Life Sciences ◽

National Institutes Of Health ◽

Collaborative Effort ◽

Data Types ◽

Data Discovery ◽

Core Set ◽

Independent Model

Today's science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)'s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed's goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model, also available as a Schema.org annotated serialization, to be used beyond DataMed, for example in projects like DataCite.
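To give a sense of what a Schema.org annotated dataset description can look like, here is a minimal sketch that builds a JSON-LD record using generic Schema.org `Dataset` properties. It is not the DATS schema itself: the helper `dataset_jsonld`, the field selection, and all example values (including the example.org URLs) are hypothetical.

```python
import json


def dataset_jsonld(name, description, identifier, keywords,
                   creator_name, download_url, encoding_format):
    """Build a JSON-LD description using generic Schema.org Dataset terms.

    Assumption: this uses plain Schema.org vocabulary only; a real DATS
    serialization defines its own entities and required fields.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "keywords": list(keywords),
        "creator": {"@type": "Person", "name": creator_name},
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": download_url,
            "encodingFormat": encoding_format,
        },
    }


# Hypothetical example values, for illustration only.
record = dataset_jsonld(
    name="Example heart rate recordings",
    description="Illustrative dataset entry; not a real dataset.",
    identifier="https://example.org/datasets/1",
    keywords=["heart rate", "example"],
    creator_name="Jane Doe",
    download_url="https://example.org/datasets/1/data.csv",
    encoding_format="text/csv",
)
serialized = json.dumps(record, indent=2)
print(serialized)
```

A record of this kind is typically embedded in a dataset landing page so that indexers can read the metadata without a repository-specific API.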

Data discovery with DATS: exemplar adoptions and lessons learned

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocx119 ◽

2017 ◽

Vol 25 (1) ◽

pp. 13-16 ◽

Cited By ~ 1

Author(s):

Alejandra N Gonzalez-Beltran ◽

John Campbell ◽

Patrick Dunn ◽

Diana Guijarro ◽

Sanda Ionescu ◽

...

Keyword(s):

Big Data ◽

Best Practices ◽

Lessons Learned ◽

National Institutes Of Health ◽

Data Sources ◽

Data Discovery ◽

Implementation Guidelines ◽

Diverse Data ◽

Existing Data ◽

The Web

The DAta Tag Suite (DATS) is a model supporting dataset description, indexing, and discovery. It is available as an annotated serialization with schema.org, a vocabulary used by major search engines, thus making the datasets discoverable on the web. DATS underlies DataMed, the National Institutes of Health Big Data to Knowledge Data Discovery Index prototype, which aims to provide a “PubMed for datasets.” The experience gained while indexing a heterogeneous range of >60 repositories in DataMed helped in evaluating DATS’s entities, attributes, and scope. In this work, 3 additional exemplary and diverse data sources were mapped to DATS by their representatives or experts, offering a deep scan of DATS fitness against a new set of existing data. The procedure, including feedback from users and implementers, resulted in DATS implementation guidelines and best practices, and identification of a path for evolving and optimizing the model. Finally, the work exposed additional needs when defining datasets for indexing, especially in the context of clinical and observational information.

Research Progress of University Psychology Based on Big Data------discovery, challenges and opportunities

Journal of Physics Conference Series ◽

10.1088/1742-6596/1437/1/012128 ◽

2020 ◽

Vol 1437 ◽

pp. 012128

Author(s):

Liu Yun

Keyword(s):

Big Data ◽

Research Progress ◽

Data Discovery ◽

Challenges And Opportunities
