DATS: the data tag suite to enable discoverability of datasets

2017
Author(s):
Susanna-Assunta Sansone
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
George Alter
Jeffrey S Grethe
...  

Today's science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the National Institutes of Health (NIH)'s Big Data to Knowledge (BD2K) initiative, we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed's goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model, also available as a Schema.org-annotated serialization to be used beyond DataMed, for example in projects like DataCite.
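To make the idea of a Schema.org-annotated dataset description concrete, the following is a minimal sketch in Python. The field names follow the public schema.org Dataset vocabulary rather than the exact DATS element set, and the identifier, names, and URL are hypothetical placeholders.

```python
import json

# Minimal schema.org "Dataset" record of the kind DATS serializes for
# web discoverability. All values below are illustrative placeholders.
dataset = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Example gene expression study",
    "description": "Illustrative dataset record for a discovery index.",
    "identifier": "doi:10.0000/example",  # hypothetical identifier
    "creator": [{"@type": "Person", "name": "A. Researcher"}],
    "distribution": [{
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data.csv",  # placeholder URL
    }],
}

# Serialize to JSON-LD, the form search engines can crawl and index.
serialized = json.dumps(dataset, indent=2)
print(serialized)
```

Embedding such a JSON-LD block in a dataset's landing page is what lets general-purpose search engines, and indexes like DataMed, discover it.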

2017
Vol 25 (1)
pp. 13-16
Author(s):
Alejandra N Gonzalez-Beltran
John Campbell
Patrick Dunn
Diana Guijarro
Sanda Ionescu
...

Abstract The DAta Tag Suite (DATS) is a model supporting dataset description, indexing, and discovery. It is available as an annotated serialization with schema.org, a vocabulary used by major search engines, thus making the datasets discoverable on the web. DATS underlies DataMed, the National Institutes of Health Big Data to Knowledge Data Discovery Index prototype, which aims to provide a “PubMed for datasets.” The experience gained while indexing a heterogeneous range of >60 repositories in DataMed helped in evaluating DATS’s entities, attributes, and scope. In this work, 3 additional exemplary and diverse data sources were mapped to DATS by their representatives or experts, offering a deep scan of DATS’s fitness against a new set of existing data. The procedure, including feedback from users and implementers, resulted in DATS implementation guidelines and best practices, and in the identification of a path for evolving and optimizing the model. Finally, the work exposed additional needs when defining datasets for indexing, especially in the context of clinical and observational information.


Author(s):  
Ying Wang
Yiding Liu
Minna Xia

Big data is characterized by multiple sources and heterogeneity. Based on the Hadoop and Spark big data platforms, a hybrid forest fire analysis system is built in this study. This platform combines big data analysis and processing technology and draws on research results from different technical fields, such as forest fire monitoring. In this system, Hadoop's HDFS is used to store all kinds of data, the Spark module is used to provide various big data analysis methods, and visualization tools such as ECharts, ArcGIS, and Unity3D are used to visualize the analysis results. Finally, an experiment on forest fire point detection is designed to corroborate the feasibility and effectiveness of the platform and to provide meaningful guidance for follow-up research and for the establishment of a big data platform for forest fire monitoring and visualized early warning. However, this experiment has two shortcomings: more data types should be selected, and compatibility would be better if the original data were converted to XML format. It is expected that these problems can be solved in follow-up research.
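As an illustration of the kind of fire-point detection rule such a platform's Spark jobs might apply over sensor data in HDFS, here is a minimal plain-Python sketch. The record fields and threshold values are hypothetical, not taken from the study.

```python
# Hypothetical thresholds for flagging a reading as a potential fire point.
FIRE_TEMP_C = 60.0        # surface-temperature threshold (assumption)
FIRE_MAX_HUMIDITY = 30.0  # relative-humidity ceiling in percent (assumption)

def detect_fire_points(records):
    """Return the records whose readings cross the fire thresholds.

    Each record is a dict with 'lat', 'lon', 'temp_c', and 'humidity'.
    In the real platform this filter would run as a distributed Spark
    job over data stored in HDFS rather than over an in-memory list.
    """
    return [
        r for r in records
        if r["temp_c"] >= FIRE_TEMP_C and r["humidity"] <= FIRE_MAX_HUMIDITY
    ]

readings = [
    {"lat": 30.1, "lon": 114.2, "temp_c": 72.5, "humidity": 18.0},  # hot, dry
    {"lat": 30.2, "lon": 114.3, "temp_c": 24.0, "humidity": 55.0},  # normal
]
hits = detect_fire_points(readings)
```

The flagged coordinates are what a visualization layer (e.g., ECharts or ArcGIS) would then plot for early warning.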


2020
Vol 30 (Supplement_5)
Author(s):
I Mircheva
M Mirchev

Abstract Background Ownership of patient information in the context of Big Data is a relatively new problem, apparently not yet fully understood, and there are not enough publications on the subject. Since the topic is interdisciplinary, incorporating legal, ethical, and medical aspects as well as aspects of information and communication technologies, a somewhat more sophisticated analysis of the issue is needed. Aim To determine how the medical academic community perceives the issue of ownership of patient information in the context of Big Data. Methods A literature search for full-text publications indexed in PubMed, Springer, ScienceDirect, and Scopus identified only 27 appropriate articles authored by academicians and corresponding to three focus areas: problem (ownership); area (healthcare); context (Big Data). Three major aspects were studied: the scientific area of the publications, the aspects of ownership discussed, and academicians' perception of ownership in the context of Big Data. Results Publications span the period 2014-2019: 37% were published in health and medical informatics journals, 30% in medicine and public health, and 19% in law and ethics; 78% were authored by American and British academicians and are highly cited. The majority (63%) are in the area of scientific research: clinical studies, access to and use of patient data for medical research, secondary use of medical data, and ethical challenges of Big Data in healthcare. The majority (70%) of the publications discuss ownership in its ethical and legal aspects, and 67% see ownership as a challenge, mostly to medical research, access control, ethics, politics, and business. Conclusions Ownership of medical data is seen first and foremost as a challenge. Addressing this challenge requires the combined efforts of politicians, lawyers, ethicists, and computer and medical professionals, as well as academicians sharing these efforts, experiences, and suggestions. However, this issue is neglected in the scientific literature.
Publishing may help open debates and lead to adequate policy solutions. Key messages Ownership of patient information in the context of Big Data is a problem that should not be marginalized but needs a comprehensive attitude, consideration, and combined efforts from all stakeholders. Overcoming the challenge of ownership may help improve healthcare services, medical and public health research, and the health of the population as a whole.


2020
Vol 4 (2)
pp. 5
Author(s):
Ioannis C. Drivas
Damianos P. Sakas
Georgios A. Giannakopoulos
Daphne Kyriaki-Manessi

In the Big Data era, search engine optimization deals with the encapsulation of datasets related to website performance in terms of architecture, content curation, and user behavior, with the purpose of converting them into actionable insights that improve visibility and findability on the Web. In this respect, big data analytics expands the opportunities for developing new methodological frameworks composed of valid, reliable, and consistent analytics that are practically useful for developing well-informed strategies for organic traffic optimization. In this paper, a novel methodology is implemented to increase organic search engine visits based on the impact of multiple SEO factors. To achieve this, the authors examined 171 cultural heritage websites and the retrieved analytics about their performance and the user experience within them. Massive Web-based collections are included and presented by cultural heritage organizations through their websites. Users interact with these collections, producing behavioral analytics in a variety of data types that come from multiple devices, at high velocity, and in large volumes. Nevertheless, prior research efforts indicate that these massive cultural collections are difficult to browse and exhibit low visibility and findability in the semantic Web era. Against this backdrop, this paper proposes the computational development of a search engine optimization (SEO) strategy that utilizes the generated big cultural data analytics to improve the visibility of cultural heritage websites. Going one step further, the statistical results of the study are integrated into a predictive model composed of two stages: first, a fuzzy cognitive mapping process is generated as an aggregated macro-level descriptive model; second, a micro-level data-driven agent-based model follows.
The purpose of the model is to predict the most effective combinations of factors that achieve enhanced visibility and organic traffic on cultural heritage organizations' websites. To this end, the study contributes to expanding the knowledge of researchers and practitioners in the big cultural analytics sector, with the purpose of implementing potential strategies for greater visibility and findability of cultural collections on the Web.
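The macro-level stage of such a model rests on fuzzy cognitive mapping, in which concepts hold activation levels in [0, 1] and signed weights encode causal influence between them. A minimal sketch of one common FCM update rule follows; the three SEO concepts and the weight values are hypothetical illustrations, not the factors or coefficients from the study.

```python
import math

def sigmoid(x):
    """Squash an activation into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def fcm_step(activations, weights):
    """One update of a fuzzy cognitive map.

    activations: current concept activation levels in [0, 1].
    weights[i][j]: causal influence of concept i on concept j.
    Applies the common rule A_j(t+1) = f(A_j(t) + sum_i A_i(t) * w_ij).
    """
    n = len(activations)
    return [
        sigmoid(activations[j] + sum(activations[i] * weights[i][j]
                                     for i in range(n)))
        for j in range(n)
    ]

# Hypothetical concepts: [page speed, content freshness, organic visits]
weights = [
    [0.0, 0.0, 0.6],  # faster pages -> more organic visits
    [0.0, 0.0, 0.4],  # fresher content -> more organic visits
    [0.0, 0.0, 0.0],  # visits feed back into nothing in this sketch
]
state = [0.8, 0.5, 0.2]
for _ in range(5):  # iterate until activations settle
    state = fcm_step(state, weights)
```

Iterating the map and reading off the "organic visits" activation under different starting states is how such a descriptive model compares candidate factor combinations before the agent-based stage refines them.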


2017
Vol 113 (2)
pp. 1037-1057
Author(s):
Saeed-Ul Hassan
Mubashir Imran
Uzair Gillani
Naif Radi Aljohani
Timothy D. Bowman
...  

2021
Vol 251
pp. 01030
Author(s):
Qinqi Kang
Zhao Kang

With the rapid development of artificial intelligence in the current era of big data, the construction of translation corpora has become a key factor in effectively achieving highly intelligent translation. In the era of big data, the data sources and data types of translation corpora are becoming more and more diversified, which will inevitably bring about a new revolution in corpus construction. The construction of translation corpora in the era of big data can fully rely on multiple modes, such as third-party open-source data, crowdsourced translation, closed-loop machine translation, and human-machine collaboration, to comprehensively improve the quality of corpus construction and better serve translation practice.


2015
Vol 22 (6)
pp. 1115-1119
Author(s):
Saurabh Sinha
Jun Song
Richard Weinshilboum
Victor Jongeneel
Jiawei Han

Abstract We describe here the vision, motivations, and research plans of the National Institutes of Health Center for Excellence in Big Data Computing at the University of Illinois, Urbana-Champaign. The Center is organized around the construction of “Knowledge Engine for Genomics” (KnowEnG), an E-science framework for genomics where biomedical scientists will have access to powerful methods of data mining, network mining, and machine learning to extract knowledge out of genomics data. The scientist will come to KnowEnG with their own data sets in the form of spreadsheets and ask KnowEnG to analyze those data sets in the light of a massive knowledge base of community data sets called the “Knowledge Network” that will be at the heart of the system. The Center is undertaking discovery projects aimed at testing the utility of KnowEnG for transforming big data to knowledge. These projects span a broad range of biological enquiry, from pharmacogenomics (in collaboration with Mayo Clinic) to transcriptomics of human behavior.


Author(s):  
Nada M. Alhakkak

BigGIS is a new product that resulted from developing GIS in the Big Data area; it is used to store and process big geographical data and helps to solve its issues. This chapter describes an optimized Big GIS framework in a MapReduce environment, M2BG. The suggested framework has been integrated into the MapReduce environment in order to solve the storage issues and benefit from the Hadoop environment. M2BG includes two steps: a Big GIS warehouse and Big GIS MapReduce. The first step contains three main layers: the Data Source and Storage Layer (DSSL), the Data Processing Layer (DPL), and the Data Analysis Layer (DAL). The second step is responsible for clustering, using swarms as inputs for the Hadoop phase. Work is then scheduled in the mapping part with a preemptive priority scheduling algorithm: some data types are classified as critical while others are ordinary, and the reduce part uses a merge sort algorithm. M2BG should also address security and be implemented with real data, first in a simulated environment and later in the real world.
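The priority scheduling idea described for the mapping phase, serving tasks on critical data types before ordinary ones, can be sketched with a heap-backed priority queue. The task names, the two-class priority scheme, and the FIFO tiebreak are illustrative assumptions, not details from the chapter.

```python
import heapq

CRITICAL, ORDINARY = 0, 1  # lower number = higher priority

def schedule(tasks):
    """Yield task names in priority order: critical before ordinary.

    tasks: iterable of (name, kind) pairs. A running sequence number
    is used as a tiebreak so tasks of equal priority keep FIFO order.
    """
    heap = []
    for seq, (name, kind) in enumerate(tasks):
        heapq.heappush(heap, (kind, seq, name))
    while heap:
        _, _, name = heapq.heappop(heap)
        yield name

# Hypothetical geographic data-type tasks queued for the mapping phase.
order = list(schedule([
    ("road-network", ORDINARY),
    ("flood-zone", CRITICAL),
    ("land-use", ORDINARY),
    ("evacuation-route", CRITICAL),
]))
```

In a true preemptive scheduler a newly arrived critical task would also interrupt a running ordinary one; this sketch only captures the priority ordering at dispatch time.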

