Wikidata: A platform for data integration and dissemination for the life sciences and beyond

2015 ◽  
Author(s):  
Elvira Mitraka ◽  
Andra Waagmeester ◽  
Sebastian Burgstaller-Muehlbacher ◽  
Lynn M Schriml ◽  
Andrew I Su ◽  
...  

Wikidata is an open, Semantic Web-compatible database that anyone can edit. This 'data commons' provides structured data for Wikipedia articles and other applications. Every article on Wikipedia has a hyperlink to an editable item in this database. This unique connection to the world's largest community of volunteer knowledge editors could help make Wikidata a key hub within the greater Semantic Web. The life sciences, as ever, face crucial challenges in disseminating and integrating knowledge. Our group is addressing these issues by populating Wikidata with the seeds of a foundational semantic network linking genes, drugs and diseases. Using this content, we are enhancing Wikipedia articles both to increase their quality and to recruit human editors to expand and improve the underlying data. We encourage the community to join us as we collaboratively create what can become the most used and most central semantic data resource for the life sciences and beyond.
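The gene-drug-disease network described above is queryable through Wikidata's public SPARQL endpoint (https://query.wikidata.org/sparql). A minimal sketch of composing such a query follows; property P2176 ("drug or therapy used for treatment") is a real Wikidata property, while the example item Q12206 (diabetes mellitus) is an assumption worth verifying against the live database.

```python
# Sketch: building a SPARQL query that lists drugs linked to a disease item
# via Wikidata's P2176 ("drug or therapy used for treatment") property.

def treatments_query(disease_qid: str, limit: int = 10) -> str:
    """Return a SPARQL query string for drugs used to treat a disease item."""
    return f"""
    SELECT ?drug ?drugLabel WHERE {{
      wd:{disease_qid} wdt:P2176 ?drug .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    """

# Q12206 is (assumed to be) the Wikidata item for diabetes mellitus.
print(treatments_query("Q12206"))
```

The query string can be POSTed to the endpoint with any HTTP client; the label service clause resolves item IDs to English labels server-side.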

Web Services ◽  
2019 ◽  
pp. 1812-1835
Author(s):  
Saravjeet Singh ◽  
Jaiteg Singh

Management of data is a crucial task for any organization, and it becomes both multifaceted and vital as the data grows more complex. In today's era, most organizations generate semi-structured or unstructured data that requires special techniques to handle and manage. To meet the need to handle unstructured data, Semantic Web technology provides a way to arrive at an effective solution. This chapter explains Synthetic Semantic Data Management (SSDM), a technique based on the Semantic Web that helps manage the data of small and mid-sized enterprises (SMEs). SSDM provides procedures to handle, store, manage, and retrieve semi-structured data.
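The core idea behind a semantic approach to semi-structured data can be sketched as representing records from heterogeneous sources as subject-predicate-object triples, so they can be stored and queried uniformly. The sketch below is illustrative only; the class and record names are not the chapter's actual API.

```python
# Illustrative sketch: a tiny in-memory triple store. Semi-structured records
# from different sources are flattened into (subject, predicate, object)
# triples, then queried with wildcard patterns.

class TripleStore:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return [
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]

store = TripleStore()
store.add("order:42", "hasCustomer", "Acme Ltd")
store.add("order:42", "hasTotal", "199.00")
store.add("order:43", "hasCustomer", "Acme Ltd")

# All facts about order:42, regardless of the source file format:
print(store.query(subject="order:42"))
```

Production systems would use an RDF library and a persistent store, but the uniform triple shape is what makes disparate semi-structured sources manageable under one query interface.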




Author(s):  
Andra Waagmeester ◽  
Lynn Schriml ◽  
Andrew Su

Wikidata (http://www.wikidata.org) is the linked database of the Wikimedia Foundation. Like its sister project Wikipedia, it is open to humans and machines. Initially intended primarily as a central repository of structured data for the approximately 200 language versions of Wikipedia, Wikidata currently also serves many other use cases. It is an open, Semantic Web-compatible database that anyone can edit. Here, we present the Gene Wiki initiative. This project started in 2008 by creating Wikipedia articles for all human genes (Huss et al. 2008). These articles were enriched with structured information on these genes in tables (called infoboxes). With the onset of Wikidata in 2012, the project diverted its attention from the infoboxes, and since then we have been enriching Wikidata with structured knowledge from public scientific resources on genes, proteins, diseases and compounds (Burgstaller-Muehlbacher et al. 2016). This structured information is added to Wikidata while active links to the primary source are maintained.

Adding a new resource to Wikidata is a community-driven process that starts with modelling the subjects of the resource under scrutiny. This involves seeking commonalities with similar concepts in Wikidata and, if none are found, creating new ones. This process mostly happens in a collaboratively edited document (e.g. GDocs), where different graphical networks are drawn to reflect the data being modelled and its embedding in Wikidata. Once consensus has been reached, the model typically exists as a human-readable document. To allow future validation of these models against existing data, the model is converted into a machine-readable Shape Expression (ShEx) (Anonymous 2019, Waagmeester et al. 2017). The Shape Expressions schema language can be consumed and produced by both humans and machines, and is useful in model development, legacy review, or as formal documentation. Once a semantic data model (as a Shape Expression) has been agreed upon, i.e. community consensus is reached, a bot is developed to convert the knowledge from the primary source into the Wikidata model.

While Wikidata is linked data (part of the Semantic Web), many life-science resources are not. On the contrary, many distinct file formats or API output formats are used to present life-science knowledge. To convert between these different formats, bots need to be developed that are able to parse the different resources and serialize them into Wikidata. We have developed a software library in the Python programming language, which we use to build these bots. Once created, these bots run regularly to keep Wikidata up to date with knowledge on genes, proteins, diseases and drugs.

Having scientific knowledge represented in Wikidata comes with benefits. First, having research data on Wikidata increases its sustainability: when research projects end, their findings now remain on an independently funded infrastructure. Having someone else maintain an infrastructure for a data commons also relieves the research community of having to do it themselves, leaving more time to focus on research. As a generic public data commons, Wikidata allows public scrutiny and rapid integration with other domains. Inconsistencies or disagreements between resources become more visible due to the unified data models and interfaces; the latter we leverage as a feature in our bots. For example, one of our core resources is the Disease Ontology (Schriml et al. 2018). This ontology of human diseases is continuously updated by its curation team, and twice per month its updates are synchronised with Wikidata. If inconsistencies or disagreements with other resources surface, they are logged and shared with the curation team of the Disease Ontology. Hence, we have created a bi-directional update cycle, improving both the Disease Ontology and Wikidata.
Although our bots focus on molecular biology, our approaches are generic enough that we are confident a similar approach can work in biodiversity informatics.
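The bot pattern described above (parse a primary-source record, serialize it into Wikidata-model statements with a reference back to the source) can be sketched as follows. Property IDs P31 ("instance of") and P699 ("Disease Ontology ID") are real Wikidata properties and Q12136 is the item for "disease", but the record layout and function names are illustrative, not the project's actual library code.

```python
# Hedged sketch of a Gene Wiki-style bot step: map one Disease Ontology
# record onto Wikidata-style (property, value, reference) statements,
# keeping an active link to the primary source on every statement.

def to_statements(record: dict) -> list:
    """Serialize a Disease Ontology-style record into Wikidata-model statements."""
    ref = record["source_url"]  # provenance attached to every statement
    statements = [
        ("P31", "Q12136", ref),          # instance of: disease
        ("P699", record["doid"], ref),   # Disease Ontology ID
        ("label", record["label"], ref),
    ]
    for synonym in record.get("synonyms", []):
        statements.append(("alias", synonym, ref))
    return statements

record = {
    "doid": "DOID:1612",
    "label": "breast cancer",
    "synonyms": ["mammary cancer"],
    "source_url": "http://purl.obolibrary.org/obo/DOID_1612",
}
for statement in to_statements(record):
    print(statement)
```

A real bot would additionally diff these statements against the live Wikidata item and write only the changes, which is what makes the twice-monthly synchronisation with the Disease Ontology practical.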




Author(s):  
Justin E. H. Smith

Though it did not yet exist as a discrete field of scientific inquiry, biology was at the heart of many of the most important debates in seventeenth-century philosophy. Nowhere is this more apparent than in the work of G. W. Leibniz. This book offers the first in-depth examination of Leibniz's deep and complex engagement with the empirical life sciences of his day, in areas as diverse as medicine, physiology, taxonomy, generation theory, and paleontology. The book shows how these wide-ranging pursuits were not only central to Leibniz's philosophical interests, but often provided the insights that led to some of his best-known philosophical doctrines. Presenting the clearest picture yet of the scope of Leibniz's theoretical interest in the life sciences, the book takes seriously the philosopher's own repeated claims that the world must be understood in fundamentally biological terms. Here it reveals a thinker who was immersed in the sciences of life, and looked to the living world for answers to vexing metaphysical problems. The book casts Leibniz's philosophy in an entirely new light, demonstrating how it radically departed from the prevailing models of mechanical philosophy and had an enduring influence on the history and development of the life sciences. Along the way, the book provides a fascinating glimpse into early modern debates about the nature and origins of organic life, and into how philosophers such as Leibniz engaged with the scientific dilemmas of their era.


2006 ◽  
Author(s):  
Michael Schroeder ◽  
Eric Neumann

Author(s):  
Uwe Weissflog

Abstract This paper provides an overview of methods and ideas to achieve data integration in CIM. It describes a dictionary approach allowing participating applications to define their common constructs gradually as an additional service across application systems. Because of the importance of product definition data, the role of PDES/STEP as part of this dictionary approach is also described. The technical concepts of the dictionary, such as schema mapping, semantic data model, user methods and the required additions within participating applications are explained. Problems related to data integrity, data redundancy, performance and binding of dissimilar software components are discussed as well as the deficiencies related to today’s data modelling capabilities. The added value an active dictionary can provide to a CIM environment consisting of established applications in heterogeneous environments, where migration into one standardized homogeneous set of CIM applications is not likely, is also explained.
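The dictionary's schema-mapping role can be sketched in a few lines: a shared dictionary maps each application's local field names onto common constructs, so established applications can exchange data without being migrated to one homogeneous system. The application and field names below are hypothetical, not from the paper.

```python
# Illustrative sketch of the dictionary approach: each participating
# application registers a mapping from its local schema to the shared
# common constructs; records are translated through the dictionary.

DICTIONARY = {
    "cad_system": {"part_no": "PartNumber", "descr": "Description"},
    "mrp_system": {"item_id": "PartNumber", "item_text": "Description"},
}

def to_common(app: str, record: dict) -> dict:
    """Translate an application-local record into the common schema."""
    mapping = DICTIONARY[app]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

cad = to_common("cad_system", {"part_no": "A-100", "descr": "Bracket"})
mrp = to_common("mrp_system", {"item_id": "A-100", "item_text": "Bracket"})
print(cad == mrp)  # → True: both records agree in the common schema
```

Fields absent from the dictionary are dropped at the boundary, which mirrors the paper's point that common constructs can be defined gradually, application by application.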

