LODsyndesis: Global Scale Knowledge Services

Heritage ◽  
2018 ◽  
Vol 1 (2) ◽  
pp. 335-348 ◽  
Author(s):  
Michalis Mountantonakis ◽  
Yannis Tzitzikas

In this paper, we present LODsyndesis, a suite of services over the datasets of the entire Linked Open Data Cloud, which offers fast, content-based dataset discovery and object co-reference. Emphasis is given to supporting scalable cross-dataset reasoning for finding all information about any entity and its provenance. Other tasks that can benefit from these services are those related to the quality and veracity of data: collecting all information about an entity, together with the cross-dataset inference this makes feasible, allows spotting contradictions and provides information for data cleaning or for estimating and suggesting which data are probably correct or more accurate. In addition, we show how these services can assist the enrichment of existing datasets with more features for obtaining better predictions in machine learning tasks. Finally, we report measurements that reveal the sparsity of the current datasets as regards their connectivity, which in turn justifies the need for advancing the current methods for data integration. Measurements focusing on the cultural domain are also included, specifically measurements over datasets using CIDOC CRM (Conceptual Reference Model) and connectivity measurements of British Museum data. The services of LODsyndesis are based on special indexes and algorithms, and allow the indexing of 2 billion triples in around 80 min using a cluster of 96 computers.
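The object co-reference described above boils down to computing the symmetric and transitive closure of owl:sameAs links across datasets. A minimal Python sketch of that closure using union-find (illustrative only; the actual LODsyndesis services rely on dedicated indexes and a 96-machine cluster, and the URIs below are hypothetical):

```python
from collections import defaultdict

def sameas_closure(pairs):
    """Group URIs into the equivalence classes implied by owl:sameAs
    links (symmetric + transitive closure) using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in pairs:
        union(a, b)

    classes = defaultdict(set)
    for uri in parent:
        classes[find(uri)].add(uri)
    return list(classes.values())

# Hypothetical sameAs links collected from three datasets
links = [
    ("dbpedia:Aristotle", "wikidata:Q868"),
    ("wikidata:Q868", "yago:Aristotle"),
    ("dbpedia:Plato", "wikidata:Q859"),
]
groups = sameas_closure(links)
```

Once the classes are materialized, looking up "all information about an entity" reduces to fetching the triples of every URI in its class.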

Heritage ◽  
2019 ◽  
Vol 2 (1) ◽  
pp. 761-773 ◽  
Author(s):  
Olivier Marlet ◽  
Elisabeth Zadora-Rio ◽  
Pierre-Yves Buard ◽  
Béatrice Markhoff ◽  
Xavier Rodier

The logicist program, which was initiated in the 1970s by J.C. Gardin, aims to clarify the reasoning processes in the field of archaeology and to explore new forms of publication, in order to overcome the growing imbalance between the flood of publications and our capacity for assimilation. The logicist program brings out the cognitive structure of archaeological constructs, which establishes a bridge between empirical facts or descriptive propositions, at one end of the argumentation, and interpretative propositions at the other. This alternative form of publication is designed to highlight the chain of inference and the evidence on which it stands. In the case of the logicist publication of the archaeological excavation in Rigny (Indre-et-Loire, France), our workflow can provide different levels of access to the content, allowing both speed-reading and in-depth consultation. Both the chains of inference and the ArSol database containing the field records that provide evidence for the initial propositions are visualized in a diagram structure. We rely on the International Committee for Documentation Conceptual Reference Model (CIDOC CRM) entities for ensuring the semantic interoperability of such publications within the Linked Open Data. Inference chains are mapped to CRMinf and ArSol records are mapped to CRM, CRMSci and CRMArcheo. Moreover, as part of the work carried out by the French Huma-Num MASA Consortium, a project is underway to allow the building of logicist publications starting from a graphical interface for describing the structure and content of propositions.
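The inference chains described above can be thought of as a graph linking interpretative propositions back to field records. A toy Python sketch of such a chain (the propositions and the CRMinf/CRMSci mappings noted in comments are invented for illustration, not taken from the Rigny publication):

```python
# A toy logicist inference chain: interpretative propositions rest on
# lower-level propositions, which ultimately rest on field records.
# Comments indicate the kind of CIDOC CRM / CRMinf mapping intended;
# the Python structure itself is purely illustrative.
propositions = {
    # descriptive propositions, grounded in field records (cf. CRMSci observations)
    "P1": {"text": "Layer 12 contains 9th-century pottery", "premises": []},
    "P2": {"text": "Layer 12 is cut by wall F3", "premises": []},
    # interpretative propositions (cf. CRMinf beliefs reached by inference making)
    "P3": {"text": "Wall F3 postdates the 9th century", "premises": ["P1", "P2"]},
    "P4": {"text": "The building was remodelled after the 9th century",
           "premises": ["P3"]},
}

def evidence_for(pid):
    """Return the premise-free (record-grounded) propositions a claim rests on."""
    node = propositions[pid]
    if not node["premises"]:
        return {pid}
    out = set()
    for p in node["premises"]:
        out |= evidence_for(p)
    return out
```

Traversing the chain this way is what lets a reader jump from a high-level interpretation straight down to the field records that support it.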


2020 ◽  
Vol 2020 (1) ◽  
pp. 49-54
Author(s):  
Bethany Scott ◽  
Diana Dulek

Digitization projects of analog photographic collections are still growing in number, and the resulting image collections grow continuously as well. There is also a strong trend towards open data and open interfaces for accessing and reusing image resources (FAIR data). To be able to search and find images in a repository, metadata of a certain depth must exist. Typically, indexing and valorization by experts who know the (photographic) collections is necessary to obtain such meta-information. There are various metadata standards, based on different concepts, for the description of collections. Some, like ISAD(G), relate more to the physical structure of archives; others, like CIDOC-CRM, take the content of the images into account in detail. Increasing the depth of indexing drastically increases the time required. It is also a task that does not scale easily, because specific content-related knowledge is necessary. With the assistance of artificial intelligence, historic photographic collections could potentially be enriched with metadata semi-automatically. For the successful application of machine learning, robust training sets are essential. In this paper, we report our observations from monitoring participants indexing historic collections of photographs. In workshops with people working with photographic heritage, we monitored how single photographs, but also groups of images, are described. Based on that knowledge, machine learning components can be trained and optimized for this particular type of source material. The demonstrated approach has the potential to support the work of valorization substantially. In addition, it has, to some extent, the potential to preserve the fundamental knowledge structures of contemporary witnesses.
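The semi-automatic enrichment envisioned above could, for instance, learn from observed expert annotations which labels tend to co-occur, and then suggest candidate labels for an expert to confirm or reject. A toy Python sketch under that assumption (the annotations and labels are invented for illustration):

```python
from collections import Counter
from itertools import combinations

# Hypothetical expert annotations observed during an indexing workshop:
# each photograph was described with a set of labels.
annotations = [
    {"portrait", "studio", "woman"},
    {"portrait", "studio", "man"},
    {"landscape", "river", "bridge"},
    {"portrait", "outdoor", "group"},
]

# Count how often pairs of labels co-occur on the same photograph.
cooc = Counter()
for labels in annotations:
    for a, b in combinations(sorted(labels), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def suggest(label, k=2):
    """Suggest the k labels most often co-occurring with `label`,
    for a human expert to confirm or reject."""
    scores = Counter({b: n for (a, b), n in cooc.items() if a == label})
    return [lab for lab, _ in scores.most_common(k)]
```

Keeping the expert in the loop is what makes the enrichment semi-automatic rather than fully automatic.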


2021 ◽  
Vol 129 ◽  
pp. 102442
Author(s):  
Peng Zhang ◽  
Shougeng Hu ◽  
Weidong Li ◽  
Chuanrong Zhang ◽  
Shengfu Yang ◽  
...  

2020 ◽  
Vol 36 ◽  
pp. 49-62
Author(s):  
Nureni Olawale Adeboye ◽  
Peter Osuolale Popoola ◽  
Oluwatobi Nurudeen Ogunnusi

Data science is a concept that unifies statistics, data analysis, machine learning and their related methods in order to analyze actual phenomena with data and provide better understanding. This article focuses its investigation on the acquisition of data science skills in building partnerships for efficient school curriculum delivery in Africa, especially in the area of teaching statistics courses at the beginners' level in tertiary institutions. Illustrations were made using big data from 18 selected African countries, sourced from the United Nations Educational, Scientific and Cultural Organization (UNESCO), with special focus on some macro-economic variables that drive economic policy. Data description techniques were adopted in the analysis of the sourced open data with the aid of the R analytics software for data science, as an improvement on the traditional methods of data description for learning, thus opening a new chapter of education curriculum delivery in African schools. Though the collaboration is not without its own challenges, its prospects in creating a self-driven learning culture among students of tertiary institutions have greatly enhanced the quality of teaching, advanced students' skills in machine learning, improved understanding of the role of data in a global perspective, and strengthened the ability to critique claims based on data.
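As a flavor of the data description techniques mentioned above (shown here in Python rather than R, purely for illustration; the country figures are invented placeholders, not UNESCO data):

```python
import statistics

# Hypothetical macro-economic indicator (e.g. education expenditure as % of GDP)
# for a handful of African countries; a real analysis would pull UNESCO data.
expenditure = {
    "Nigeria": 1.7, "Ghana": 4.0, "Kenya": 5.3,
    "South Africa": 6.2, "Egypt": 2.5, "Morocco": 5.3,
}

values = list(expenditure.values())
summary = {
    "n": len(values),
    "mean": round(statistics.mean(values), 2),
    "median": statistics.median(values),
    "stdev": round(statistics.stdev(values), 2),
}
```

A descriptive summary like this is the beginner-level starting point before any modelling: it tells students what the data look like and invites questions about outliers and spread.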


Author(s):  
Joseph D. Romano ◽  
Trang T. Le ◽  
Weixuan Fu ◽  
Jason H. Moore

Automated machine learning (AutoML) and artificial neural networks (ANNs) have revolutionized the field of artificial intelligence by yielding incredibly high-performing models to solve a myriad of inductive learning tasks. In spite of their successes, little guidance exists on when to use one versus the other. Furthermore, relatively few tools exist that allow the integration of both AutoML and ANNs in the same analysis to yield results combining both of their strengths. Here, we present TPOT-NN—a new extension to the tree-based AutoML software TPOT—and use it to explore the behavior of automated machine learning augmented with neural network estimators (AutoML+NN), particularly when compared to non-NN AutoML in the context of simple binary classification on a number of public benchmark datasets. Our observations suggest that TPOT-NN is an effective tool that achieves greater classification accuracy than standard tree-based AutoML on some datasets, with no loss in accuracy on others. We also provide preliminary guidelines for performing AutoML+NN analyses, and recommend possible future directions for AutoML+NN methods research, especially in the context of TPOT.
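The search an AutoML system performs can be illustrated, in a highly simplified form, as optimization over a space of pipeline choices that mixes tree-based and neural estimators. The following Python sketch is a toy stand-in, not TPOT's genetic-programming search, and its scores are fabricated placeholders rather than measured accuracies:

```python
from itertools import product

# Toy search space mixing tree-based and NN-style estimator choices,
# in the spirit of AutoML+NN. TPOT's real search evolves scikit-learn
# pipelines with genetic programming; this is exhaustive grid search.
SEARCH_SPACE = {
    "preprocessor": ["none", "scale", "select_features"],
    "estimator": ["decision_tree", "random_forest", "logistic_nn", "mlp"],
}

def evaluate(pipeline):
    """Stand-in for cross-validated accuracy; a real AutoML run would fit
    the pipeline on training folds and score held-out data."""
    base = {"decision_tree": 0.80, "random_forest": 0.85,
            "logistic_nn": 0.83, "mlp": 0.87}[pipeline["estimator"]]
    bonus = 0.02 if pipeline["preprocessor"] == "scale" else 0.0
    return base + bonus

def grid_search():
    best, best_score = None, -1.0
    for prep, est in product(SEARCH_SPACE["preprocessor"],
                             SEARCH_SPACE["estimator"]):
        pipe = {"preprocessor": prep, "estimator": est}
        score = evaluate(pipe)
        if score > best_score:
            best, best_score = pipe, score
    return best, best_score
```

The point of AutoML+NN is simply that neural estimators sit inside the same search space as tree-based ones, so the search itself decides when a network is worth the cost.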


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 39
Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the three following problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved by enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods in solving the considered problems.
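Constructing a similarity graph from a batch of intermediate representations, as described above, can be sketched in a few lines of Python (a toy cosine-similarity version with made-up vectors; the paper's LGGs are built from actual latent representations during training):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def latent_geometry_graph(batch, threshold=0.5):
    """Build an adjacency dict connecting inputs whose intermediate
    representations are similar (cosine above `threshold`)."""
    n = len(batch)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(batch[i], batch[j]) > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

# A toy batch of four intermediate representations
batch = [
    [1.0, 0.1],
    [0.9, 0.2],   # similar to the first
    [0.0, 1.0],
    [0.1, 0.9],   # similar to the third
]
graph = latent_geometry_graph(batch, threshold=0.9)
```

One such graph per layer makes it possible to compare geometries across consecutive latent spaces, which is what the smoothness constraint in (iii) operates on.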


2021 ◽  
pp. 1-12
Author(s):  
Melesio Crespo-Sanchez ◽  
Ivan Lopez-Arevalo ◽  
Edwin Aldana-Bobadilla ◽  
Alejandro Molina-Villegas

In the last few years, text analysis has grown into a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking the lexical and syntactic components into account in the abstraction of content could be beneficial for learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra of the lexical, syntactic, and semantic components of text, producing an abstract image that can be treated by both text and image learning algorithms. These components come from feature vectors of the text. To demonstrate the merits of our proposal, it was tested on text classification and reading-complexity score prediction tasks, obtaining promising results.
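The idea of stacking per-component feature vectors into an image-like representation can be sketched as follows (the feature functions below are crude placeholders, not the spectral features the abstract describes):

```python
# Toy version of the idea: one feature vector per component (lexical,
# syntactic, semantic), stacked into a 2D array that could be consumed
# as a small single-channel "image" by an image learning algorithm.

def lexical_features(text):
    words = text.split()
    avg_len = sum(len(w) for w in words) / len(words)
    return [len(words), avg_len, len(set(words)), len(text)]

def syntactic_features(text):
    # crude proxy: punctuation counts stand in for syntactic structure
    return [text.count(","), text.count("."), text.count("?"), text.count("!")]

def semantic_features(text):
    # crude proxy: counts over a tiny fixed vocabulary; real semantic
    # features would come from embeddings
    vocab = ["machine", "learning", "text", "image"]
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [words.count(v) for v in vocab]

def spectral_image(text):
    """Stack the three component vectors into a 3 x 4 'image'."""
    return [lexical_features(text),
            syntactic_features(text),
            semantic_features(text)]

img = spectral_image("Machine learning turns text into vectors.")
```

Because the result is a small 2D array, it can be fed to either a classical text classifier (after flattening) or a convolutional model, which is the dual use the abstract highlights.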

