LODsyndesis: Global Scale Knowledge Services

Heritage ◽  
2018 ◽  
Vol 1 (2) ◽  
pp. 335-348 ◽  
Author(s):  
Michalis Mountantonakis ◽  
Yannis Tzitzikas

In this paper, we present LODsyndesis, a suite of services over the datasets of the entire Linked Open Data Cloud, which offers fast, content-based dataset discovery and object co-reference. Emphasis is given to supporting scalable cross-dataset reasoning for finding all information about any entity and its provenance. Other tasks that can benefit from these services are those related to the quality and veracity of data: collecting all information about an entity, together with the cross-dataset inference this makes feasible, allows spotting contradictions and provides information for data cleaning or for estimating and suggesting which data are probably correct or more accurate. In addition, we show how these services can assist the enrichment of existing datasets with more features for obtaining better predictions in machine learning tasks. Finally, we report measurements that reveal the sparsity of the current datasets as regards their connectivity, which in turn justifies the need for advancing the current methods for data integration. Measurements focusing on the cultural domain are also included, specifically measurements over datasets using CIDOC CRM (Conceptual Reference Model) and connectivity measurements of British Museum data. The services of LODsyndesis are based on special indexes and algorithms, and allow the indexing of 2 billion triples in around 80 min using a cluster of 96 computers.
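The object co-reference described above boils down to computing the symmetric and transitive closure of owl:sameAs links across datasets. A minimal Python sketch of that closure using union-find (illustrative only; the actual LODsyndesis services rely on dedicated indexes and a 96-machine cluster, and the URIs below are hypothetical):

```python
from collections import defaultdict

def sameas_closure(pairs):
    """Group URIs into the equivalence classes implied by owl:sameAs
    links (symmetric + transitive closure) using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for a, b in pairs:
        union(a, b)

    classes = defaultdict(set)
    for uri in parent:
        classes[find(uri)].add(uri)
    return list(classes.values())

# Hypothetical sameAs links collected from three datasets
links = [
    ("dbpedia:Aristotle", "wikidata:Q868"),
    ("wikidata:Q868", "yago:Aristotle"),
    ("dbpedia:Plato", "wikidata:Q859"),
]
groups = sameas_closure(links)
```

Once the classes are materialized, looking up "all information about an entity" reduces to fetching the triples of every URI in its class.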

Heritage ◽  
2019 ◽  
Vol 2 (1) ◽  
pp. 761-773 ◽  
Author(s):  
Olivier Marlet ◽  
Elisabeth Zadora-Rio ◽  
Pierre-Yves Buard ◽  
Béatrice Markhoff ◽  
Xavier Rodier

The logicist program, which was initiated in the 1970s by J.C. Gardin, aims to clarify the reasoning processes in the field of archaeology and to explore new forms of publication, in order to overcome the growing imbalance between the flood of publications and our capacity for assimilation. The logicist program brings out the cognitive structure of archaeological constructs, which establishes a bridge between empirical facts or descriptive propositions, at one end of the argumentation, and interpretative propositions at the other. This alternative form of publication is designed to highlight the chain of inference and the evidence on which it stands. In the case of the logicist publication of the archaeological excavation in Rigny (Indre-et-Loire, France), our workflow can provide different levels of access to the content, allowing both speed-reading and in-depth consultation. Both the chains of inference and the ArSol database containing the field records that provide evidence for the initial propositions are visualized in a diagram structure. We rely on the International Committee for Documentation Conceptual Reference Model (CIDOC CRM) entities for ensuring the semantic interoperability of such publications within the Linked Open Data. Inference chains are mapped to CRMinf and ArSol records are mapped to CRM, CRMSci and CRMArcheo. Moreover, as part of the work carried out by the French Huma-Num MASA Consortium, a project is underway to allow the building of logicist publications starting from a graphical interface for describing the structure and content of propositions.
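The inference chains described above can be thought of as a graph linking interpretative propositions back to field records. A toy Python sketch of such a chain (the propositions and the CRMinf/CRMSci mappings noted in comments are invented for illustration, not taken from the Rigny publication):

```python
# A toy logicist inference chain: interpretative propositions rest on
# lower-level propositions, which ultimately rest on field records.
# Comments indicate the kind of CIDOC CRM / CRMinf mapping intended;
# the Python structure itself is purely illustrative.
propositions = {
    # descriptive propositions, grounded in field records (cf. CRMSci observations)
    "P1": {"text": "Layer 12 contains 9th-century pottery", "premises": []},
    "P2": {"text": "Layer 12 is cut by wall F3", "premises": []},
    # interpretative propositions (cf. CRMinf beliefs reached by inference making)
    "P3": {"text": "Wall F3 postdates the 9th century", "premises": ["P1", "P2"]},
    "P4": {"text": "The building was remodelled after the 9th century",
           "premises": ["P3"]},
}

def evidence_for(pid):
    """Return the premise-free (record-grounded) propositions a claim rests on."""
    node = propositions[pid]
    if not node["premises"]:
        return {pid}
    out = set()
    for p in node["premises"]:
        out |= evidence_for(p)
    return out
```

Traversing the chain this way is what lets a reader jump from a high-level interpretation straight down to the field records that support it.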


2020 ◽  
Vol 2020 (1) ◽  
pp. 49-54
Author(s):  
Bethany Scott ◽  
Diana Dulek

Digitization projects of analog photographic collections are still growing in number, and the resulting image collections grow continuously as well. There is also a strong trend towards open data and open interfaces for accessing and reusing image resources (FAIR data). To be able to search and find images in a repository, metadata of a certain depth must exist. Typically, indexing and valorization by experts who know the (photographic) collections is necessary to obtain such meta-information. There are various metadata standards, based on different concepts, for the description of collections. Some, like ISAD(G), relate more to the physical structure of archives; others, like CIDOC-CRM, take the content of the images into account in detail. Increasing the depth of indexing drastically increases the time required. It is also a task that does not scale easily, because specific content-related knowledge is necessary. With the assistance of artificial intelligence, historic photographic collections could potentially be enriched with metadata semi-automatically. For the successful application of machine learning, robust training sets are essential. In this paper, we report our observations from monitoring participants indexing historic collections of photographs. In workshops with people working with photographic heritage, we monitored how single photographs, but also groups of images, are described. Based on that knowledge, machine learning components can be trained and optimized for this particular type of source material. The demonstrated approach has the potential to support the work of valorization substantially. In addition, it has, to some extent, the potential to preserve the fundamental knowledge structures of contemporary witnesses.
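The semi-automatic enrichment envisioned above could, for instance, learn from observed expert annotations which labels tend to co-occur, and then suggest candidate labels for an expert to confirm or reject. A toy Python sketch under that assumption (the annotations and labels are invented for illustration):

```python
from collections import Counter
from itertools import combinations

# Hypothetical expert annotations observed during an indexing workshop:
# each photograph was described with a set of labels.
annotations = [
    {"portrait", "studio", "woman"},
    {"portrait", "studio", "man"},
    {"landscape", "river", "bridge"},
    {"portrait", "outdoor", "group"},
]

# Count how often pairs of labels co-occur on the same photograph.
cooc = Counter()
for labels in annotations:
    for a, b in combinations(sorted(labels), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

def suggest(label, k=2):
    """Suggest the k labels most often co-occurring with `label`,
    for a human expert to confirm or reject."""
    scores = Counter({b: n for (a, b), n in cooc.items() if a == label})
    return [lab for lab, _ in scores.most_common(k)]
```

Keeping the expert in the loop is what makes the enrichment semi-automatic rather than fully automatic.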


2021 ◽  
Vol 129 ◽  
pp. 102442
Author(s):  
Peng Zhang ◽  
Shougeng Hu ◽  
Weidong Li ◽  
Chuanrong Zhang ◽  
Shengfu Yang ◽  
...  

2020 ◽  
Vol 36 ◽  
pp. 49-62
Author(s):  
Nureni Olawale Adeboye ◽  
Peter Osuolale Popoola ◽  
Oluwatobi Nurudeen Ogunnusi

Data science is a concept that unifies statistics, data analysis, machine learning and their related methods in order to analyze actual phenomena with data and provide better understanding. This article focuses its investigation on the acquisition of data science skills in building partnerships for efficient school curriculum delivery in Africa, especially in the area of teaching statistics courses at the beginners' level in tertiary institutions. Illustrations were made using big data from 18 selected African countries, sourced from the United Nations Educational, Scientific and Cultural Organization (UNESCO), with special focus on some macro-economic variables that drive economic policy. Data description techniques were adopted in the analysis of the sourced open data with the aid of the R analytics software for data science, as an improvement on the traditional methods of data description for learning, thus opening a new chapter of education curriculum delivery in African schools. Though the collaboration is not without its own challenges, its prospects in creating a self-driven learning culture among students of tertiary institutions have greatly enhanced the quality of teaching, advanced students' skills in machine learning, improved understanding of the role of data in a global perspective, and strengthened the ability to critique claims based on data.
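As a flavor of the data description techniques mentioned above (shown here in Python rather than R, purely for illustration; the country figures are invented placeholders, not UNESCO data):

```python
import statistics

# Hypothetical macro-economic indicator (e.g. education expenditure as % of GDP)
# for a handful of African countries; a real analysis would pull UNESCO data.
expenditure = {
    "Nigeria": 1.7, "Ghana": 4.0, "Kenya": 5.3,
    "South Africa": 6.2, "Egypt": 2.5, "Morocco": 5.3,
}

values = list(expenditure.values())
summary = {
    "n": len(values),
    "mean": round(statistics.mean(values), 2),
    "median": statistics.median(values),
    "stdev": round(statistics.stdev(values), 2),
}
```

A descriptive summary like this is the beginner-level starting point before any modelling: it tells students what the data look like and invites questions about outliers and spread.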


Author(s):  
Joseph D. Romano ◽  
Trang T. Le ◽  
Weixuan Fu ◽  
Jason H. Moore

Automated machine learning (AutoML) and artificial neural networks (ANNs) have revolutionized the field of artificial intelligence by yielding incredibly high-performing models to solve a myriad of inductive learning tasks. In spite of their successes, little guidance exists on when to use one versus the other. Furthermore, relatively few tools exist that allow the integration of both AutoML and ANNs in the same analysis to yield results combining both of their strengths. Here, we present TPOT-NN—a new extension to the tree-based AutoML software TPOT—and use it to explore the behavior of automated machine learning augmented with neural network estimators (AutoML+NN), particularly when compared to non-NN AutoML in the context of simple binary classification on a number of public benchmark datasets. Our observations suggest that TPOT-NN is an effective tool that achieves greater classification accuracy than standard tree-based AutoML on some datasets, with no loss in accuracy on others. We also provide preliminary guidelines for performing AutoML+NN analyses, and recommend possible future directions for AutoML+NN methods research, especially in the context of TPOT.
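The search an AutoML system performs can be illustrated, in a highly simplified form, as optimization over a space of pipeline choices that mixes tree-based and neural estimators. The following Python sketch is a toy stand-in, not TPOT's genetic-programming search, and its scores are fabricated placeholders rather than measured accuracies:

```python
from itertools import product

# Toy search space mixing tree-based and NN-style estimator choices,
# in the spirit of AutoML+NN. TPOT's real search evolves scikit-learn
# pipelines with genetic programming; this is exhaustive grid search.
SEARCH_SPACE = {
    "preprocessor": ["none", "scale", "select_features"],
    "estimator": ["decision_tree", "random_forest", "logistic_nn", "mlp"],
}

def evaluate(pipeline):
    """Stand-in for cross-validated accuracy; a real AutoML run would fit
    the pipeline on training folds and score held-out data."""
    base = {"decision_tree": 0.80, "random_forest": 0.85,
            "logistic_nn": 0.83, "mlp": 0.87}[pipeline["estimator"]]
    bonus = 0.02 if pipeline["preprocessor"] == "scale" else 0.0
    return base + bonus

def grid_search():
    best, best_score = None, -1.0
    for prep, est in product(SEARCH_SPACE["preprocessor"],
                             SEARCH_SPACE["estimator"]):
        pipe = {"preprocessor": prep, "estimator": est}
        score = evaluate(pipe)
        if score > best_score:
            best, best_score = pipe, score
    return best, best_score
```

The point of AutoML+NN is simply that neural estimators sit inside the same search space as tree-based ones, so the search itself decides when a network is worth the cost.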


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 39
Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the three following problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved by enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods in solving the considered problems.
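Constructing a similarity graph from a batch of intermediate representations, as described above, can be sketched in a few lines of Python (a toy cosine-similarity version with made-up vectors; the paper's LGGs are built from actual latent representations during training):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def latent_geometry_graph(batch, threshold=0.5):
    """Build an adjacency dict connecting inputs whose intermediate
    representations are similar (cosine above `threshold`)."""
    n = len(batch)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(batch[i], batch[j]) > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

# A toy batch of four intermediate representations
batch = [
    [1.0, 0.1],
    [0.9, 0.2],   # similar to the first
    [0.0, 1.0],
    [0.1, 0.9],   # similar to the third
]
graph = latent_geometry_graph(batch, threshold=0.9)
```

One such graph per layer makes it possible to compare geometries across consecutive latent spaces, which is what the smoothness constraint in (iii) operates on.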


2021 ◽  
pp. 1-12
Author(s):  
Melesio Crespo-Sanchez ◽  
Ivan Lopez-Arevalo ◽  
Edwin Aldana-Bobadilla ◽  
Alejandro Molina-Villegas

In the last few years, text analysis has grown into a keystone in several domains for solving many real-world problems, such as machine translation, spam detection, and question answering, to mention a few. Many of these tasks can be approached by means of machine learning algorithms. Most of these algorithms take as input a transformation of the text in the form of feature vectors containing an abstraction of the content. Most recent vector representations focus on the semantic component of text; however, we consider that also taking the lexical and syntactic components into account in the abstraction of content could be beneficial for learning tasks. In this work, we propose a content spectral-based text representation applicable to machine learning algorithms for text analysis. This representation integrates the spectra of the lexical, syntactic, and semantic components of text, producing an abstract image that can be treated by both text and image learning algorithms. These components come from feature vectors of the text. To demonstrate the merits of our proposal, it was tested on text classification and reading-complexity score prediction tasks, obtaining promising results.
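The idea of stacking per-component feature vectors into an image-like representation can be sketched as follows (the feature functions below are crude placeholders, not the spectral features the abstract describes):

```python
# Toy version of the idea: one feature vector per component (lexical,
# syntactic, semantic), stacked into a 2D array that could be consumed
# as a small single-channel "image" by an image learning algorithm.

def lexical_features(text):
    words = text.split()
    avg_len = sum(len(w) for w in words) / len(words)
    return [len(words), avg_len, len(set(words)), len(text)]

def syntactic_features(text):
    # crude proxy: punctuation counts stand in for syntactic structure
    return [text.count(","), text.count("."), text.count("?"), text.count("!")]

def semantic_features(text):
    # crude proxy: counts over a tiny fixed vocabulary; real semantic
    # features would come from embeddings
    vocab = ["machine", "learning", "text", "image"]
    words = [w.strip(".,?!").lower() for w in text.split()]
    return [words.count(v) for v in vocab]

def spectral_image(text):
    """Stack the three component vectors into a 3 x 4 'image'."""
    return [lexical_features(text),
            syntactic_features(text),
            semantic_features(text)]

img = spectral_image("Machine learning turns text into vectors.")
```

Because the result is a small 2D array, it can be fed to either a classical text classifier (after flattening) or a convolutional model, which is the dual use the abstract highlights.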

