XML schema validation using parsing expression grammars

10.7287/peerj.preprints.1503v1 ◽

2015 ◽

Author(s):

Kimio Kuramitsu ◽

Shin'ya Yamaguchi

Keyword(s):

Information Exchange ◽

Linear Time ◽

Xml Schema ◽

Reliable Information ◽

Document Type ◽

Xml Dtd ◽

Validation Tool ◽

The Web

Schema validation is an integral part of reliable information exchange on the Web. However, implementing an efficient schema validation tool is not easy. We highlight the use of parsing expression grammars (PEGs), a recognition-based foundation for describing syntax, and apply it to the XML/DTD validation. This paper shows that structural schema constraints in document type definitions (DTDs) can be validated by the converted PEGs with the linear time and constant space consumption. We study the performance of several existing PEG-based tools, and then confirm that the converted PEGs achieve a practical and even competitive level of performance under existing standard XML/DTD validators.

Download Full-text

Validation of schema mappings with nested queries

Computer Science and Information Systems ◽

10.2298/csis120713042r ◽

2013 ◽

Vol 10 (1) ◽

pp. 79-104

Author(s):

Guillem Rull ◽

Carles Farré ◽

Ernest Teniente ◽

Toni Urpí

Keyword(s):

Integrity Constraints ◽

Automatic Process ◽

Document Type ◽

Relational Schemas ◽

Schema Mappings ◽

Nested Data ◽

Nested Queries ◽

The Given ◽

Human Designer ◽

The Web

With the emergence of the Web and the wide use of XML for representing data, the ability to map not only flat relational but also nested data has become crucial. The design of schema mappings is a semi-automatic process. A human designer is needed to guide the process, choose among mapping candidates, and successively refine the mapping. The designer needs a way to figure out whether the mapping is what was intended. Our approach to mapping validation allows the designer to check whether the mapping satisfies certain desirable properties. In this paper, we focus on the validation of mappings between nested relational schemas, in which the mapping assertions are either inclusions or equalities of nested queries. We focus on the nested relational setting since most XML?s Document Type Definitions (DTDs) can be represented in this model. We perform the validation by reasoning on the schemas and mapping definition. We take into account the integrity constraints defined on both the source and target schema. We consider constraints and mapping?s queries which may contain arithmetic comparisons and negations. This class of mapping scenarios is significantly more expressive than the ones addressed by previous work on nested relational mapping validation. We encode the given mapping scenario into a single flat database schema, so we can take advantage of our previous work on validating flat relational mappings, and reformulate each desirable property check as a query satisfiability problem.

Download Full-text

Textuality on the Web

Advances in Human and Social Aspects of Technology - Innovative Methods and Technologies for Electronic Discourse Analysis ◽

10.4018/978-1-4666-4426-7.ch019 ◽

2013 ◽

pp. 414-436 ◽

Cited By ~ 1

Author(s):

Chiara Degano

Keyword(s):

Meaning Making ◽

Linear Time ◽

Computer Mediated Communication ◽

Mediated Communication ◽

Computer Mediated ◽

Text Types ◽

The Web

This chapter focuses on computer mediated communication from a linguistic perspective, exploring aspects of textuality which have been impacted by the pervasive spread of the hypertext. Central features in the construction of texts are the notions of cohesion and coherence, originally tailored on linear time-based modes of communication, where both the elements and their sequentiality – fully controlled by the author – contribute to meaning making. In light of the disruption of linear sequentiality brought by the space-based logic of the hypertext, this chapter aims to understand how cohesion and coherence work in the website environment, with specific regard to genres characterised by an argumentative drive, which potentially suffer more than other text types from the loss of the author’s control on the linear dispositio of arguments. The analysis identifies different patterns for the construction of cohesion and coherence in argumentative websites, which accommodate traditional standards of textuality into the new environment.

Download Full-text

Extending XML Types Using Updates

Services and Business Computing Solutions with XML ◽

10.4018/978-1-60566-330-2.ch001 ◽

2010 ◽

pp. 1-21 ◽

Cited By ~ 2

Author(s):

Béatrice Bouchou ◽

Denio Duarte ◽

Mírian Halfeld Ferrari ◽

Martin A. Musicante

Keyword(s):

Web Service ◽

Xml Schema ◽

Experimental Results ◽

Element Content ◽

Network Connection ◽

Service Evolution ◽

Xml Document ◽

Communication Needs ◽

Service Protocol ◽

The Web

The XML Messaging Protocol, a part of the Web service protocol stack, is responsible for encoding messages in a common XML format (or type), so that they can be understood at either end of a network connection. The evolution of an XML type may be required in order to reflect new communication needs, materialized by slightly different XML messages. For instance, due to a service evolution, it might be interesting to extend a type in order to allow the reception of more information, when it is available, instead of always disregarding it. The authors’ proposal consists in a conservative XML schema evolution. The framework is as follows: administrators enter updates performed on a valid XML document in order to specify new documents expected to be valid, and the system computes new types accepting both such documents and previously valid ones. Changing the type is mainly changing regular expressions that define element content models. They present the algorithm that implements this approach, its properties and experimental results.

Download Full-text

SEMANTICALLY INTEROPERABLE XML DATA

International Journal of Semantic Computing ◽

10.1142/s1793351x13500037 ◽

2013 ◽

Vol 07 (03) ◽

pp. 237-255 ◽

Cited By ~ 2

Author(s):

CRISTOBAL VERGARA-NIEDERMAYR ◽

FUSHENG WANG ◽

TONY PAN ◽

TAHSIN KURC ◽

JOEL SALTZ

Keyword(s):

Information Exchange ◽

Medical Image ◽

Xml Schema ◽

Data Sources ◽

Grid Infrastructure ◽

Common Data Elements ◽

Xml Data ◽

Web Based ◽

Semantic Annotations ◽

Data Elements

XML is ubiquitously used as an information exchange platform for web-based applications in healthcare, life sciences, and many other domains. Proliferating XML data are now managed through latest native XML database technologies. XML data sources conforming to common XML schemas could be shared and integrated with syntactic interoperability. Semantic interoperability can be achieved through semantic annotations of data models using common data elements linked to concepts from ontologies. In this paper, we present a framework and software system to support the development of semantic interoperable XML based data sources that can be shared through a Grid infrastructure. We also present our work on supporting semantic validated XML data through semantic annotations for XML Schema, semantic validation and semantic authoring of XML data. We demonstrate the use of the system for a biomedical database of medical image annotations and markups.

Download Full-text

DocBook

Proceedings of the Symposium on Markup Vocabulary Ecosystems ◽

10.4242/balisagevol22.walsh01 ◽

2018 ◽

Author(s):

Norman Walsh

Keyword(s):

General Purpose ◽

Xml Schema ◽

Document Type Definition ◽

Free Software ◽

Computer Hardware ◽

General Notion ◽

Document Type ◽

Active Maintenance ◽

Type Definition

DocBook is a general purpose XML schema particularly well suited to books and papers about computer hardware and software (though it is by no means limited to these applications). DocBook has been under active maintenance for more than 20 years,; it began life as an SGML document type definition. Because it is a large and robust schema, and because its main structures correspond to the general notion of what constitutes a “book,” DocBook has been adopted by a large and growing community of authors writing books of all kinds. DocBook is supported “out of the box” by a number of commercial tools, and there is rapidly expanding support for it in a number of free software environments. These features have combined to make DocBook a generally easy to understand, widely useful, and very popular schema. Dozens of organizations are using DocBook for millions of pages of documentation, in various print and online formats, worldwide.

Download Full-text

Finding Reliable Information on the Web Should and Can Still Be Improved

Journal of Computing and Information Technology ◽

10.20532/cit.2018.1004240 ◽

2018 ◽

Vol 26 (1) ◽

pp. 1-6 ◽

Cited By ~ 1

Author(s):

Mathias Glatz ◽

Hermann Maurer ◽

Muhammad Tanvir Azfal

Keyword(s):

Reliable Information ◽

The Web

Download Full-text

Virtual Communities and the Alignment of Web Ontologies

Encyclopedia of Virtual Communities and Technologies ◽

10.4018/978-1-59140-563-4.ch098 ◽

2011 ◽

pp. 497-499 ◽

Cited By ~ 2

Author(s):

Krzysztof Juszczyszyn

Keyword(s):

Semantic Web ◽

Information Exchange ◽

Virtual Communities ◽

Software Agents ◽

Dynamic Environment ◽

Human Consumption ◽

Web Content ◽

Description Framework ◽

Many Sources ◽

The Web

The World Wide Web (WWW) is a global, ubiquitous, and fundamentally dynamic environment for information exchange and processing. By connecting vast numbers of individuals, the Web enables creation of virtual communities, and during the last 10 years, became a universal collaboration infrastructure. The so-called Semantic Web, a concept proposed by Tim Berners-Lee, is a new WWW architecture that enhances content with formal semantics (Berners-Lee, Hendler, & Lassila, 2001). Hence, the Web content is made suitable for machine processing (i.e., it is described by the associated metadata), as opposed to HTML documents available only for human consumption. Languages such as Resource Description Framework (RDF) and Ontology Web Language (OWL) along with well-known XML are used for description of Web resources. In other words, the Semantic Web is a vision of the future Web in which information is given explicit meaning. This will enable autonomous software agents to reason about Web content and produce intelligent responses to events (Staab, 2002). The ultimate goal of the next generation’s Web is to support the creation of virtual communities which will be composed of software agents and humans cooperating within the same environment. Sharing knowledge within such a community requires a shared conceptual vocabularies—ontologies, which represent the formal common agreement about the meaning of data (Gomez-Perez & Corcho, 2002). Artificial intelligence defines ontologies as explicit, formal specification of a shared conceptualization (Studer, Benjamins, & Fensel, 1998). In this case, a conceptualization stands for an abstract model of some concept from the real world; explicit means that the type of concept used is explicitly defined. Formal refers to the fact that an ontology should be machine readable; and finally shared means that ontology expresses knowledge that is accepted by all the subjects. In short, an ontology defines the terms used to describe and represent an area of knowledge. However, the shared ontologies must be first constructed by using information from many sources which may be of arbitrary quality. Thus, it is necessary to find a way to seamlessly combine the knowledge from many sources, maybe diverse and heterogeneous. The resultant ontologies enable virtual communities and teams to manage and exchange their knowledge. It should be noted, that the word ontology has been used to describe notions with different degrees of structure—from taxonomies (e.g., Yahoo hierarchy), metadata schemes (e.g., Dublin Core), to logical theories. The Semantic Web needs ontologies with a significant degree of structure. These should allow the specification of at least the following kinds of things: • Concepts (which identify the classes of things like cars or birds) from many domains of interest • The relationships that can exist among concepts • The properties (or attributes) those concepts may have

Download Full-text

AN AVERAGE LINEAR TIME ALGORITHM FOR WEB USAGE MINING

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622004001021 ◽

2004 ◽

Vol 03 (02) ◽

pp. 307-319 ◽

Cited By ~ 11

Author(s):

JOSÉ BORGES ◽

MARK LEVENE

Keyword(s):

Markov Chain ◽

High Probability ◽

Linear Time ◽

Time Algorithm ◽

Web Pages ◽

Data Mining Algorithm ◽

Web Navigation ◽

Navigation Data ◽

The Web ◽

Average Behaviour

In this paper, we study the complexity of a data mining algorithm for extracting patterns from user web navigation data that was proposed in previous work.3 The user web navigation sessions are inferred from log data and modeled as a Markov chain. The chain's higher probability trails correspond to the preferred trails on the web site. The algorithm implements a depth-first search that scans the Markov chain for the high probability trails. We show that the average behaviour of the algorithm is linear time in the number of web pages accessed.

Download Full-text

A framework for medical visual information exchange on the WEB

Computers in Biology and Medicine ◽

10.1016/j.compbiomed.2004.10.004 ◽

2006 ◽

Vol 36 (4) ◽

pp. 327-338 ◽

Cited By ~ 8

Author(s):

Silvio Antonio Carro ◽

Jacob Scharcanski

Keyword(s):

Information Exchange ◽

Visual Information ◽

The Web

Download Full-text