Proceedings of the International Symposium on Quality Assurance and Quality Control in XML
Latest Publications


Total documents: 8 (five years: 0)
H-index: 1 (five years: 0)
Published by: Mulberry Technologies, Inc.
ISBN: 9781935958055

Author(s): Eric van der Vlist

Ever modified an XML schema? Ever broken something while fixing a bug or adding a new feature? As with any piece of engineering, the more complex a schema is, the harder it is to maintain. In other domains, unit tests dramatically reduce the number of regressions and thus provide a kind of safety net for maintainers. Can we learn from these techniques and adapt them to XML schema languages? In this workshop session, we develop a schema using unit test techniques, to illustrate their benefits in this domain.
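A minimal sketch of the idea, assuming a hypothetical RELAX NG schema file book.rng and using Python's unittest with lxml as the test harness (the session itself is not tied to these particular tools): each test pins down one behaviour of the schema, so that a later "fix" that loosens or breaks a rule fails a test instead of slipping into production.

    import unittest
    from lxml import etree

    class BookSchemaTests(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            # Compile the schema once; a schema that no longer compiles is itself a failure.
            cls.schema = etree.RelaxNG(etree.parse("book.rng"))  # hypothetical schema file

        def assert_valid(self, xml):
            self.assertTrue(self.schema.validate(etree.fromstring(xml)),
                            self.schema.error_log)

        def assert_invalid(self, xml):
            self.assertFalse(self.schema.validate(etree.fromstring(xml)))

        def test_minimal_book_is_valid(self):
            self.assert_valid("<book><title>QA in XML</title></book>")

        def test_title_is_required(self):
            # Regression guard: accidentally loosening the schema makes this test fail.
            self.assert_invalid("<book/>")

    if __name__ == "__main__":
        unittest.main()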


Author(s): Tamara Stoker, Keith Rose

The benefits of using XML in publishing are widely known, but those benefits are more difficult to attain if the quality of the XML produced by the process is not consistently at a very high level. This case study outlines the steps that the American Chemical Society ("ACS") has taken both in-house and in collaboration with the vendor to which we have outsourced portions of our publication workflow. In addition to producing predictable XML, these efforts have also shortened our publication time.


Author(s): Sheila M. Morrissey, John Meyer, Sushil Bhattarai, Gautham Kalwala, Sachin Kurdikar, ...

One of the consequences of the rapid development and dissemination of the ecosystem of XML technologies was the widespread adoption of XML as a meta-format for the specification of application configuration information. The validation of these rich configuration files with standard XML validation tools, however, is often not sufficient for error-free deployment of applications. This paper considers how to categorize some of the constraints that cannot be enforced by such tools, and discusses some XML-based approaches to enforcing such constraints before, or as part of, deployment.
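One illustration of such a constraint, sketched under assumptions not taken from the paper: a hypothetical deployment configuration in which max-connections must be at least min-connections, a cross-field rule that a DTD or XSD content model cannot express but that ISO Schematron (driven here from Python via lxml) can enforce before the application is deployed.

    from lxml import etree
    from lxml.isoschematron import Schematron

    # Hypothetical deployment rule expressed as ISO Schematron.
    RULES = etree.fromstring("""
    <schema xmlns="http://purl.oclc.org/dsdl/schematron">
      <pattern>
        <rule context="server">
          <assert test="number(max-connections) >= number(min-connections)">
            max-connections must be at least min-connections
          </assert>
        </rule>
      </pattern>
    </schema>
    """)

    # Hypothetical configuration file content that is schema-valid but wrong.
    CONFIG = etree.fromstring("""
    <config>
      <server>
        <min-connections>10</min-connections>
        <max-connections>5</max-connections>
      </server>
    </config>
    """)

    checker = Schematron(RULES)
    if not checker.validate(CONFIG):
        # In a real pipeline this would block the deployment step.
        raise SystemExit("configuration violates deployment constraints")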


Author(s): Charlie Halpern-Hamu

A wide variety of techniques was used in an XML data conversion project. An emphasis on Quality Assurance (not making errors in the first place) was supported by Quality Control (catching errors that occurred anyway). Data analysis and estimation techniques included counting function points in source documents to estimate effort and auto-generating tight schemas to discover variation. Quality Assurance was based on a guiding specification organized around parent-child pairs, and on programming for context and for all content. Quality Control techniques included source-to-target comparison to check for lost or duplicated content, automatic highlighting of anomalous data, and the use of XQuery to review data.
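A minimal sketch of the source-to-target comparison idea, assuming hypothetical file names and that the conversion is meant to preserve character data exactly; a real project would compare at a finer granularity than a single fingerprint per document.

    import re
    from lxml import etree

    def text_fingerprint(path):
        # Concatenate all text nodes and collapse whitespace.
        text = "".join(etree.parse(path).getroot().itertext())
        return re.sub(r"\s+", " ", text).strip()

    source = text_fingerprint("source.xml")       # hypothetical conversion input
    target = text_fingerprint("converted.xml")    # hypothetical conversion output

    if source != target:
        # A length difference hints at lost (shorter) or duplicated (longer) content.
        print(f"content mismatch: source {len(source)} chars, target {len(target)} chars")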


Author(s): Christopher Kelly, Jeff Beck

PubMed Central (PMC) is the US National Library of Medicine's digital archive of life sciences journal literature. On average the PMC team processes 14,000 articles per month. All of the articles are submitted to the archive by publishers in an agreed-upon markup format, and the articles are transformed to a common article model. To help reduce the number of content- or markup-related problems encountered, PMC puts new participating journals through an evaluation period that includes both automated and manual checks. Once a journal has passed the evaluation stage, it sends its content to PMC on a regular production schedule. The production content is processed automatically when it arrives, with any processing problems generating error messages. Although most of the content sent to PMC has been through production systems and QA when it was published, we've found that there is still a level of manual content checking that needs to be done on the production content. Any problems found must be investigated to determine if they result from a problem in the source content, a problem with the PMC ingest transforms, or simply a problem with our rendering of the normalized XML content. Having the content in XML certainly has advantages: it can be validated against a schema and easily manipulated and processed. But XML does not solve all of the problems. A sharp eye and attention to detail are still needed by the production team, as they would be for any publishing process.


Author(s): Steven J. DeRose

Text analytics involves extracting features of meaning from natural language texts and making them explicit, much as markup does. It uses linguistics, AI, and statistical methods to get at a level of "meaning" that markup generally does not: down in the leaves of what to XML may be unanalyzed "content". This suggests potential for new kinds of error, consistency, and quality checking. However, text analytics can also discover features that markup is used for; this suggests that text analytics can also contribute to the markup process itself. Perhaps the simplest example of text analytics' potential for checking is xml:lang. Language identification is well-developed technology, and xml:lang attributes "in the wild" could be much improved. More interestingly, the distribution of named entities (people, places, organizations, etc.), topics, and emphasis interacts closely with documents' markup structures. Summaries, abstracts, conclusions, and the like all have distinctive features which can be measured. This paper provides an overview of how text analytics works, what it can do, and how that relates to the things we typically mark up in XML. It also discusses the trade-offs and decisions involved in just what we choose to mark up, and how that interacts with automation. It presents several specific ways that text analytics can help create, check, and enhance XML components, and exemplifies some cases using a high-volume analytics tool.
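A minimal sketch of the xml:lang check, assuming the third-party langdetect package as the language identifier and a hypothetical article.xml input; any language-identification library would serve, and the length threshold is an illustrative guess.

    from langdetect import detect  # assumed third-party library: pip install langdetect
    from lxml import etree

    XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

    tree = etree.parse("article.xml")   # hypothetical input document
    for elem in tree.iter():
        declared = elem.get(XML_LANG)
        if not declared:
            continue
        text = " ".join(elem.itertext()).strip()
        if len(text) < 40:              # too little text for a reliable guess
            continue
        detected = detect(text)         # e.g. "en", "fr", "de"
        if not declared.lower().startswith(detected):
            print(f"<{elem.tag}> declares xml:lang='{declared}' "
                  f"but reads like '{detected}'")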


Author(s): Wei Zhao, Jayanthy Chengan, Agnes Bai

Ontario Scholars Portal (SP) is an XML-based digital repository containing over 31,000,000 articles from more than 13,000 full-text journals from 24 publishers, covering every academic discipline. Starting in 2006, SP adopted the NLM Journal Archiving and Interchange Tag Set v2.3 for its XML-based e-journals system, which uses MarkLogic. The publishers' native data is transformed to the NLM Tag Set in SP in order to normalize data elements to a single standard for archiving, display, and searching. Scholars Portal has established extremely high standards for ensuring that the content loaded into the repository is accurate and complete. Throughout the entire workflow, from data ingest through data conversion to data display, quality control procedures have been implemented to ensure the integrity of the digital repository.
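A minimal sketch of one such post-conversion check, assuming a hypothetical converted article file and an illustrative (not Scholars Portal's actual) list of required NLM metadata fields; a real ingest pipeline would also run DTD validation and completeness checks against the publisher's manifest.

    from lxml import etree

    # Illustrative required fields, expressed as XPaths into the NLM article model.
    REQUIRED = {
        "journal title":    "//journal-meta//journal-title",
        "article title":    "//article-meta/title-group/article-title",
        "DOI":              "//article-meta/article-id[@pub-id-type='doi']",
        "publication date": "//article-meta/pub-date",
    }

    doc = etree.parse("converted-article.xml")   # hypothetical conversion output
    missing = [name for name, xpath in REQUIRED.items() if not doc.xpath(xpath)]
    if missing:
        print("hold back from loading, missing:", ", ".join(missing))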


Author(s): Dale Waldt

Validation of XML documents typically provides feedback in binary, yes/no form. This avoids the ambiguity, manual intervention, and increased cost of other approaches. But it may not be enough to make XML applications efficient, accurate, or semantically rich. How do you ensure that the correct element and attribute types are applied to the appropriate content chunks? That XML documents are accurate and current? That your XML has a level of semantic richness appropriate to your business goals? How do you control quality over large collections? How do you resolve conflicting organizational goals for information integration and ensure that content and schemas help the enterprise as a whole? Conceptual and physical models, model/schema traceability, and effective stakeholder review can all help. Schematron, document comparison (diff) tools, and statistical methods can also help, but may raise QA questions of their own. Improvements in requirements gathering and QA processes can produce visible results; concrete examples can and will be discussed.
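As one illustration of the statistical angle, a minimal sketch under assumptions not taken from the talk: element names that occur in only a tiny fraction of a collection are often tagging errors or semantic outliers worth a reviewer's attention, even when every document is schema-valid, so counting them yields a graded quality signal rather than a yes/no answer.

    from collections import Counter
    from pathlib import Path
    from lxml import etree

    doc_count = 0
    element_docs = Counter()   # in how many documents does each element name occur?

    for path in Path("collection").glob("*.xml"):   # hypothetical document collection
        doc_count += 1
        names = {etree.QName(e).localname
                 for e in etree.parse(str(path)).iter()
                 if isinstance(e.tag, str)}          # skip comments and PIs
        element_docs.update(names)

    for name, n in sorted(element_docs.items(), key=lambda kv: kv[1]):
        if n / max(doc_count, 1) < 0.01:             # used in under 1% of documents
            print(f"review: <{name}> occurs in only {n} of {doc_count} documents")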

