Proceedings of the International Symposium on Quality Assurance and Quality Control in XML
Latest Publications


Total documents: 8 (five years: 0)
H-index: 1 (five years: 0)
Published by: Mulberry Technologies, Inc.
ISBN: 9781935958055

Author(s): Eric van der Vlist

Ever modified an XML schema? Ever broken something while fixing a bug or adding a new feature? As with any piece of engineering, the more complex a schema is, the harder it is to maintain. In other domains, unit tests dramatically reduce the number of regressions and thus provide a kind of safety net for maintainers. Can we learn from these techniques and adapt them to XML schema languages? In this workshop session, we develop a schema using unit test techniques, to illustrate their benefits in this domain.
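A minimal sketch of the idea, assuming a hypothetical RELAX NG schema file book.rng and using Python's unittest with lxml as the test harness (the session itself is not tied to these particular tools): each test pins down one behaviour of the schema, so that a later "fix" that loosens or breaks a rule fails a test instead of slipping into production.

    import unittest
    from lxml import etree

    class BookSchemaTests(unittest.TestCase):
        @classmethod
        def setUpClass(cls):
            # Compile the schema once; a schema that no longer compiles is itself a failure.
            cls.schema = etree.RelaxNG(etree.parse("book.rng"))  # hypothetical schema file

        def assert_valid(self, xml):
            self.assertTrue(self.schema.validate(etree.fromstring(xml)),
                            self.schema.error_log)

        def assert_invalid(self, xml):
            self.assertFalse(self.schema.validate(etree.fromstring(xml)))

        def test_minimal_book_is_valid(self):
            self.assert_valid("<book><title>QA in XML</title></book>")

        def test_title_is_required(self):
            # Regression guard: accidentally loosening the schema makes this test fail.
            self.assert_invalid("<book/>")

    if __name__ == "__main__":
        unittest.main()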


Author(s): Tamara Stoker, Keith Rose

The benefits of using XML in publishing are widely known, but those benefits are more difficult to attain if the quality of the XML produced by the process is not consistently at a very high level. This case study outlines the steps that the American Chemical Society ("ACS") has taken both in-house and in collaboration with the vendor to which we have outsourced portions of our publication workflow. In addition to producing predictable XML, these efforts have also shortened our publication time.


Author(s): Sheila M. Morrissey, John Meyer, Sushil Bhattarai, Gautham Kalwala, Sachin Kurdikar, ...

One of the consequences of the rapid development and dissemination of the ecosystem of XML technologies was the widespread adoption of XML as a meta-format for the specification of application configuration information. The validation of these rich configuration files with standard XML validation tools, however, is often not sufficient for error-free deployment of applications. This paper considers how to categorize some of the constraints that cannot be enforced by such tools, and discusses some XML-based approaches to enforcing such constraints before, or as part of, deployment.
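One illustration of such a constraint, sketched under assumptions not taken from the paper: a hypothetical deployment configuration in which max-connections must be at least min-connections, a cross-field rule that a DTD or XSD content model cannot express but that ISO Schematron (driven here from Python via lxml) can enforce before the application is deployed.

    from lxml import etree
    from lxml.isoschematron import Schematron

    # Hypothetical deployment rule expressed as ISO Schematron.
    RULES = etree.fromstring("""
    <schema xmlns="http://purl.oclc.org/dsdl/schematron">
      <pattern>
        <rule context="server">
          <assert test="number(max-connections) >= number(min-connections)">
            max-connections must be at least min-connections
          </assert>
        </rule>
      </pattern>
    </schema>
    """)

    # Hypothetical configuration file content that is schema-valid but wrong.
    CONFIG = etree.fromstring("""
    <config>
      <server>
        <min-connections>10</min-connections>
        <max-connections>5</max-connections>
      </server>
    </config>
    """)

    checker = Schematron(RULES)
    if not checker.validate(CONFIG):
        # In a real pipeline this would block the deployment step.
        raise SystemExit("configuration violates deployment constraints")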


Author(s): Charlie Halpern-Hamu

A wide variety of techniques was used in an XML data conversion project. An emphasis on Quality Assurance (not making errors in the first place) was supported by Quality Control (catching errors that occurred anyway). Data analysis and estimation techniques included counting function points in source documents to estimate effort and auto-generating tight schemas to discover variation. Quality Assurance was based on a guiding specification organized around parent-child pairs, and on programming for context and for all content. Quality Control techniques included source-to-target comparison to check for lost or duplicated content, automatic highlighting of anomalous data, and the use of XQuery to review data.
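A minimal sketch of the source-to-target comparison idea, assuming hypothetical file names and that the conversion is meant to preserve character data exactly; a real project would compare at a finer granularity than a single fingerprint per document.

    import re
    from lxml import etree

    def text_fingerprint(path):
        # Concatenate all text nodes and collapse whitespace.
        text = "".join(etree.parse(path).getroot().itertext())
        return re.sub(r"\s+", " ", text).strip()

    source = text_fingerprint("source.xml")       # hypothetical conversion input
    target = text_fingerprint("converted.xml")    # hypothetical conversion output

    if source != target:
        # A length difference hints at lost (shorter) or duplicated (longer) content.
        print(f"content mismatch: source {len(source)} chars, target {len(target)} chars")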


Author(s): Christopher Kelly, Jeff Beck

PubMed Central (PMC) is the US National Library of Medicine's digital archive of life sciences journal literature. On average the PMC team processes 14,000 articles per month. All of the articles are submitted to the archive by publishers in an agreed-upon markup format, and the articles are transformed to a common article model. To help reduce the number of content- or markup-related problems encountered, PMC puts new participating journals through an evaluation period that includes both automated and manual checks. Once a journal has passed the evaluation stage, it sends its content to PMC on a regular production schedule. The production content is processed automatically when it arrives, with any processing problems generating error messages. Although most of the content sent to PMC has been through production systems and QA when it was published, we've found that there is still a level of manual content checking that needs to be done on the production content. Any problems found must be investigated to determine if they result from a problem in the source content, a problem with the PMC ingest transforms, or simply a problem with our rendering of the normalized XML content. Having the content in XML certainly has advantages: it can be validated against a schema and easily manipulated and processed. But XML does not solve all of the problems. A sharp eye and attention to detail are still needed by the production team, as they would be for any publishing process.


Author(s): Steven J. DeRose

Text analytics involves extracting features of meaning from natural language texts and making them explicit, much as markup does. It uses linguistics, AI, and statistical methods to get at a level of "meaning" that markup generally does not: down in the leaves of what to XML may be unanalyzed "content". This suggests potential for new kinds of error, consistency, and quality checking. However, text analytics can also discover features that markup is used for; this suggests that text analytics can also contribute to the markup process itself. Perhaps the simplest example of text analytics' potential for checking is xml:lang. Language identification is well-developed technology, and xml:lang attributes "in the wild" could be much improved. More interestingly, the distribution of named entities (people, places, organizations, etc.), topics, and emphasis interacts closely with documents' markup structures. Summaries, abstracts, conclusions, and the like all have distinctive features which can be measured. This paper provides an overview of how text analytics works, what it can do, and how that relates to the things we typically mark up in XML. It also discusses the trade-offs and decisions involved in just what we choose to mark up, and how that interacts with automation. It presents several specific ways that text analytics can help create, check, and enhance XML components, and exemplifies some cases using a high-volume analytics tool.
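A minimal sketch of the xml:lang check, assuming the third-party langdetect package as the language identifier and a hypothetical article.xml input; any language-identification library would serve, and the length threshold is an illustrative guess.

    from langdetect import detect  # assumed third-party library: pip install langdetect
    from lxml import etree

    XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

    tree = etree.parse("article.xml")   # hypothetical input document
    for elem in tree.iter():
        declared = elem.get(XML_LANG)
        if not declared:
            continue
        text = " ".join(elem.itertext()).strip()
        if len(text) < 40:              # too little text for a reliable guess
            continue
        detected = detect(text)         # e.g. "en", "fr", "de"
        if not declared.lower().startswith(detected):
            print(f"<{elem.tag}> declares xml:lang='{declared}' "
                  f"but reads like '{detected}'")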


Author(s): Wei Zhao, Jayanthy Chengan, Agnes Bai

Ontario Scholars Portal (SP) is an XML-based digital repository containing over 31,000,000 articles from more than 13,000 full-text journals from 24 publishers, covering every academic discipline. Starting in 2006, SP adopted the NLM Journal Archiving and Interchange Tag Set v2.3 for its XML-based e-journals system, which uses MarkLogic. The publishers' native data is transformed to the NLM Tag Set in SP in order to normalize data elements to a single standard for archiving, display, and searching. Scholars Portal has established extremely high standards for ensuring that the content loaded into the repository is accurate and complete. Throughout the entire workflow, from data ingest through data conversion to data display, quality control procedures have been implemented to ensure the integrity of the digital repository.
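A minimal sketch of one such post-conversion check, assuming a hypothetical converted article file and an illustrative (not Scholars Portal's actual) list of required NLM metadata fields; a real ingest pipeline would also run DTD validation and completeness checks against the publisher's manifest.

    from lxml import etree

    # Illustrative required fields, expressed as XPaths into the NLM article model.
    REQUIRED = {
        "journal title":    "//journal-meta//journal-title",
        "article title":    "//article-meta/title-group/article-title",
        "DOI":              "//article-meta/article-id[@pub-id-type='doi']",
        "publication date": "//article-meta/pub-date",
    }

    doc = etree.parse("converted-article.xml")   # hypothetical conversion output
    missing = [name for name, xpath in REQUIRED.items() if not doc.xpath(xpath)]
    if missing:
        print("hold back from loading, missing:", ", ".join(missing))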


Author(s): Dale Waldt

Validation of XML documents typically provides feedback in binary, yes/no form. This avoids the ambiguity, manual intervention, and increased cost of other approaches. But it may not be enough to make XML applications efficient, accurate, or semantically rich. How do you ensure that the correct element and attribute types are applied to the appropriate content chunks? That XML documents are accurate and current? That your XML has a level of semantic richness appropriate to your business goals? How do you control quality over large collections? How do you resolve conflicting organizational goals for information integration and ensure that content and schemas help the enterprise as a whole? Conceptual and physical models, model/schema traceability, and effective stakeholder review can all help. Schematron, document comparison (diff) tools, and statistical methods can also help, but may raise QA questions of their own. Improvements in requirements gathering and QA processes can produce visible results; concrete examples can and will be discussed.
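As one illustration of the statistical angle, a minimal sketch under assumptions not taken from the talk: element names that occur in only a tiny fraction of a collection are often tagging errors or semantic outliers worth a reviewer's attention, even when every document is schema-valid, so counting them yields a graded quality signal rather than a yes/no answer.

    from collections import Counter
    from pathlib import Path
    from lxml import etree

    doc_count = 0
    element_docs = Counter()   # in how many documents does each element name occur?

    for path in Path("collection").glob("*.xml"):   # hypothetical document collection
        doc_count += 1
        names = {etree.QName(e).localname
                 for e in etree.parse(str(path)).iter()
                 if isinstance(e.tag, str)}          # skip comments and PIs
        element_docs.update(names)

    for name, n in sorted(element_docs.items(), key=lambda kv: kv[1]):
        if n / max(doc_count, 1) < 0.01:             # used in under 1% of documents
            print(f"review: <{name}> occurs in only {n} of {doc_count} documents")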

