Proceedings of the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth

Published By Mulberry Technologies, Inc.

ISBN: 098243443X, 9780982434437

Author(s):  
Michael Leventhal ◽  
Eric Lemoine

The XML chip is now more than six years old. The diffusion of this technology has been very limited, due, on the one hand, to the long period of evolutionary development needed to produce hardware capable of accelerating a significant portion of the XML computing workload and, on the other hand, to the fact that the chip was invented by the start-up Tarari in a commercial context which required, for business reasons, a minimum of public disclosure of its design features. It remains, nevertheless, a significant landmark that the XML chip has been sold and continuously improved for the last six years. From the perspective of general computing history, the XML chip is an uncommon example of a successful workload-specific symbolic computing device. With respect to the specific interests of the XML community, the XML chip is a remarkable validation of one of its core founding principles: that normalizing on a data format, whatever its imperfections, would eventually enable developers to create tools to process it efficiently. This paper was prepared for the International Symposium on Processing XML Efficiently: Overcoming Limits on Space, Time, or Bandwidth, a day of discussion among, predominantly, software developers working in the area of efficient XML processing. The Symposium is being held as a workshop within Balisage, a conference of specialists in markup theory. Given the interests of the audience, this paper does not delve into the design features and principles of the chip itself; rather, it presents a dialectic on the motivation for developing an XML chip in view of related and potentially competing developments: scaling as commonly characterized by Moore's Law, parallelization through increasing the number of computing cores on general-purpose processors (multicore von Neumann architectures), and the optimization of software.


Author(s):  
James A. Robinson

HighWire Press is the online publishing operation of the Stanford University Libraries, and currently hosts online journals for over 140 separate publishers. HighWire has developed and deployed a new XML-based publishing platform, codenamed H2O, and is in the process of migrating all of its publishers to this new platform. This paper describes four XML-based systems developed for our new H2O platform, and describes some of the performance characteristics of each. We describe some limitations encountered with these systems, and conclude with thoughts about our experience migrating to an XML-based platform.


Author(s):  
Mohamed Zergaoui

Although the ideal approach to streaming is to process markup events as soon as they are encountered, with no memory needing to be used for storing parts of the input document, this is not always feasible, and in practice it is useful to consider “near-streaming” approaches that involve a limited amount of buffering or lookahead. In the extreme, however, such approaches degenerate until they are indistinguishable from non-streaming processes. This paper attempts a classification of streaming and near-streaming processing methods using different approaches to memory management, and discusses the advantages and disadvantages of each.
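The distinction between streaming and buffering can be illustrated with a minimal Python sketch using the standard library's `iterparse` (the `entry` element name and the document content are illustrative, not from the paper). Each element is processed as its end tag arrives and is then discarded, so memory stays bounded no matter how large the input grows:

```python
import io
import xml.etree.ElementTree as ET

doc = io.BytesIO(b"<log><entry id='1'>a</entry><entry id='2'>b</entry></log>")

# Pure streaming: handle each 'end' event as it arrives, then clear the
# element so memory use stays bounded regardless of document size.
count = 0
for event, elem in ET.iterparse(doc, events=("end",)):
    if elem.tag == "entry":
        count += 1
        elem.clear()  # release the element's children immediately

print(count)  # 2
```

A near-streaming variant would retain a bounded window of elements (for limited lookahead) instead of clearing each one immediately; once the retained window is allowed to grow without bound, the process has degenerated into ordinary tree building.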


Author(s):  
Richard Salz ◽  
Heather Achilles ◽  
David Maze

This presentation will discuss some of the hardware and software trade-offs in the IBM DataPower XML processor, known as the XG4. The XG4 is a PCI card that parses XML, supports XPath and schema validation, and includes a generic post-processing engine. It can return SAX-like events, build a DOM-like tree, or switch between modes within a document. It is capable of supporting thousands of sessions simultaneously, and because of its pipelined nature it can process more than one character per clock tick. The talk will explain some of the features of the card and its device driver, such as memory usage and zero-copy, synchronization of QName identifiers between card and software, and programmability.
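Switching between event and tree modes within one document, as the XG4 does in hardware, has a familiar software analogue. The following is a hedged Python sketch (element names are hypothetical, and this models only the programming pattern, not the card's API): the parser streams past uninteresting content but lets a full subtree accumulate for selected elements, which are then handled as small trees.

```python
import io
import xml.etree.ElementTree as ET

doc = io.BytesIO(
    b"<feed><meta>skip</meta><record><a>1</a><b>2</b></record></feed>"
)

# Stream over the document, but treat each <record> as a small tree:
# by the time its 'end' event fires, iterparse has built the subtree.
records = []
for event, elem in ET.iterparse(doc, events=("end",)):
    if elem.tag == "record":
        records.append({child.tag: child.text for child in elem})
        elem.clear()  # drop the subtree once it has been consumed

print(records)  # [{'a': '1', 'b': '2'}]
```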


Author(s):  
Rob Cameron ◽  
Ken Herdy ◽  
Ehsan Amiri

By first transforming the octets (bytes) of XML texts into eight parallel bit streams, the SIMD features of commodity processors can be exploited for parallel processing of blocks of 128 input bytes at a time. Established transcoding and parsing techniques are reviewed, followed by new techniques including parsing with bitstream addition. Further opportunities are discussed in light of expected advances in CPU architecture and compiler technology. Implications for various APIs and information models are presented, as well as opportunities for collaborative open-source development.
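The core transposition can be sketched in plain Python (a naive, non-SIMD model: the real technique operates on 128-byte blocks with SIMD registers, whereas here arbitrary-precision integers stand in for bit streams). Stream `i` holds, at bit position `j`, bit `i` of input byte `j`; a character-class stream, such as "all `<` positions", then falls out of eight bitwise operations regardless of input length:

```python
def to_bit_streams(data: bytes):
    """Transpose bytes into 8 parallel bit streams.

    streams[i] is an integer whose bit j is set iff bit i of byte j is set.
    """
    streams = [0] * 8
    for j, byte in enumerate(data):
        for i in range(8):
            if byte >> i & 1:
                streams[i] |= 1 << j
    return streams

def char_class(streams, code, length):
    """Bit stream marking every input position holding the byte `code`."""
    mask = (1 << length) - 1
    result = mask
    for i in range(8):
        bits = streams[i] if code >> i & 1 else ~streams[i]
        result &= bits & mask
    return result

text = b"<a><b/></a>"
streams = to_bit_streams(text)
lt = char_class(streams, ord("<"), len(text))
print([j for j in range(len(text)) if lt >> j & 1])  # [0, 3, 7]
```

On real hardware the eight AND/NOT operations above each cover 128 positions per SIMD instruction, which is the source of the claimed parallelism.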


Author(s):  
David A. Lee ◽  
Norman Walsh

The efficiency and performance of individual XML operations such as parsing, processing (XSLT, XQuery), and serialization, and the merits of different in-memory document representations, have been widely discussed. However, real-world use cases often involve many operations orchestrated using a scripting environment. The performance of the scripting environment can often overshadow any performance gains in individual operations. In an exploration of real-world scripting, we compare the performance of several scripting languages and techniques on a set of typical XML operations, such as generating a table of contents and conditionally accessing non-XML files identified in XML documents. Based on performance results, we suggest best practices for scripting XML processes. Scripting languages compared include DOS Shell (CMD.EXE), Linux Shell (bash), XMLSH, and XProc (Calabash). These are run (where possible) on multiple operating systems: Windows XP, Linux, and Mac OS X.
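One of the benchmark tasks mentioned, table-of-contents generation, can be sketched in a few lines of Python (the `book`/`chapter`/`title` element names are illustrative, not taken from the paper's test data). Doing the whole job in one process avoids the per-step startup cost that a shell pipeline pays each time it launches a separate XML tool:

```python
import xml.etree.ElementTree as ET

# Illustrative input: a book whose chapters carry <title> children.
book = ET.fromstring(
    "<book>"
    "<chapter><title>Intro</title></chapter>"
    "<chapter><title>Streams</title></chapter>"
    "</book>"
)

# Build the table of contents in-process: one parse, one serialization.
toc = ET.Element("toc")
for n, chapter in enumerate(book.iter("chapter"), start=1):
    item = ET.SubElement(toc, "item", number=str(n))
    item.text = chapter.findtext("title")

print(ET.tostring(toc, encoding="unicode"))
```

An equivalent shell pipeline would typically spawn an XSLT or XPath processor per step, and the paper's point is that this orchestration overhead can dominate the XML work itself.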

