W3C

XML Processing Model Requirements and Use Cases

W3C Working Group Note 17 January 2006

This version:
http://www.w3.org/TR/2006/NOTE-xproc-requirements-20060117/
Latest version:
http://www.w3.org/TR/xproc-requirements/
Editor:
Alex Milowski, Invited Expert <alex@milowski.com>

This document is also available in these non-normative formats: XML.


Abstract

This document contains requirements for the development of an XML Processing Model and Language, which are intended to describe and specify the processing relationships between XML resources.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a Working Group Note containing the requirements for an XML Processing Model and Language, which are intended to provide an interoperable way for applications to describe the order in which processes should be applied to XML documents.

This document has been produced by the W3C XML Processing Model Group as part of the XML Activity and is a continuation of the work done by the XML Core Working Group. This document supersedes that group's requirements document note.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

As of this publication, the Working Group expects to eventually publish this document as a Working Group Note. It is not expected to become a W3C Recommendation, and therefore it has no associated W3C Patent Policy licensing obligations.

Table of Contents

1 Introduction
2 Design Principles
3 Terminology
4 Requirements
    4.1 Allow Control Over Inputs and Outputs Infosets of Steps and XML Process Models
    4.2 Minimal Input Processing Options
    4.3 XML Pipelining Support
    4.4 Input Collection Process Order
    4.5 Allow Optimization of Process Steps
    4.6 New Components and Steps
    4.7 Error Handling and Fall-back
    4.8 Conditional Processing
    4.9 Streaming XML Pipelines
    4.10 Multiple Input and Output Support
    4.11 Minimal Component Support
    4.12 Data Model Based
    4.13 An XML Language
    4.14 Declarative Components and Connections Between Steps
    4.15 Language Neutral Implementation
    4.16 Interactions Between Stages
    4.17 Pipeline Composition
    4.18 Pipeline Naming
    4.19 Same Results
    4.20 Loose Binding
    4.21 Language Restrictions
    4.22 Input Flexibility
    4.23 Iteration
5 Use cases
    5.1 Extracting MathML
    5.2 Style an XML Document in a Browser
    5.3 Apply a Sequence of Operations
    5.4 Run a Custom Program
    5.5 Service Request/Response Handling on a Handheld
    5.6 XQuery and XSLT 2.0 Collections
    5.7 A Simple Transformation Service
    5.8 An AJAX Server
    5.9 Dynamic XQuery
    5.10 Read/Write Non-XML File
    5.11 Single-file Command-line Document Processing
    5.12 XInclude Processing
    5.13 Document Aggregation
    5.14 Update/Insert Document in Database
    5.15 Content-Dependent Transformations
    5.16 Configuration-Dependent Transformations
    5.17 Response to XML-RPC Request
    5.18 Multiple-file Command-line Document Generation
    5.19 Database Import/Ingestion
    5.20 Metadata Retrieval
    5.21 Non-XML Document Production
    5.22 Parse/Validate/Transform
    5.23 Interact with Web Service
    5.24 Parse and/or Serialize RSS descriptions
    5.25 Integrate Computation Components
    5.26 Document Schema Definition Languages (DSDL) - Part 10: Validation Management
    5.27 Large-Document Subtree Iteration
    5.28 No Use Case

Appendix

A Contributors


1 Introduction

A large and growing set of specifications describes processes operating on XML documents. Many applications will depend on more than one of these specifications. Considering how implementations of these specifications might interact raises many interoperability issues. To address these issues, this specification contains requirements on an XML Processing Model and Language for describing XML process interactions. It is concerned with the conceptual model of XML process interactions, the language used to describe these interactions, and the inputs and outputs of the overall process. It is not generally concerned with the implementations of the actual XML processes participating in these interactions.

2 Design Principles

The design principles described in this document are a kind of requirement: compliance with them is an overall goal for the specification. A specific feature need not satisfy a design principle on its own. Instead, the whole set of specifications related to this requirements document should, taken together, meet the overall goal stated in each design principle.

Infoset Processing

Every XML document in this specification is operated on as an information set (infoset). Processes may consume or produce information sets in order to inspect, augment, or extract information, or to produce new information.

(source: xml core wg)

Technology Neutral

Applications should be free to implement XML processing using appropriate technologies such as SAX, DOM, or other infoset representations.

(source: xml core wg)

Address Practical Interoperability

The language must be rich enough to address practical interoperability concerns.

(source: xml core wg)

Simplicity

The language should be as small and simple as practical.

(source: xml core wg)

Straightforward Core Implementation

It should be relatively easy to build a conformant implementation of the language, but it should also be possible to build a sophisticated implementation that performs its own optimizations and integrates with other technologies.

(source: xml core wg)

Arbitrary Components

The specifications should allow use of XML-in-XML-out components.

(source: xml core wg)

3 Terminology

Do we want a terminology section where we introduce common terms that exist in current XML pipeline/processing languages?

XML Process Model

An XML process model is an overall set of steps that produces some number of infoset outputs based on some number of infoset inputs.

XML Pipeline

An XML pipeline is a sequence of steps in which the output of each step is chained to the input of the next step.

Specification Language

A specification language is an XML vocabulary in which an XML pipeline or process model is described.

Component

A component is a particular XML technology (e.g. XInclude, XML Schema Validity Assessment, XSLT, XQuery, etc.).

Step

A step is a specification of how a component is used, including its inputs and outputs.

Component Vocabulary

A component vocabulary is the vocabulary of the inputs that describe the process by which an output is produced (e.g. an XSLT transformation).

Use Environment or Binding

The technology environment in which the XML process is used (e.g. command-line, web servers, editors, browsers, embedded applications, etc.).

4 Requirements

4.1 Allow Control Over Inputs and Outputs Infosets of Steps and XML Process Models

The specification language must allow control over the input and output of each step and the overall process. At minimum, the characteristics of these inputs and outputs must be described so that binding into particular use environments can be accomplished by introspection. This may involve the use of named infoset inputs and outputs.

Supporting use cases:

(source: xml core wg)

4.2 Minimal Input Processing Options

There is a basic minimal set of mandatory input processing options that we must satisfy to achieve interoperability. This includes implicit input provided by the use environment (e.g. a file specified on the command-line) and direct reference by a URI value.

Supporting use cases:

(source: xml core wg)

4.3 XML Pipelining Support

Given a set of components, the specification language must allow the order of processing steps to be specified.

Supporting use cases:

(source: xml core wg)

4.4 Input Collection Process Order

Given a set of documents, the specification language must allow the order of processing steps to be specified.

Supporting use cases:

(source: xml core wg)

4.5 Allow Optimization of Process Steps

It should be possible to build a sophisticated implementation that can perform parallel operations, lazy or greedy processing, and other optimizations.

Supporting use cases:

(source: xml core wg)

4.6 New Components and Steps

The model should be extensible enough that applications can define new process steps that use new components. These definitions should be easy to reuse in different XML process models.

Supporting use cases:

(source: xml core wg)

4.7 Error Handling and Fall-back

The model and specification language must provide mechanisms for addressing error handling and fall-back behaviors.

Supporting use cases:

(source: xml core wg)

4.8 Conditional Processing

The model should allow conditional processing so that different steps can be selected depending on run-time evaluation(s).

Supporting use cases:

(source: xml core wg)

4.9 Streaming XML Pipelines

The model should not prohibit streaming: a user should be able to write a pipeline that can be streamed. There should be some support for static analysis of an XML pipeline to determine whether it can be streamed.

Supporting use cases:

(source: xml core wg)

4.10 Multiple Input and Output Support

The model should allow steps that have multiple inputs or produce multiple outputs.

Supporting use cases:

(source: xml core wg)

4.11 Minimal Component Support

The model should allow steps that use the following components:

  • XML Base

  • XInclude

  • XSLT 1.0/2.0

  • XSL FO

  • XML Schema

  • XQuery

  • RelaxNG

Supporting use cases:

(source: xml core wg)

4.12 Data Model Based

What passes between components is conceptually an infoset. The specification language and model are not tied to any specific API.

Supporting use cases:

(source: xml core wg)

4.13 An XML Language

The specification language must be an XML vocabulary that can be authored and manipulated using standard XML tools. It should be possible to specify this language reasonably with both an XML Schema and a RelaxNG schema.

Supporting use cases:

(source: xml core wg)

4.14 Declarative Components and Connections Between Steps

Each component and step in the process model must be expressed declaratively in the specification language, so that it does not reference any specific API or operating system conventions.

Supporting use cases:

(source: xml core wg)

4.15 Language Neutral Implementation

The process model and specification language must be neutral with respect to implementation language. It should be possible to exchange specifications of XML processes across various computing platforms in an interoperable way. While these computing platforms should not be limited to any particular class of platforms such as clients, servers, distributed computing infrastructures, etc., their use environments may create dependencies beyond the scope of this specification (e.g. input and output bindings for web services).

Supporting use cases:

(source: xml core wg)

4.16 Interactions Between Stages

The model should define the interaction between processing stages within a pipeline, especially when multiple inputs and outputs are processed.

Supporting use cases:

(source: Rui Lopes)

4.17 Pipeline Composition

Mechanisms for pipeline composition must be provided, whether dependency-based or sequence-based.

Supporting use cases:

(source: Rui Lopes)

4.18 Pipeline Naming

The model should allow pipelines to be explicitly named.

Supporting use cases:

(source: Rui Lopes)

4.19 Same Results

Given the same source documents and processing specification, different implementations must return the same results.

Supporting use cases:

(source: Rui Lopes)

4.20 Loose Binding

The language binding to initial processing sources should be as loose as possible to enable batch processing and facilitate reuse.

Supporting use cases:

(source: Rui Lopes)

4.21 Language Restrictions

  • There should be only one standardized element for stage declaration within a pipeline, to lower the cost of maintaining the language specification.

  • Processing stage names should be standardized, with strict semantics (i.e., declaring an XSLT task will run a conformant XSLT processor).

Supporting use cases:

(source: Rui Lopes)

4.22 Input Flexibility

The language should allow processes and components to accept zero, one or multiple inputs and zero, one or multiple outputs.

Supporting use cases:

(source: Alessandro Vernet)

4.23 Iteration

The language should allow iteration over documents and over elements within documents. This iteration should allow additional processes to be applied to the target of the iteration.

Supporting use cases:

(source: Alessandro Vernet)

5 Use cases

5.1 Extracting MathML

Extract MathML fragments from an XHTML document and render them as images. Employ an SVG renderer for SVG glyphs embedded in the MathML.

(source: xml core wg)

5.2 Style an XML Document in a Browser

Style an XML document in a browser with one of several different stylesheets without having multiple copies of the document containing different xml-stylesheet directives.

(source: xml core wg)

5.3 Apply a Sequence of Operations

Apply a sequence of operations such as XInclude, validation, and transformation to a document, aborting if the result or an intermediate stage is not valid.
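A minimal sketch of the chaining and abort behaviour in this use case, assuming each step is a Python function from an ElementTree element to an element, and that an invalid intermediate result is signalled by raising an exception. The names `run_pipeline`, `validate_step`, `transform_step`, and `PipelineAborted` are hypothetical; the "validation" is a trivial structural check standing in for a real schema validator, the "transformation" stands in for XSLT, and the XInclude step is omitted.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable

# Hypothetical step type: one document (element) in, one document out.
Step = Callable[[ET.Element], ET.Element]


class PipelineAborted(Exception):
    """Raised by a step to stop the whole pipeline."""


def run_pipeline(doc: ET.Element, steps: Iterable[Step]) -> ET.Element:
    # The output of each step becomes the input of the next one.
    for step in steps:
        doc = step(doc)
    return doc


def validate_step(doc: ET.Element) -> ET.Element:
    # Stand-in for real validation: the document must have a <title> child.
    if doc.find("title") is None:
        raise PipelineAborted(f"<{doc.tag}> has no <title> child")
    return doc


def transform_step(doc: ET.Element) -> ET.Element:
    # Stand-in for an XSLT transformation: emit <html><h1>title</h1></html>.
    out = ET.Element("html")
    ET.SubElement(out, "h1").text = doc.findtext("title")
    return out


result = run_pipeline(ET.fromstring("<doc><title>Hello</title></doc>"),
                      [validate_step, transform_step])
print(ET.tostring(result, encoding="unicode"))  # <html><h1>Hello</h1></html>

try:
    run_pipeline(ET.fromstring("<doc/>"), [validate_step, transform_step])
except PipelineAborted as err:
    print("aborted:", err)
```

Raising an exception is only one way to model "aborting"; a pipeline language could equally route the failure to an explicit error-handling step.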

(source: xml core wg)

5.4 Run a Custom Program

Run a program of your own, with some parameters, on an XML file and display the result in a browser.

(source: xml core wg)

5.5 Service Request/Response Handling on a Handheld

Allow an application on a handheld device to construct a pipeline and send it, along with some data, to a server, which processes the pipeline and sends the result back.

(source: xml core wg)

5.6 XQuery and XSLT 2.0 Collections

In XQuery and XSLT 2.0 there is the idea of an input and output collection.

5.7 A Simple Transformation Service

  1. Extract an XML document (XForms instance) from the HTTP request body.

  2. Execute XSLT transformation on that document.

  3. Call a persistence service with the resulting document.

  4. Return the XML document from the persistence service (a new XForms instance) as the HTTP response body.

(source: Erik Bruchez)

5.8 An AJAX Server

  1. Receive an XML request containing a word to complete.

  2. Call a sub-pipeline that retrieves a list of completions for that word.

  3. Format the resulting document with XSLT.

  4. Serialize the response as XML.

(source: Erik Bruchez)

5.9 Dynamic XQuery

  1. Dynamically create an XQuery query using XSLT, based on the input XML document.

  2. Execute the XQuery against a database.

  3. Construct an XHTML result page using XSLT from the result of the query.

  4. Serialize the response as HTML.

(source: Erik Bruchez)

5.10 Read/Write Non-XML File

  1. Read a CSV file and convert it to XML.

  2. Process the document with XSLT.

  3. Convert the result to CSV format using text serialization.
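One way to picture this pipeline, assuming a CSV file whose first row is a header of names that are also valid XML element names. The element names `table` and `row`, the `qty` column, and the filtering rule are illustrative choices, and step 2 uses a small Python function in place of XSLT, since the Python standard library has no XSLT processor.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text: str) -> ET.Element:
    # Step 1: each CSV record becomes a <row>; each column becomes a child
    # element named after its header (headers assumed to be valid XML names).
    table = ET.Element("table")
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(table, "row")
        for name, value in record.items():
            ET.SubElement(row, name).text = value
    return table


def process(table: ET.Element) -> ET.Element:
    # Step 2 (stand-in for XSLT): drop rows whose hypothetical qty is not positive.
    for row in list(table):
        if int(row.findtext("qty", "0")) <= 0:
            table.remove(row)
    return table


def xml_to_csv(table: ET.Element) -> str:
    # Step 3: text serialization back to CSV, using the first row's columns.
    rows = list(table)
    if not rows:
        return ""
    fields = [child.tag for child in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields)
    for row in rows:
        writer.writerow([row.findtext(f, "") for f in fields])
    return out.getvalue()


source = "item,qty\napple,3\npear,0\nplum,5\n"
print(xml_to_csv(process(csv_to_xml(source))), end="")
```

The point of the sketch is the shape of the pipeline: non-XML input is lifted into an infoset, processed as XML, and lowered back to text only at the final serialization step.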

(source: Erik Bruchez)

5.11 Single-file Command-line Document Processing

  1. Read a DocBook document.

  2. Validate the document.

  3. Process it with XSLT.

  4. Validate the resulting XHTML.

  5. Save the HTML file using HTML serialization.

(source: Erik Bruchez)

5.12 XInclude Processing

  1. Retrieve a document containing XInclude instructions.

  2. Locate documents to be included.

  3. Perform XInclude inclusion.

  4. Return a single XML document.
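Python's standard library ships a limited XInclude processor, `xml.etree.ElementInclude`, which can illustrate these steps. The sketch assumes the documents to be included are held in an in-memory dictionary (the `DOCS` store and the name `ch1.xml` are hypothetical) and supplies a custom loader, so that step 2, locating the documents, does not touch the file system.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical in-memory store standing in for real document locations.
DOCS = {
    "ch1.xml": "<chapter><title>One</title></chapter>",
}

# Step 1: a document containing an XInclude instruction.
book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="ch1.xml"/>'
    "</book>"
)


def loader(href, parse, encoding=None):
    # Step 2: locate the included document. parse="xml" must return an
    # element; parse="text" must return a string.
    text = DOCS[href]
    return ET.fromstring(text) if parse == "xml" else text


# Step 3: replace each xi:include element with the loaded content, in place.
ElementInclude.include(book, loader=loader)

# Step 4: the result is a single XML document.
print(ET.tostring(book, encoding="unicode"))
```

`ElementInclude` supports only a subset of XInclude (for example, no `xpointer` handling), so a conforming pipeline step would likely need a fuller implementation.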

(source: Erik Bruchez)

5.13 Document Aggregation

  1. Locate a collection of documents to aggregate.

  2. Perform aggregation under a new document element.

  3. Return a single XML document.
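A minimal aggregation sketch, assuming the collection has already been located (step 1) and is available as a list of XML strings. The new document element name `aggregate` and the sample `entry` documents are illustrative choices only.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def aggregate(documents: Iterable[str], root_name: str = "aggregate") -> ET.Element:
    # Step 2: place each document's element under a new document element.
    root = ET.Element(root_name)
    for text in documents:
        root.append(ET.fromstring(text))
    return root


# Step 1 is assumed done: the collection is already in hand.
collection = ["<entry id='1'/>", "<entry id='2'/>"]

# Step 3: a single XML document.
result = aggregate(collection)
print(ET.tostring(result, encoding="unicode"))
```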

(source: Erik Bruchez)

5.14 Update/Insert Document in Database

  1. Receive an XML document to save.

  2. Check the database to see if the document exists.

  3. If the document exists, update the document.

  4. If the document does not exist, add the document.
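Steps 2 to 4 amount to an "upsert". The sketch below assumes a hypothetical in-memory `Database` class keyed by a document identifier, here taken from an `id` attribute on the document element; a real pipeline would call whatever relational or XML database the use environment provides.

```python
import xml.etree.ElementTree as ET


class Database:
    """Hypothetical in-memory stand-in for an XML store (id -> XML text)."""

    def __init__(self) -> None:
        self.docs = {}

    def exists(self, key: str) -> bool:
        return key in self.docs

    def update(self, key: str, xml: str) -> None:
        self.docs[key] = xml

    def insert(self, key: str, xml: str) -> None:
        self.docs[key] = xml


def save(db: Database, xml: str) -> str:
    # Step 1: receive the document; its identity is assumed to be @id.
    key = ET.fromstring(xml).get("id")
    if key is None:
        raise ValueError("document element has no id attribute")
    # Steps 2-4: update if the document already exists, otherwise add it.
    if db.exists(key):
        db.update(key, xml)
        return "updated"
    db.insert(key, xml)
    return "inserted"


db = Database()
print(save(db, "<order id='42'><qty>1</qty></order>"))  # inserted
print(save(db, "<order id='42'><qty>2</qty></order>"))  # updated
```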

(source: Erik Bruchez)

5.15 Content-Dependent Transformations

  1. Receive an XML document to format.

  2. If the document is XHTML, apply a theme via XSLT and serialize as HTML.

  3. If the document is XSL-FO, apply an XSL FO processor to produce PDF.

  4. Otherwise, serialize the document as XML.
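The branching in this use case can be decided from the namespace of the document element. The sketch below assumes that namespace is the test (XHTML: `http://www.w3.org/1999/xhtml`, XSL-FO: `http://www.w3.org/1999/XSL/Format`) and returns a label naming the chosen branch; the labels are hypothetical, and the theming, HTML serialization, and FO-to-PDF steps themselves are not shown.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XSL_FO_NS = "http://www.w3.org/1999/XSL/Format"


def namespace_of(elem: ET.Element) -> str:
    # ElementTree stores qualified tags as "{namespace}local".
    return elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""


def choose_branch(xml: str) -> str:
    # Step 1: receive the document and inspect its document element.
    ns = namespace_of(ET.fromstring(xml))
    if ns == XHTML_NS:
        return "theme-xslt+html-serialize"   # step 2
    if ns == XSL_FO_NS:
        return "fo-processor->pdf"           # step 3
    return "xml-serialize"                   # step 4


print(choose_branch('<html xmlns="http://www.w3.org/1999/xhtml"/>'))
print(choose_branch('<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format"/>'))
print(choose_branch("<note/>"))
```

Testing the namespace rather than the local name avoids confusing, say, an unqualified `<html>` element with XHTML; a pipeline language could equally express the test as an XPath condition.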

(source: Erik Bruchez)

5.16 Configuration-Dependent Transformations

Mobile example:

  1. Receive an XML document to format.

  2. If the configuration is "desktop browser", apply desktop XSLT and serialize as HTML.

  3. If the configuration is "mobile browser", apply mobile XSLT and serialize as XHTML.

News feed example:

  1. Receive an XML document in Atom format.

  2. If the configuration is "RSS 1.0", apply "Atom to RSS 1.0" XSLT.

  3. If the configuration is "RSS 2.0", apply "Atom to RSS 2.0" XSLT.

  4. Serialize the document as XML.

(source: Erik Bruchez)

5.17 Response to XML-RPC Request

  1. Receive an XML-RPC request.

  2. Validate the XML-RPC request with a RelaxNG schema.

  3. Dispatch to different sub-pipelines depending on the content of /methodCall/methodName.

  4. Format the sub-pipeline response to XML-RPC format via XSLT.

  5. Validate the XML-RPC response with a W3C XML Schema.

  6. Return the XML-RPC response.
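Step 3 dispatches on the text of `/methodCall/methodName`. A minimal sketch, assuming a hypothetical table of sub-pipelines written as Python functions; the method name `demo.echo` and its handler are invented for illustration, and the RelaxNG validation, XSLT formatting, and W3C XML Schema validation steps are not shown.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

Handler = Callable[[ET.Element], ET.Element]


def echo(call: ET.Element) -> ET.Element:
    # Hypothetical sub-pipeline: return the first parameter's <value> unchanged.
    value = call.find("params/param/value")
    if value is None:
        raise ValueError("echo expects one parameter")
    response = ET.Element("methodResponse")
    param = ET.SubElement(ET.SubElement(response, "params"), "param")
    param.append(value)
    return response


SUB_PIPELINES: Dict[str, Handler] = {"demo.echo": echo}


def dispatch(request: str) -> ET.Element:
    call = ET.fromstring(request)
    if call.tag != "methodCall":
        raise ValueError("not an XML-RPC methodCall")
    # Relative to the <methodCall> root, "methodName" is /methodCall/methodName.
    name = (call.findtext("methodName") or "").strip()
    handler = SUB_PIPELINES.get(name)
    if handler is None:
        raise KeyError(f"no sub-pipeline for method {name!r}")
    return handler(call)


request = ("<methodCall><methodName>demo.echo</methodName>"
           "<params><param><value><string>hi</string></value></param></params>"
           "</methodCall>")
print(ET.tostring(dispatch(request), encoding="unicode"))
```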

(source: Erik Bruchez)

5.18 Multiple-file Command-line Document Generation

  1. Read a list of source documents.

  2. For each document in the list:

    1. Read the document.

    2. Perform a series of XSLT transformations.

    3. Serialize each result.

  3. Alternatively, aggregate the resulting documents and serialize a single result.

(source: Erik Bruchez)

5.19 Database Import/Ingestion

Import example:

  1. Read a list of source documents.

  2. For each document in the list:

    1. Validate the document.

    2. Call a sub-pipeline to insert content into a relational or XML database.

Ingestion example:

  1. Receive a directory name.

  2. Produce a list of files in the directory as an XML document.

  3. For each element representing a file:

    1. Create an iTQL query using XSLT.

    2. Query the repository to check if the file has been uploaded.

    3. Upload if necessary.

    4. Inspect the file to check the metadata type.

    5. Transform the document with XSLT.

    6. Make a SOAP call to ingest the document.

(source: Erik Bruchez)

5.20 Metadata Retrieval

  1. Call a SOAP service with metadata format as a parameter.

  2. Create an iTQL query with XSLT.

  3. Query a repository for the XML document.

  4. Load a list of XSLT transformations from a configuration.

  5. Iteratively execute the XSLT transformations.

  6. Serialize the result to XML.

(source: Erik Bruchez)

5.21 Non-XML Document Production

  1. A non-XML document is fed into the process.

  2. That input is converted into a well-formed XML document.

  3. A table of contents is extracted.

  4. Pagination is performed.

  5. Each page is transformed into some output language.

(source: Rui Lopes)

  1. Read a non-XML document.

  2. Transform.

(source: Norm Walsh)

5.22 Parse/Validate/Transform

  1. Parse the XML.

  2. Perform XInclude.

  3. Validate with Relax NG.

  4. Validate with W3C XML Schema.

  5. Transform.

(source: Norm Walsh)

5.23 Interact with Web Service

  1. Parse the XML.

  2. Construct a URL request to a REST-style web service.

  3. Parse the result as HTML with fix-up to XHTML (e.g. using the TagSoup parser).

  4. Extract table data from the document by applying a regular expression and creating markup from the matches.

  5. Use XQuery to select the high and low tides.

  6. Formulate response.

(source: Alex Milowski)

5.24 Parse and/or Serialize RSS descriptions

Parse descriptions:

  1. Iterate over the RSS description elements and do the following:

    1. Gather the text children of the 'description' element.

    2. Parse the contents with a simulated document element in the XHTML namespace.

    3. Send the resulting children as the children of the 'description' element.

  2. Apply rest of pipeline steps.

Serialize descriptions:

  1. Iterate over the RSS description elements and do the following:

    1. Serialize the child elements.

    2. Replace them with a new text child containing the serialized contents (escaped text).

  2. Apply rest of pipeline steps.
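Both directions can be sketched with ElementTree, assuming RSS `description` elements whose escaped text is well-formed markup (real feeds may need an HTML-tolerant parser). The wrapper `div` in the XHTML namespace plays the role of the "simulated document element"; the function names are hypothetical, and the "rest of pipeline steps" are not shown.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def parse_descriptions(rss: ET.Element) -> None:
    for desc in list(rss.iter("description")):
        # 1. Gather the text content of 'description' (unescaped by the parser).
        text = desc.text or ""
        # 2. Parse it inside a simulated document element in the XHTML namespace.
        wrapper = ET.fromstring(f'<div xmlns="{XHTML_NS}">{text}</div>')
        # 3. The parsed nodes become the content of 'description'.
        desc.text = wrapper.text
        desc[:] = list(wrapper)


def serialize_descriptions(rss: ET.Element) -> None:
    for desc in list(rss.iter("description")):
        # 1. Serialize the child elements (ET.tostring includes each tail).
        markup = (desc.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in desc)
        # 2. Replace them with a single text child; it is escaped on output.
        desc[:] = []
        desc.text = markup


rss = ET.fromstring(
    "<rss><channel><item><description>"
    "&lt;p&gt;Hi &lt;b&gt;there&lt;/b&gt;&lt;/p&gt;"
    "</description></item></channel></rss>")
parse_descriptions(rss)
serialize_descriptions(rss)
print(ET.tostring(rss, encoding="unicode"))
```

ElementTree serializes XHTML elements with an `html:` prefix, so the round-tripped text is namespace-equivalent to the original markup but not byte-identical.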

(source: Alex Milowski)

5.25 Integrate Computation Components

  1. Select a particular subtree to scope a sub-component.

  2. Apply a computation component that processes some MathML or other math-oriented XML to perform a computation.

  3. Replace the input subtree with the output of the computation as XML content.

(source: Alex Milowski)

5.26 Document Schema Definition Languages (DSDL) - Part 10: Validation Management

  1. Select a particular subtree to scope a sub-component.

  2. Apply a computation component that processes some MathML or other math-oriented XML to perform a computation.

  3. Replace the input subtree with the output of the computation as XML content.

(source: Martin Bryan)

5.27 Large-Document Subtree Iteration

Running XSLT on a very large document is typically impractical. In such cases, often only a particular element, which may be repeated many times, needs to be transformed. Conceptually, a pipeline could limit the transformation to subtrees of the document, possibly identified by an XPath expression, so that the transformation runs in constant memory. The non-matching parts of the document can then be streamed around the transformation, keeping memory use low.
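A rough sketch of the constant-memory idea using `xml.etree.ElementTree.iterparse`, assuming the repeated element is a direct child of the document element and is selected by a plain tag name rather than a full XPath. Each matching subtree is handed to a `transform` callback (a hypothetical stand-in for XSLT) and then discarded. Unlike the pipeline described above, this sketch only emits the transformed subtrees; streaming the non-matching content around them would also need a streaming writer on the output side.

```python
import io
import xml.etree.ElementTree as ET
from typing import Callable, Iterator


def transform_subtrees(source, tag: str,
                       transform: Callable[[ET.Element], str]) -> Iterator[str]:
    """Yield transform(subtree) for each <tag> child of the document element."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # first event: start of the document element
    for event, elem in events:
        if event == "end" and elem.tag == tag:
            yield transform(elem)  # the subtree is complete at its end event
            root.clear()           # discard finished children to bound memory


doc = io.BytesIO(b"<log>" + b"".join(b"<rec n='%d'/>" % i for i in range(5)) + b"</log>")
print(list(transform_subtrees(doc, "rec", lambda e: "record " + e.get("n"))))
```

Memory use here is bounded by the size of one matching subtree rather than by the whole document, which is the property the use case asks a pipeline to preserve.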

(source: Alex Milowski)

5.28 No Use Case

This is a placeholder. Once the association between requirements and use cases is finished, this will go away.

A Contributors

The following members of the XML Core Working Group contributed to this specification as part of their requirements document effort within that working group: