This document is also available in these non-normative formats: XML.
Copyright © 2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document contains requirements for the development of XML Processing Model and Language, which are intended to describe and specify the processing relationships between XML resources.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a Working Group Note setting out the requirements for an XML Processing Model and Language, which is intended to give applications an interoperable way to describe the order in which processes should be applied to XML documents.
This document has been produced by the W3C XML Processing Model Group as part of the XML Activity and is a continuation of the work done by the XML Core Working Group. This document supersedes that group's requirements document Note.
Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
As of this publication, the Working Group expects to eventually publish this document as a Working Group Note. It is not expected to become a W3C Recommendation, and therefore it has no associated W3C Patent Policy licensing obligations.
1 Introduction
2 Design Principles
3 Terminology
4 Requirements
4.1 Allow Control Over Inputs and Outputs Infosets of Steps and XML Process Models
4.2 Minimal Input Processing Options
4.3 XML Pipelining Support
4.4 Input Collection Process Order
4.5 Allow Optimization of Process Steps
4.6 New Components and Steps
4.7 Error Handling and Fall-back
4.8 Conditional Processing
4.9 Streaming XML Pipelines
4.10 Multiple Input and Output Support
4.11 Minimal Component Support
4.12 Data Model Based
4.13 An XML Language
4.14 Declarative Components and Connections Between Steps
4.15 Language Neutral Implementation
4.16 Interactions Between Stages
4.17 Pipeline Composition
4.18 Pipeline Naming
4.19 Same Results
4.20 Loose Binding
4.21 Language Restrictions
4.22 Input Flexibility
4.23 Iteration
5 Use cases
5.1 Extracting MathML
5.2 Style an XML Document in a Browser
5.3 Apply a Sequence of Operations
5.4 Run a Custom Program
5.5 Service Request/Response Handling on a Handheld
5.6 XQuery and XSLT 2.0 Collections
5.7 A Simple Transformation Service
5.8 An AJAX Server
5.9 Dynamic XQuery
5.10 Read/Write Non-XML File
5.11 Single-file Command-line Document Processing
5.12 XInclude Processing
5.13 Document Aggregation
5.14 Update/Insert Document in Database
5.15 Content-Dependent Transformations
5.16 Configuration-Dependent Transformations
5.17 Response to XML-RPC Request
5.18 Multiple-file Command-line Document Generation
5.19 Database Import/Ingestion
5.20 Metadata Retrieval
5.21 Non-XML Document Production
5.22 Parse/Validate/Transform
5.23 Interact with Web Service
5.24 Parse and/or Serialize RSS descriptions
5.25 Integrate Computation Components
5.26 Document Schema Definition Languages (DSDL) - Part 10: Validation Management
5.27 Large-Document Subtree Iteration
5.28 No Use Case
A large and growing set of specifications describe processes operating on XML documents. Many applications will depend on the use of more than one of these specifications. Considering how implementations of these specifications might interact raises many issues related to interoperability. This specification contains requirements on an XML Processing Model and Language for the description of XML process interactions in order to address these issues. This specification is concerned with the conceptual model of XML process interactions, the language for the description of these interactions, and the inputs and outputs of the overall process. This specification is not generally concerned with the implementations of actual XML processes participating in these interactions.
The design principles described in this document are requirements that the specification as a whole is expected to satisfy. A specific feature need not meet a given principle on its own. Instead, the whole set of specifications related to this requirements document should meet the overall goal stated in each design principle.
Any XML document in this specification is operated on as an information set. Processes may consume or produce information sets to inspect, augment, extract, or produce new information.
Applications should be free to implement XML processing using appropriate technologies such as SAX, DOM, or other infoset representations.
The language must be rich enough to address practical interoperability concerns.
The language should be as small and simple as practical.
It should be relatively easy to build a conformant implementation of the language, but it should also be possible to build a sophisticated implementation that performs its own optimizations and integrates with other technologies.
The specifications should allow use of XML-in-XML-out components.
Do we want a terminology section where we introduce common terms that exist in current XML pipeline/processing languages?
An XML process model is an overall set of steps that produces some number of infoset outputs based on some number of infoset inputs.
An XML pipeline is a sequence of steps each of whose outputs are chained to the input of the next step.
A specification language is an XML vocabulary in which an XML pipeline or process model is described.
A component is a particular XML technology (e.g. XInclude, XML Schema Validity Assessment, XSLT, XQuery, etc.).
A step is a specification of how to use a component that includes inputs and outputs.
A component vocabulary is the set of inputs that describe the process by which an output is produced (e.g. an XSLT transformation).
A use environment is the technology environment in which the XML process is used (e.g. command-line, web servers, editors, browsers, embedded applications, etc.).
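As a rough illustration of how these terms relate, the following Python sketch models a step as a function from one infoset to another and a pipeline as an ordered list of such steps. It is not a proposed design: `Step`, `run_pipeline`, `add_status`, and `wrap_in_envelope` are hypothetical names, and an ElementTree `Element` merely stands in for an infoset.

```python
import xml.etree.ElementTree as ET
from functools import reduce
from typing import Callable, List

# Hypothetical model: a step maps one infoset (here, an Element) to another.
Step = Callable[[ET.Element], ET.Element]


def add_status(root: ET.Element) -> ET.Element:
    """Illustrative step: annotate the document element."""
    root.set("status", "processed")
    return root


def wrap_in_envelope(root: ET.Element) -> ET.Element:
    """Illustrative step: place the input under a new document element."""
    envelope = ET.Element("envelope")
    envelope.append(root)
    return envelope


def run_pipeline(steps: List[Step], source: ET.Element) -> ET.Element:
    """Chain the steps so each output becomes the input of the next step."""
    return reduce(lambda infoset, step: step(infoset), steps, source)


result = run_pipeline([add_status, wrap_in_envelope], ET.fromstring("<doc/>"))
print(ET.tostring(result, encoding="unicode"))
```

A process model in the broader sense defined above could have several inputs and outputs per step; this sketch covers only the single-input, single-output chain.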
The specification language must allow control over the input and output of each step and the overall process. At minimum, the characteristics of these inputs and outputs must be described so that binding into particular use environments can be accomplished by introspection. This may involve the use of named infoset inputs and outputs.
Supporting use cases:
There is a basic minimal set of mandatory input processing options that we must satisfy to achieve interoperability. This includes implicit input provided by the use environment (e.g. a file specified on the command-line) and direct reference by a URI value.
Supporting use cases:
Given a set of components, the specification language must allow the order of processing steps to be specified.
Supporting use cases:
Given a set of documents, the specification language must allow the order of processing steps to be specified.
Supporting use cases:
It should also be possible to build a sophisticated implementation that can perform parallel operations, lazy or greedy processing, and other optimizations.
Supporting use cases:
The model should be extensible enough so that applications can define new process steps that use new components. These definitions should be able to be easily reused in different XML process models.
Supporting use cases:
The model and specification language must provide mechanisms for addressing error handling and fall-back behaviors.
Supporting use cases:
The model should allow conditional processing so that different steps can be selected depending on run-time evaluation(s).
Supporting use cases:
The model should not preclude streaming pipelines: a user should be able to write a pipeline that can be streamed. There should be some support for static analysis of an XML pipeline to detect this ability.
Supporting use cases:
The model should allow steps that have multiple inputs or produce multiple outputs.
Supporting use cases:
The model should allow steps that use the following components:
XML Base
XInclude
XSLT 1.0/2.0
XSL FO
XML Schema
XQuery
RelaxNG
Supporting use cases:
What passes between components are conceptually infosets. The specification language and model are not tied to any specific API.
Supporting use cases:
The specification language must be an XML vocabulary that can be authored and manipulated using standard XML tools. This language should be able to be reasonably specified by both an XML Schema and RelaxNG schema.
Supporting use cases:
Each component and step in the process model must be expressed in the specification language in a declarative way, that is, without reference to specific API or operating system conventions.
Supporting use cases:
The process model and specification language must be neutral with respect to implementation language. It should be possible to exchange specifications of XML processes across various computing platforms in an interoperable way. While these computing platforms should not be limited to any particular class of platforms such as clients, servers, distributed computing infrastructures, etc., their use environments may create dependencies beyond the scope of this specification (e.g. input and output bindings for web services).
Supporting use cases:
The model should define the interaction between processing stages within a pipeline, especially when multiple inputs and outputs are processed.
Supporting use cases:
Mechanisms for pipeline composition must be provided, whether dependency-based or sequence-based.
Supporting use cases:
Given the same source documents and processing specification, different implementations must return the same results.
Supporting use cases:
The language binding to initial processing sources should be as loose as possible to enable batch processing and facilitate reuse.
Supporting use cases:
There should be only one standardized element for stage declaration within a pipeline, lowering language specification maintenance costs.
Processing stage names should be standardized, with strict semantics (i.e., declaring an XSLT task will run a conformant XSLT processor).
Supporting use cases:
The language should allow processes and components to accept zero, one or multiple inputs and zero, one or multiple outputs.
Supporting use cases:
Extract MathML fragments from an XHTML document and render them as images. Employ an SVG renderer for SVG glyphs embedded in the MathML.
Style an XML document in a browser with one of several different stylesheets without having multiple copies of the document containing different xml-stylesheet directives.
Apply a sequence of operations such as XInclude, validation, and transformation to a document, aborting if the result or an intermediate stage is not valid.
Run a program of your own, with some parameters, on an XML file and display the result in a browser.
Allow an application on a handheld device to construct a pipeline, send the pipeline and some data to the server, allow the server to process the pipeline and send the result back.
In XQuery and XSLT 2.0 there is the idea of an input and output collection.
Extract an XML document (XForms instance) from an HTTP request body.
Execute an XSLT transformation on that document.
Call a persistence service with the resulting document.
Return the XML document from persistence service (new XForms instance) as the HTTP response body.
Receive XML request with word to complete.
Call a sub-pipeline that retrieves list of completions for that word.
Format resulting document with XSLT.
Serialize response to XML.
Dynamically create an XQuery query using XSLT, based on input XML document.
Execute the XQuery against a database.
Construct an XHTML result page using XSLT from the result of the query.
Serialize response to HTML.
Read a CSV file and convert it to XML.
Process the document with XSLT.
Convert the result to a CSV format using text serialization.
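The three steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions: the column headers are assumed to be valid XML element names, the element names `rows` and `row` are invented for the example, and the plain function `uppercase_names` stands in for the XSLT step, since the standard library has no XSLT processor.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text):
    """Read CSV text and build <rows><row><col>value</col>...</row></rows>."""
    rows = ET.Element("rows")
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(rows, "row")
        for name, value in record.items():
            # Assumption: each column header is a valid XML element name.
            ET.SubElement(row, name).text = value
    return rows


def uppercase_names(rows):
    """Stand-in for the XSLT step: upper-case every <name> element."""
    for name in rows.iter("name"):
        name.text = (name.text or "").upper()
    return rows


def xml_to_csv(rows):
    """Serialize the row elements back to CSV text."""
    records = [{cell.tag: cell.text or "" for cell in row} for row in rows]
    out = io.StringIO()
    if records:
        writer = csv.DictWriter(out, fieldnames=list(records[0]), lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
    return out.getvalue()


source = "name,city\nada,London\ngrace,Arlington\n"
result = xml_to_csv(uppercase_names(csv_to_xml(source)))
print(result)
```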
Read a DocBook document.
Validate the document.
Process it with XSLT.
Validate the resulting XHTML.
Save the HTML file using HTML serialization.
Retrieve a document containing XInclude instructions.
Locate documents to be included.
Perform XInclude inclusion.
Return a single XML document.
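A minimal sketch of this use case, using the limited XInclude support in Python's `xml.etree.ElementInclude` module. The in-memory `DOCS` table and the `loader` function are assumptions made so the example is self-contained; a real pipeline would locate the included documents by URI.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical document store standing in for "locate documents to be included".
DOCS = {
    "chapter1.xml": "<chapter><title>One</title></chapter>",
}


def loader(href, parse, encoding=None):
    """Resolve an href against the in-memory store instead of the file system."""
    if parse == "xml":
        return ET.fromstring(DOCS[href])
    return DOCS[href]  # parse="text": return the raw text


root = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="chapter1.xml"/>'
    "</book>"
)

# Replace each xi:include element in place, yielding a single document.
ElementInclude.include(root, loader=loader)
print(ET.tostring(root, encoding="unicode"))
```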
Locate a collection of documents to aggregate.
Perform aggregation under a new document element.
Return a single XML document.
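Aggregation under a new document element can be sketched as follows. The function name `aggregate` and the default element name `collection` are illustrative choices, not part of any requirement.

```python
import xml.etree.ElementTree as ET


def aggregate(documents, root_name="collection"):
    """Place each document element under a newly created document element."""
    root = ET.Element(root_name)
    for doc in documents:
        root.append(doc)
    return root


parts = [ET.fromstring("<item>a</item>"), ET.fromstring("<item>b</item>")]
combined = aggregate(parts)
print(ET.tostring(combined, encoding="unicode"))
# prints <collection><item>a</item><item>b</item></collection>
```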
Receive an XML document to save.
Check the database to see if the document exists.
If the document exists, update the document.
If the document does not exist, add the document.
Receive an XML document to format.
If the document is XHTML, apply a theme via XSLT and serialize as HTML.
If the document is XSL-FO, apply an XSL FO processor to produce PDF.
Otherwise, serialize the document as XML.
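The branch selection in this use case depends only on the content of the input. The sketch below, which assumes the document type can be recognized from the namespace of the document element, returns a label for the branch that would run. The branch labels are hypothetical, and the XSLT and XSL FO processing themselves are outside the scope of the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
FO_NS = "http://www.w3.org/1999/XSL/Format"


def namespace_of(elem):
    """Return the namespace URI of an element, or '' if it has none."""
    tag = elem.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def choose_branch(doc):
    """Select a processing branch from the document element's namespace."""
    ns = namespace_of(doc)
    if ns == XHTML_NS:
        return "theme-with-xslt-and-serialize-html"
    if ns == FO_NS:
        return "render-pdf-with-fo-processor"
    return "serialize-xml"


xhtml = ET.fromstring(f'<html xmlns="{XHTML_NS}"/>')
fo = ET.fromstring(f'<fo:root xmlns:fo="{FO_NS}"/>')
other = ET.fromstring("<data/>")
print(choose_branch(xhtml), choose_branch(fo), choose_branch(other))
```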
Mobile example:
Receive an XML document to format.
If the configuration is "desktop browser", apply desktop XSLT and serialize as HTML.
If the configuration is "mobile browser", apply mobile XSLT and serialize as XHTML.
News feed example:
Receive an XML document in Atom format.
If the configuration is "RSS 1.0", apply "Atom to RSS 1.0" XSLT.
If the configuration is "RSS 2.0", apply "Atom to RSS 2.0" XSLT.
Serialize the document as XML.
Receive an XML-RPC request.
Validate the XML-RPC request with a RelaxNG schema.
Dispatch to different sub-pipelines depending on the content of /methodCall/methodName.
Format the sub-pipeline response to XML-RPC format via XSLT.
Validate the XML-RPC response with a W3C XML Schema.
Return the XML-RPC response.
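The dispatch step can be sketched with Python's `xmlrpc.client` module, which extracts the method name (the content of /methodCall/methodName) and the parameters from a request. The handler table and the method names `math.add` and `text.upper` are invented for the example. The two validation steps are omitted because the standard library offers neither RelaxNG nor W3C XML Schema validation, and the response is formatted by `xmlrpc.client.dumps` rather than by XSLT.

```python
import xmlrpc.client


def add(a, b):
    return a + b


def upper(text):
    return text.upper()


# Hypothetical sub-pipelines, keyed by XML-RPC method name.
HANDLERS = {"math.add": add, "text.upper": upper}


def handle(request_xml):
    """Dispatch on the method name and return an XML-RPC response document."""
    params, method = xmlrpc.client.loads(request_xml)
    handler = HANDLERS.get(method)
    if handler is None:
        fault = xmlrpc.client.Fault(1, f"unknown method: {method}")
        return xmlrpc.client.dumps(fault, methodresponse=True)
    return xmlrpc.client.dumps((handler(*params),), methodresponse=True)


request = xmlrpc.client.dumps((2, 3), methodname="math.add")
response = handle(request)
print(response)
```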
Read a list of source documents.
For each document in the list:
Read the document.
Perform a series of XSLT transformations.
Serialize each result.
Alternatively, aggregate the resulting documents and serialize a single result.
Import example:
Read a list of source documents.
For each document in the list:
Validate the document.
Call a sub-pipeline to insert content into a relational or XML database.
Ingestion example:
Receive a directory name.
Produce a list of files in the directory as an XML document.
For each element representing a file:
Create an iTQL query using XSLT.
Query the repository to check if the file has been uploaded.
Upload if necessary.
Inspect the file to check the metadata type.
Transform the document with XSLT.
Make a SOAP call to ingest the document.
Call a SOAP service with metadata format as a parameter.
Create an iTQL query with XSLT.
Query a repository for the XML document.
Load a list of XSLT transformations from a configuration.
Iteratively execute the XSLT transformations.
Serialize the result to XML.
A non-XML document is fed into the process.
That input is converted into a well-formed XML document.
A table of contents is extracted.
Pagination is performed.
Each page is transformed into some output language.
Read a non-XML document.
Transform.
Parse the XML.
Perform XInclude.
Validate with Relax NG.
Validate with W3C XML Schema.
Transform.
Parse the XML.
Construct a URL request to a REST-style web service.
Parse the resulting document as HTML with fix-up to XHTML (e.g. use the TagSoup parser).
Extract table data from the document by applying a regular expression and creating markup from the matches.
Use XQuery to select the high and low tides.
Formulate response.
Parse descriptions:
Iterate over the RSS description elements and do the following:
Gather the text children of the 'description' element.
Parse the contents with a simulated document element in the XHTML namespace.
Send the resulting children as the children of the 'description' element.
Apply rest of pipeline steps.
Serialize descriptions:
Iterate over the RSS description elements and do the following:
Serialize the children elements.
Generate a new child as a text child containing the contents (escaped text).
Apply rest of pipeline steps.
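Both directions of this use case can be sketched with ElementTree. The sketch assumes that the escaped text in each description is well-formed XML once unescaped (real feeds often contain HTML that is not), and it uses a `div` element in the XHTML namespace as the simulated document element. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def parse_description(desc):
    """Replace the escaped text of a description with parsed child elements."""
    text = desc.text or ""
    # Simulated document element in the XHTML namespace.
    wrapper = ET.fromstring(f'<div xmlns="{XHTML_NS}">{text}</div>')
    desc.text = wrapper.text
    for child in list(wrapper):
        desc.append(child)


def serialize_description(desc):
    """Replace the child elements of a description with their serialized text."""
    parts = [desc.text or ""]
    for child in list(desc):
        parts.append(ET.tostring(child, encoding="unicode"))
        desc.remove(child)
    desc.text = "".join(parts)


rss = ET.fromstring(
    "<rss><channel><item><description>"
    "&lt;p&gt;Hi &lt;b&gt;there&lt;/b&gt;&lt;/p&gt;"
    "</description></item></channel></rss>"
)

for desc in list(rss.iter("description")):
    parse_description(desc)
parsed_child_tag = rss.find("channel/item/description")[0].tag

for desc in list(rss.iter("description")):
    serialize_description(desc)
print(ET.tostring(rss, encoding="unicode"))
```

The serialized markup need not be character-for-character identical to the original text (namespace prefixes, for example, may differ), so the round trip preserves the content rather than the exact bytes.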
Select a particular subtree to scope a sub-component.
Apply a computation component that processes some MathML or other math-oriented XML to perform a computation.
Replace the input subtree with the output of the computation as XML content.
Running XSLT on a very large document is often impractical. In many such cases, only a particular element, which may be repeated many times, needs to be transformed. Conceptually, a pipeline could limit the transformation to a subtree of the document, possibly identified by an XPath expression, so that the transformation runs in constant memory. The non-matching parts of the document can then be streamed around the transformation, keeping memory use low.
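The idea can be sketched with ElementTree's incremental parser. This is a simplified illustration, not a streaming XSLT implementation: it assumes the repeated elements are direct children of the document element, it matches on an element name rather than an XPath, and it yields only the transformed matches instead of streaming the non-matching content through. The function name `transform_records` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def transform_records(source, tag, transform):
    """Yield transform(elem) for each matching element, discarding parsed content as it goes."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the document element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            result = transform(elem)
            # Assumption: matches are direct children of the document element,
            # so everything parsed so far can be dropped to bound memory use.
            root.clear()
            yield result


source = io.BytesIO(
    b"<log><entry id='1'>a</entry><other/><entry id='2'>b</entry></log>"
)
results = list(
    transform_records(source, "entry", lambda e: (e.get("id"), e.text.upper()))
)
print(results)
# prints [('1', 'A'), ('2', 'B')]
```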