W3C

XProc V.next Requirements and Use Cases

Editors' Working Draft 10 April 2012

Editors:
Alex Milowski, Invited Expert <alex@milowski.com>
Murray Maloney, Invited Expert <murray@muzmo.com>

This document is also available in these non-normative formats: XML.


Abstract

This document is being built to articulate requirements for the development of a version of XProc: An XML Pipeline Language subsequent to the one currently making its way through the W3C Candidate Recommendation process.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This Editors' Working Draft has been produced by the authors listed above, at the will of the chair, and with the consent of the members of the W3C XML Processing Model Working Group as part of the XML Activity, following the procedures set out for the W3C Process. The goals of the XML Processing Model Working Group are discussed in its charter.

Comments on this document should be sent to the W3C mailing list public-xml-processing-model-comments@w3.org (archive).

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. This document is informative only. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
    1.1 XProc V.next Goals
2 Terminology
3 Design Principles
4 Requirements
    4.1 Standard Names for Component Inventory
    4.2 Allow Defining New Components and Steps
    4.3 Minimal Component Support for Interoperability
    4.4 Allow Pipeline Composition
    4.5 Iteration of Documents and Elements
    4.6 Conditional Processing of Inputs
    4.7 Error Handling and Fall-back
    4.8 Support for the XPath 2.0 Data Model
    4.9 Allow Optimization
    4.10 Streaming XML Pipelines
5 Use cases
    5.1 Apply a Sequence of Operations
    5.2 XInclude Processing
    5.3 Parse/Validate/Transform
    5.4 Document Aggregation
    5.5 Single-file Command-line Document Processing
    5.6 Multiple-file Command-line Document Generation
    5.7 Extracting MathML
    5.8 Style an XML Document in a Browser
    5.9 Run a Custom Program
    5.10 XInclude and Sign
    5.11 Make Absolute URLs
    5.12 A Simple Transformation Service
    5.13 Service Request/Response Handling on a Handheld
    5.14 Interact with Web Service (Tide Information)
    5.15 Parse and/or Serialize RSS descriptions
    5.16 XQuery and XSLT 2.0 Collections
    5.17 An AJAX Server
    5.18 Dynamic XQuery
    5.19 Read/Write Non-XML File
    5.20 Update/Insert Document in Database
    5.21 Content-Dependent Transformations
    5.22 Configuration-Dependent Transformations
    5.23 Response to XML-RPC Request
    5.24 Database Import/Ingestion
    5.25 Metadata Retrieval
    5.26 Non-XML Document Production
    5.27 Integrate Computation Components (MathML)
    5.28 Document Schema Definition Languages (DSDL) - Part 10: Validation Management
    5.29 Large-Document Subtree Iteration
    5.30 Adding Navigation to an Arbitrarily Large Document
    5.31 Fallback to Choice of XSLT Processor
    5.32 No Fallback for XQuery Causes Error

Appendices

A References
    A.1 Reference Documents
    A.2 Core XML Specifications
    A.3 XML Data Model and XML Information Set
    A.4 XPath and XQuery
    A.5 Style, Transform, Serialize
    A.6 XML Schema Languages
    A.7 Identifiers and Names
    A.8 HTTP Request & Authentication
    A.9 Character Encodings
    A.10 Media Types
    A.11 Digital Signatures
    A.12 Candidate Specifications: XPointers
    A.13 Candidate Specification: XLink
    A.14 Candidate Specification: Mathematics
    A.15 Candidate Specification: XForms
    A.16 Candidate Specification: EXI
    A.17 Candidate Specifications: HTML-ish
    A.18 Candidate Specifications: Popular Publishing Profiles
    A.19 B2B Transaction Language Profile
    A.20 Candidate Specifications: Digital Signatures and Encryption
    A.21 Candidate Specifications: Semantic Web
    A.22 Candidate Non-XML Data Format Specifications
    A.23 Reference Processors?
B Unsatisfied V1 CR Issues
C Unsatisfied V1 Requirements and Use Cases
D Collected Input
    D.1 Architecture
        D.1.1 What Flows?
            D.1.1.1 Sequences
            D.1.1.2 Sets of Documents
            D.1.1.3 MetaData, HTML5, JSON, Plain Text
        D.1.2 Events
        D.1.3 Synchronization & Concurrency
    D.2 Resource Management
        D.2.1 Add a Resource Manager
        D.2.2 Dynamic pipeline execution
        D.2.3 Information caches
        D.2.4 Environment
        D.2.5 Datatypes
    D.3 Integration
        D.3.1 XML Choreography
        D.3.2 Authentication
        D.3.3 Debugging
        D.3.4 Fall-back Mechanism
        D.3.5 Clustering
    D.4 Usability
        D.4.1 Verbosity
            D.4.1.1 p:data
            D.4.1.2 p:input
            D.4.1.3 p:option
            D.4.1.4 p:pipe
            D.4.1.5 p:serialization
            D.4.1.6 p:store
            D.4.1.7 p:string-replace
            D.4.1.8 p:template
            D.4.1.9 p:try
            D.4.1.10 p:variable
            D.4.1.11 p:viewport
        D.4.2 Parameter Rules
        D.4.3 Choose-style binding
        D.4.4 Output signatures for compound steps
        D.4.5 Loading computed URIs
        D.4.6 Optional options for declared steps
        D.4.7 XPath
        D.4.8 Simplify Use of File Sets
        D.4.9 Required Primary Port
        D.4.10 Documentation Conventions
    D.5 New Steps
        D.5.1 Various Suggestions
        D.5.2 Iterate until condition
        D.5.3 p:send-mail
E FYI: Categorized Steps
    E.1 Micro-operations
    E.2 Transformation
    E.3 Query
    E.4 Validation
    E.5 Document Operations
    E.6 File & Directory Operations
    E.7 Image Operations
    E.8 Error / Message Handling
    E.9 Sequence Operations
    E.10 Input / Output
    E.11 XProc Operations
    E.12 Encoding
    E.13 Execution Control
    E.14 Resource / Collection Management
    E.15 Environment
    E.16 Miscellaneous
F Contributors


1 Introduction

A large and growing set of specifications describe processes operating on XML documents. Many applications depend on more than one of the many inter-related specifications in the XML family. How implementations of these specifications interact affects interoperability. XProc: An XML Pipeline Language is designed for describing operations to be performed on XML documents.

This specification contains requirements for an anticipated XProc V.next. This specification is concerned with the conceptual model of XML process interactions, the language for the description of these interactions, and the inputs and outputs of the overall process. This specification is not generally concerned with the implementations of actual XML processes participating in these interactions.

1.1 XProc V.next Goals

Editorial note 
The editors intend to enumerate the Working Group's goals for XProc V.next to guide our efforts, and these may ultimately inform 3 Design Principles.

The following is a strawman list; it has no standing with the Working Group and is likely to be replaced and/or expanded daily until further notice.

  • Review Definitions. Section 2.

  • Review Design Principles. Section 3.

  • Gather and review Related Specifications. Appendix A.

  • Gather and review outstanding issues. Appendix B.

  • Audit existing requirements and use cases. Appendix C.

  • Gather and review collected input. Appendix D.

  • Gather and review categorized list of extant steps. Appendix E.

  • Update existing definitions, design principles, requirements and use cases.

  • Gather and review input from stakeholders under the following categories. See Appendix D.

    • Architecture.

    • Resource Management.

    • Integration.

    • Usability.

    • New Steps.

  • Discuss.

  • Enumerate new design principles, requirements and use cases.

  • Publish.

2 Terminology

[Definition: XML Information Set or "Infoset"]

An XML Information Set or "Infoset" is the name we give to any implementation of a data model for XML which supports the vocabulary as defined by the XML Information Set recommendation [xml-infoset-rec].

[Definition: XML Pipeline]

An XML Pipeline is a conceptualization of a flow of a configuration of steps and their parameters. The XML Pipeline defines a process in terms of order, dependencies, or iteration of steps over XML information sets.

[Definition: XML Pipeline Specification Document]

A pipeline specification document is an XML document that describes an XML pipeline.

[Definition: Step]

A step is a specification of how a component is used in a pipeline that includes inputs, outputs, and parameters.

[Definition: Component]

A component is a particular XML technology (e.g. XInclude, XML Schema Validity Assessment, XSLT, XQuery, etc.).

[Definition: Input Document]

An XML infoset that is an input to an XML Pipeline or Step.

[Definition: Output Document]

The result of processing by an XML Pipeline or Step.

[Definition: Parameter]

A parameter is input to a Step or an XML Pipeline in addition to the Input and Output Document(s) that it may access. Parameters are most often simple, scalar values such as integers, booleans, and URIs, and they are most often named, but neither of these conditions is mandatory. That is, we do not (at this time) constrain the range of values a parameter may hold, nor do we (at this time) forbid a Step from accepting anonymous parameters.

[Definition: XML Pipeline Environment]

The technology or platform environment in which the XML Pipeline is used (e.g. command-line, web servers, editors, browsers, embedded applications, etc.).

[Definition: Streaming]

The ability to parse an XML document and pass infoitems between components without building a full document information set.

3 Design Principles

The design principles described in this document are requirements with which compliance is an overall goal for the specification. It is not necessarily the case that a specific feature meets the requirement. Instead, the whole set of specifications related to this requirements document should be viewed as meeting the overall goal specified in the design principle.

Technology Neutral

Applications should be free to implement XML processing using appropriate technologies such as SAX, DOM, or other infoset representations.

Platform Neutral

Application computing platforms should not be limited to any particular class of platforms such as clients, servers, distributed computing infrastructures, etc. In addition, the resulting specifications should not be swayed by the specifics of use on those platforms.

Small and Simple

The language should be as small and simple as practical. It should be "small" in the sense that simple processing should be able to be stated in a compact way and "simple" in the sense that the specification of more complex processing steps does not require arduous specification steps in the XML Pipeline Specification Document.

Infoset Processing

At a minimum, an XML document is represented and manipulated as an XML Information Set. The use of supersets, augmented information sets, or data models that can be represented or conceptualized as information sets should be allowed, and in some instances, encouraged (e.g. for the XPath 2.0 Data Model).

Straightforward Core Implementation

It should be relatively easy to implement a conforming implementation of the language but it should also be possible to build a sophisticated implementation that implements its own optimizations and integrates with other technologies.

Address Practical Interoperability

An XML Pipeline must be able to be exchanged between different software systems with a minimum expectation of the same result for the pipeline given that the XML Pipeline Environment is the same. A reasonable resolution to platform differences for binding or serialization of resulting infosets should be expected to be addressed by this specification or by re-use of existing specifications.

Validation of XML Pipeline Documents by a Schema

The XML Pipeline Specification Document should be able to be validated by both W3C XML Schema and RELAX NG.

Reuse and Support for Existing Specifications

XML Pipelines need to support existing XML specifications and reuse common design patterns from within them. In addition, there must be support for the use of future specifications as much as possible.

Arbitrary Components

The specification should allow use of any component technology that can consume or produce XML Information Sets.

Control of Inputs and Outputs

An XML Pipeline must allow control over specifying both the inputs and outputs of any process within the pipeline. This applies to the inputs and outputs of both the XML Pipeline and its containing steps. It should also allow for the case where there might be multiple inputs and outputs.

Control of Flow and Errors

An XML Pipeline must allow control over the explicit and implicit handling of the flow of documents between steps. When errors occur, these must be able to be handled explicitly to allow alternate courses of action within the XML Pipeline.

4 Requirements

4.1 Standard Names for Component Inventory

The XML Pipeline Specification Document must have standard names for components that correspond to, but are not limited to, the following specifications [xml-core-wg]:

Editorial note: Satisfied (2012-04-07)
This requirement is satisfied by XProc 1.0. Refer to D Pipeline Language Summary. The following are code snippets which exemplify related XProc 1.0 step names.
<p:xinclude ... /> 
<p:xslt name="make-fo" ... > 
<p:xsl-formatter ... />
<p:validate-with-xml-schema name="validated" ... > 
<p:validate-with-relax-ng ... />
<p:xquery ... /> 
Editorial note: Supported (2012-04-07)
Editors have added the following list to reflect normatively referenced specifications in XProc 1.0.

The following list is proposed as added requirements on the basis that they have been accomplished in XProc 1.0, as reflected in the related example. Moreover, these specifications are cited normatively in XProc 1.0.

<p:namespaces ... />
<p:namespace-rename .... /> 
<p:validate-with-schematron ... /> 
<p:xpath-context .... />
<p:xpath-version .... />
<p:xpath-version-available ... /> 
<c:http-request ... />

The following list is proposed as added requirements on the basis that these specifications have been stipulated by reference in XProc 1.0.

4.2 Allow Defining New Components and Steps

An XML Pipeline must allow applications to define and share new steps that use new or existing components. [xml-core-wg]

Editorial note: Satisfied (2012-04-07)
Defining and sharing steps is satisfied by XProc 1.0; see the Introduction. The ability to define additional step types is implementation-defined. See the list of reference implementations for examples of components and steps that are defined ex-cathedra.

4.3 Minimal Component Support for Interoperability

There must be a minimal inventory of components defined by the specification that are required to be supported to facilitate interoperability of XML Pipelines.

Editorial note: Minimal is Moot (2012-04-07)
Failing a definition of "minimal inventory", this requirement cannot be satisfied. Is this requiring a minimal number of components to satisfy interoperability among only those components listed in 4.1? In that case, we need to work on 4.1. Otherwise, the editors consider this requirement to be self-evidently moot; "minimal inventory" cannot be defined or measured to the satisfaction of any three people.

4.4 Allow Pipeline Composition

Mechanisms for XML Pipeline composition for re-use or re-purposing must be provided within the XML Pipeline Specification Document.

Editorial note: Satisfied (2012-04-07)
This requirement is satisfied by XProc 1.0, Example 1. There may be room for improvement, as reflected in the Collected Input.

4.5 Iteration of Documents and Elements

XML Pipelines should allow iteration of a specific set of steps over a collection of documents and/or elements within a document.

Editorial note: Satisfied (2012-04-07)
This requirement is satisfied by XProc 1.0, Example 3. The term "specific set" was never defined. We should probably do that now, in the corrigendum. We have an opportunity to provide advice here, again and again.

4.6 Conditional Processing of Inputs

To allow run-time selection of steps, XML Pipelines should provide mechanisms for conditional processing of documents or elements within documents based on expression evaluation. [xml-core-wg]

Editorial note: Satisfied (2012-04-07)
This requirement is satisfied by XProc 1.0, Example 3. What principles have we learned about conditional processing of inputs?

4.7 Error Handling and Fall-back

XML Pipelines must provide mechanisms for addressing error handling and fall-back behaviors. [xml-core-wg]

Editorial note: Satisfied (2012-04-07)
This requirement is satisfied by XProc: All steps have an implicit output port for reporting errors; error handling and fallback are manageable through use of p:try and p:catch.

<p:try name? = NCName> (p:variable*, p:group, p:catch) </p:try>
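
As a non-normative illustration of the p:try and p:catch mechanism summarized above, the following sketch attempts schema validation and, if that step fails, falls back to passing the original document through unchanged. The schema file name "schema.xsd" and the step name "main" are placeholders invented for this example, and p:validate-with-xml-schema is an optional step that a given processor may not provide.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:try>
    <p:group>
      <!-- Attempt: validate the source; an invalid document raises an error -->
      <p:validate-with-xml-schema>
        <p:input port="schema">
          <p:document href="schema.xsd"/>  <!-- hypothetical schema -->
        </p:input>
      </p:validate-with-xml-schema>
    </p:group>
    <p:catch>
      <!-- Fall-back: emit the original, unvalidated document instead -->
      <p:identity>
        <p:input port="source">
          <p:pipe step="main" port="source"/>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```

Both branches end in a step with a primary output named "result", so the p:group and the p:catch expose the same implicit output, as p:try requires.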

4.8 Support for the XPath 2.0 Data Model

XML Pipelines must support the XPath 2.0 Data Model to allow support for XPath 2.0, XSLT 2.0, and XQuery as steps.

Note:

At this point, there is no consensus in the working group that minimal conforming implementations are required to support the XPath 2.0 Data Model.

Editorial note: Satisfied (2012-04-07)
This requirement is satisfied in XProc 1.0, with the caveats noted in 2.6 XPaths in XProc.

4.9 Allow Optimization

An XML Pipeline should not inhibit a sophisticated implementation from performing parallel operations, lazy or greedy processing, and other optimizations. [xml-core-wg]

Editorial note: Satisfied (2012-04-07)
This requirement is satisfied by XProc 1.0, with the caveats noted in H Sequential steps, parallelism, and side-effects.

4.10 Streaming XML Pipelines

An XML Pipeline should allow for the existence of streaming pipelines in certain instances as an optional optimization. [xml-core-wg]

Editorial note: Satisfied (2012-04-07)
This requirement is more than satisfied by XProc 1.0, except as noted for 7.1.23 p:split-sequence.

5 Use cases

This section contains a set of use cases that supported our requirements and informed our design. While we wanted to address all the use cases listed in this document, in the end the first version may not have solved all of them. Those unsolved use cases may be migrated into XProc V.next.

5.1 Apply a Sequence of Operations

Apply a sequence of operations such as XInclude, validation, and transformation to a document, aborting if the result or an intermediate stage is not valid.

(source: [xml-core-wg])

Editorial note: Satisfied (2012-04-07)
This use case is satisfied by Examples 1-3 in the Introduction.

5.2 XInclude Processing

  1. Retrieve a document containing XInclude instructions.

  2. Locate documents to be included.

  3. Perform XInclude inclusion.

  4. Return a single XML document.

(source: Erik Bruchez)

Editorial note: Satisfied (2012-04-07)
This use case is satisfied by Examples 1-3 in the Introduction.

5.3 Parse/Validate/Transform

  1. Parse the XML.

  2. Perform XInclude.

  3. Validate with RELAX NG, possibly aborting if not valid.

  4. Validate with W3C XML Schema, possibly aborting if not valid.

  5. Transform.

(source: Norm Walsh)

Editorial note: Satisfied (2012-04-07)
This use case is almost satisfied by Examples 1-3 in the Introduction. The example does not include Relax NG validation, but it could have, and Schematron as well.
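
The five steps of this use case, including the RELAX NG validation, might be expressed in XProc 1.0 roughly as follows. This is a non-normative sketch: the file names "doc.rng", "doc.xsd" and "transform.xsl" are placeholders, and both validation steps are optional steps in XProc 1.0, so their availability depends on the processor.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Step 1: parsing is done by the processor when it reads the source port -->

  <!-- Step 2: expand XInclude elements -->
  <p:xinclude/>

  <!-- Step 3: RELAX NG validation; by default an invalid document
       raises an error and aborts the pipeline -->
  <p:validate-with-relax-ng>
    <p:input port="schema">
      <p:document href="doc.rng"/>
    </p:input>
  </p:validate-with-relax-ng>

  <!-- Step 4: W3C XML Schema validation, with the same abort behavior -->
  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:document href="doc.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>

  <!-- Step 5: transform the validated document -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="transform.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```

Each step reads the primary output of the step before it, so no explicit connections are needed for a linear sequence such as this one.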

5.4 Document Aggregation

  1. Locate a collection of documents to aggregate.

  2. Perform aggregation under a new document element.

  3. Return a single XML document.

(source: Erik Bruchez)

Editorial note: Satisfied (2012-04-07)
This use case is satisfied by XProc 1.0 as exemplified in p:for-each
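
A minimal, non-normative sketch of the aggregation step, assuming that the collection has already been located and arrives as a sequence of documents on the source port. The wrapper element name "aggregate" is a placeholder chosen for this example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- The collection to aggregate arrives as a sequence of documents -->
  <p:input port="source" sequence="true"/>
  <p:output port="result"/>

  <!-- Wrap the whole sequence under one new document element -->
  <p:wrap-sequence wrapper="aggregate"/>
</p:declare-step>
```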

5.5 Single-file Command-line Document Processing

  1. Read a DocBook document.

  2. Validate the document.

  3. Process it with XSLT.

  4. Validate the resulting XHTML.

  5. Save the HTML file using HTML serialization.

(source: Erik Bruchez)

Editorial note: Satisfied (2012-04-07)
Although the processing scenario described above is exemplified in p:for-each, the command-line requirement is considered to be implementation-defined. ["How outside values are specified for pipeline parameters on the pipeline initially invoked by the processor is implementation-defined. In other words, the command line options, APIs, or other mechanisms available to specify such parameter values are outside the scope of [XProc 1.0]."]

5.6 Multiple-file Command-line Document Generation

  1. Read a list of source documents.

  2. For each document in the list:

    1. Read the document.

    2. Perform a series of XSLT transformations.

    3. Serialize each result.

  3. Alternatively, aggregate the resulting documents and serialize a single result.

(source: Erik Bruchez)

Editorial note: Satisfied (2012-04-07)
Although the processing scenario described above is exemplified in p:for-each, the command-line requirement is considered to be implementation-defined. ["How outside values are specified for pipeline parameters on the pipeline initially invoked by the processor is implementation-defined. In other words, the command line options, APIs, or other mechanisms available to specify such parameter values are outside the scope of [XProc 1.0]."]
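
The per-document loop in this use case might be sketched in XProc 1.0 as follows. This is non-normative and rests on several assumptions made for the example only: the list of source documents is an XML document of the form shown in the first comment, the stylesheet names "first.xsl" and "second.xsl" are placeholders, and each result is stored next to its source under a derived name.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Assumed list format:
       <files><file href="a.xml"/><file href="b.xml"/></files> -->
  <p:input port="source"/>
  <p:input port="parameters" kind="parameter"/>

  <p:for-each>
    <!-- Each file element becomes the current document in turn -->
    <p:iteration-source select="/files/file"/>
    <p:variable name="href" select="/file/@href"/>

    <!-- Read the document named by the list entry -->
    <p:load>
      <p:with-option name="href" select="$href"/>
    </p:load>

    <!-- Perform a series of XSLT transformations -->
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="first.xsl"/>
      </p:input>
    </p:xslt>
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="second.xsl"/>
      </p:input>
    </p:xslt>

    <!-- Serialize each result to its own file -->
    <p:store>
      <p:with-option name="href" select="concat($href, '.out.xml')"/>
    </p:store>
  </p:for-each>
</p:declare-step>
```

For the alternative in step 3, one option would be to drop the p:store so that the p:for-each yields the sequence of results, and to follow it with a p:wrap-sequence and a single p:store.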

5.7 Extracting MathML

Extract MathML fragments from an XHTML document and render them as images. Employ an SVG renderer for SVG glyphs embedded in the MathML.

(source: [xml-core-wg])

Editorial note: MathML Unsatisfied (2012-04-07)
This use case is not satisfied.

5.8 Style an XML Document in a Browser

Style an XML document in a browser with one of several different stylesheets without having multiple copies of the document containing different xml-stylesheet directives.

(source: [xml-core-wg])

Editorial note: Satisfied (2012-04-07)
This use case is satisfied by XProc 1.0 as exemplified in p:for-each.

5.9 Run a Custom Program

Run a program of your own, with some parameters, on an XML file and display the result in a browser.

(source: [xml-core-wg])

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD
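
For discussion, the "run a program" part of this use case might be approached with the optional p:exec step, as in the non-normative sketch below. The command path and its arguments are hypothetical, and displaying the result in a browser is outside what the sketch shows.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- The source document is passed to the program on standard input;
       the program's standard output is parsed as XML and returned
       on the result port -->
  <p:exec command="/usr/local/bin/my-filter"
          args="--mode summary"
          source-is-xml="true"
          result-is-xml="true"/>
</p:pipeline>
```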

5.10 XInclude and Sign

  1. Process an XML document through XInclude.

  2. Transform the result with XSLT using a fixed transformation.

  3. Digitally sign the result with XML Signatures.

(source: Henry Thompson)

Editorial note: XML Signatures Unsatisfied
This use case is not satisfied.

5.11 Make Absolute URLs

  1. Process an XML document through XInclude.

  2. Remove any xml:base attributes anywhere in the resulting document.

  3. Schema validate the document with a fixed schema.

  4. For all elements or attributes whose type is xs:anyURI, resolve the value against the base URI to create an absolute URI. Replace the value in the document with the resulting absolute URI.

This example assumes preservation of infoset ([base URI]) and PSVI ([type definition]) properties from step to step. Also, there is no way to reorder these steps as the schema doesn't accept xml:base attributes but the expansion requires xs:anyURI typed values.

(source: Henry Thompson)

Editorial note: Satisfied (2012-04-07)
This use case is satisfied by XProc 1.0 as exemplified in 7.2.10 p:xsl-formatter
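
The four steps of this use case might be sketched in XProc 1.0 as follows. This is non-normative and makes strong assumptions: the schema file name "fixed.xsd" is a placeholder, and the type-based match pattern requires a processor that supports XPath 2.0 and passes PSVI annotations between steps, neither of which XProc 1.0 obliges an implementation to do. As the use case itself notes, the sketch also relies on base URI information being preserved from step to step.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="1.0" xpath-version="2.0" psvi-required="true">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Step 1: XInclude processing -->
  <p:xinclude/>

  <!-- Step 2: remove every xml:base attribute -->
  <p:delete match="@xml:base"/>

  <!-- Step 3: validate against the fixed schema -->
  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:document href="fixed.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>

  <!-- Step 4: make xs:anyURI-typed elements and attributes absolute
       (assumes schema-aware pattern matching is available) -->
  <p:make-absolute-uris
      match="element(*, xs:anyURI) | attribute(*, xs:anyURI)"/>
</p:declare-step>
```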

5.12 A Simple Transformation Service

  1. Extract XML document (XForms instance) from an HTTP request body

  2. Execute XSLT transformation on that document.

  3. Call a persistence service with resulting document

  4. Return the XML document from persistence service (new XForms instance) as the HTTP response body.

(source: Erik Bruchez)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.13 Service Request/Response Handling on a Handheld

Allow an application on a handheld device to construct a pipeline, send the pipeline and some data to the server, allow the server to process the pipeline and send the result back.

(source: [xml-core-wg])

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.14 Interact with Web Service (Tide Information)

  1. Parse the incoming XML request.

  2. Construct a URL to a REST-style web service at the NOAA (see website).

  3. Parse the resulting invalid HTML document by translating and fixing the HTML to make it XHTML (e.g. use TagSoup or tidy).

  4. Extract the tide information from a plain-text table of data in the document by applying a regular expression and creating markup from the matches.

  5. Use XQuery to select the high and low tides.

  6. Formulate an XML response from that tide information.

(source: Alex Milowski)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.15 Parse and/or Serialize RSS descriptions

Parse descriptions:

  1. Iterate over the RSS description elements and do the following:

    1. Gather the text children of the 'description' element.

    2. Parse the contents with a simulated document element in the XHTML namespace.

    3. Send the resulting children as the children of the 'description' element.

  2. Apply rest of pipeline steps.

Serialize descriptions:

  1. Iterate over the RSS description elements and do the following:

    1. Serialize the child elements.

    2. Generate a new text child containing the serialized contents (escaped text).

  2. Apply rest of pipeline steps.

(source: Alex Milowski)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD
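
Both directions of this use case might be sketched with p:viewport, p:unescape-markup and p:escape-markup, as in the non-normative library below. The step type names and the "ex" namespace are invented for the example, the match pattern assumes description elements in no namespace (as in RSS 2.0), and the escaped content is assumed to be well-formed XML; parsing other content types is implementation-dependent.

```xml
<p:library xmlns:p="http://www.w3.org/ns/xproc"
           xmlns:ex="http://example.org/ns/rss-steps"
           version="1.0">

  <!-- Parse: replace the escaped text of each description with markup -->
  <p:pipeline type="ex:parse-descriptions">
    <p:viewport match="description">
      <!-- The description element is kept; its string value is parsed,
           with unprefixed names placed in the XHTML namespace -->
      <p:unescape-markup namespace="http://www.w3.org/1999/xhtml"/>
    </p:viewport>
    <!-- The rest of the pipeline steps would follow here -->
  </p:pipeline>

  <!-- Serialize: replace the children of each description with
       their serialization as escaped text -->
  <p:pipeline type="ex:serialize-descriptions">
    <p:viewport match="description">
      <p:escape-markup/>
    </p:viewport>
    <!-- The rest of the pipeline steps would follow here -->
  </p:pipeline>
</p:library>
```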

5.16 XQuery and XSLT 2.0 Collections

XQuery and XSLT 2.0 have the notion of input and output collections. A pipeline must be able to consume and produce collections of documents, as inputs or outputs of individual steps as well as of whole pipelines.

For example, for input collections:

  1. Accept a collection of documents.

  2. Apply a single XSLT 2.0 transformation that processes the collection and produces another collection.

  3. Serialize the collection to files or URIs.

For example, for output collections:

  1. Accept a single document as input.

  2. Apply an XQuery that produces a sequence of documents (a collection).

  3. Serialize the collection to files or URIs.

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.17 An AJAX Server

  1. Receive XML request with word to complete.

  2. Call a sub-pipeline that retrieves list of completions for that word.

  3. Format resulting document with XSLT.

  4. Serialize response to XML.

(source: Erik Bruchez)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.18 Dynamic XQuery

  1. Dynamically create an XQuery query using XSLT, based on input XML document.

  2. Execute the XQuery against a database.

  3. Construct an XHTML result page using XSLT from the result of the query.

  4. Serialize response to HTML.

(source: Erik Bruchez)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.19 Read/Write Non-XML File

  1. Read a CSV file and convert it to XML.

  2. Process the document with XSLT.

  3. Convert the result to a CSV format using text serialization.

(source: Erik Bruchez)

Editorial note: TBD (2012-04-07)
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD
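
One way this use case might be approached in XProc 1.0 is sketched below, non-normatively. It assumes that p:data is used to read the CSV text into a c:data wrapper element, that the three stylesheets named in the sketch exist (they are hypothetical), and that the final stylesheet produces a document whose text content is the CSV output.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="parameters" kind="parameter"/>

  <!-- Step 1: p:data wraps the CSV text in a c:data element; the
       (hypothetical) stylesheet splits it into rows and fields -->
  <p:xslt>
    <p:input port="source">
      <p:data href="input.csv" content-type="text/csv"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="csv-to-xml.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Step 2: process the XML form of the data -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="process.xsl"/>
    </p:input>
  </p:xslt>

  <!-- Step 3: flatten back to delimited text and store it using
       text serialization -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="xml-to-csv.xsl"/>
    </p:input>
  </p:xslt>
  <p:store href="output.csv" method="text"/>
</p:declare-step>
```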

5.20 Update/Insert Document in Database

  1. Receive an XML document to save.

  2. Check the database to see if the document exists.

  3. If the document exists, update the document.

  4. If the document does not exist, add the document.

(source: Erik Bruchez)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.21 Content-Dependent Transformations

  1. Receive an XML document to format.

  2. If the document is XHTML, apply a theme via XSLT and serialize as HTML.

  3. If the document is XSL-FO, apply an XSL FO processor to produce PDF.

  4. Otherwise, serialize the document as XML.

(source: Erik Bruchez)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD
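
The branching in this use case might be sketched with p:choose, as in the non-normative example below. The stylesheet and output file names are placeholders, and p:xsl-formatter is an optional step whose availability and supported content types depend on the processor.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:h="http://www.w3.org/1999/xhtml"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="1.0">
  <p:input port="source"/>
  <p:input port="parameters" kind="parameter"/>

  <p:choose>
    <!-- XHTML: apply a theme and serialize as HTML -->
    <p:when test="/h:html">
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="theme.xsl"/>
        </p:input>
      </p:xslt>
      <p:store href="out.html" method="html"/>
    </p:when>

    <!-- XSL-FO: hand the document to an FO processor to produce PDF -->
    <p:when test="/fo:root">
      <p:xsl-formatter href="out.pdf" content-type="application/pdf"/>
    </p:when>

    <!-- Anything else: serialize as XML -->
    <p:otherwise>
      <p:store href="out.xml" method="xml"/>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```

Every branch ends in a step without a primary output, so all three branches declare the same (empty) set of outputs, as p:choose requires.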

5.22 Configuration-Dependent Transformations

Mobile example:

  1. Receive an XML document to format.

  2. If the configuration is "desktop browser", apply desktop XSLT and serialize as HTML.

  3. If the configuration is "mobile browser", apply mobile XSLT and serialize as XHTML.

News feed example:

  1. Receive an XML document in Atom format.

  2. If the configuration is "RSS 1.0", apply "Atom to RSS 1.0" XSLT.

  3. If the configuration is "RSS 2.0", apply "Atom to RSS 2.0" XSLT.

  4. Serialize the document as XML.

(source: Erik Bruchez)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD
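
The news feed example might be sketched by passing the configuration as a pipeline option and testing it in p:choose, as below. This is non-normative: the option name "config" and the stylesheet file names are invented for the example, and how the option value is supplied when the pipeline is invoked is implementation-defined.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>  <!-- the Atom document -->
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result"/>
  <p:option name="config" required="true"/>

  <!-- No p:otherwise is given, so an unrecognized configuration
       raises a dynamic error instead of passing silently -->
  <p:choose>
    <p:when test="$config = 'RSS 1.0'">
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="atom-to-rss10.xsl"/>
        </p:input>
      </p:xslt>
    </p:when>
    <p:when test="$config = 'RSS 2.0'">
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="atom-to-rss20.xsl"/>
        </p:input>
      </p:xslt>
    </p:when>
  </p:choose>
</p:declare-step>
```

Serialization of the result as XML is left to the processor or to a following p:store.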

5.23 Response to XML-RPC Request

  1. Receive an XML-RPC request.

  2. Validate the XML-RPC request with a RELAX NG schema.

  3. Dispatch to different sub-pipelines depending on the content of /methodCall/methodName.

  4. Format the sub-pipeline response to XML-RPC format via XSLT.

  5. Validate the XML-RPC response with a W3C XML Schema.

  6. Return the XML-RPC response.

(source: Erik Bruchez)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.24 Database Import/Ingestion

Import example:

  1. Read a list of source documents.

  2. For each document in the list:

    1. Validate the document.

    2. Call a sub-pipeline to insert content into a relational or XML database.

Ingestion example:

  1. Receive a directory name.

  2. Produce a list of files in the directory as an XML document.

  3. For each element representing a file:

    1. Create an iTQL query using XSLT.

    2. Query the repository to check if the file has been uploaded.

    3. Upload if necessary.

    4. Inspect the file to check the metadata type.

    5. Transform the document with XSLT.

    6. Make a SOAP call to ingest the document.

(source: Erik Bruchez)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.25 Metadata Retrieval

  1. Call a SOAP service with metadata format as a parameter.

  2. Create an iTQL query with XSLT.

  3. Query a repository for the XML document.

  4. Load a list of XSLT transformations from a configuration.

  5. Iteratively execute the XSLT transformations.

  6. Serialize the result to XML.

(source: Erik Bruchez)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.26 Non-XML Document Production

  1. A non-XML document is fed into the process.

  2. That input is converted into a well-formed XML document.

  3. A table of contents is extracted.

  4. Pagination is performed.

  5. Each page is transformed into some output language.

(source: Rui Lopes)

  1. Read a non-XML document.

  2. Transform.

(source: Norm Walsh)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.27 Integrate Computation Components (MathML)

  1. Select a MathML content element.

  2. For that element, apply a computation (e.g. compute the kernel of a matrix).

  3. Replace the input MathML with the output of the computation.
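
A minimal Python sketch of the select, compute, and replace cycle is shown below. To stay short it assumes a much simpler computation than a matrix kernel: it sums the integer operands of a content MathML plus application and replaces the apply element with a single cn element. The function name and the restriction to integer operands are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

MATHML = "http://www.w3.org/1998/Math/MathML"
M = "{%s}" % MATHML


def evaluate_sums(root):
    """Replace each <apply><plus/>...</apply> whose operands are all <cn> with one <cn>."""
    parents = {child: parent for parent in root.iter() for child in parent}
    for apply in list(root.iter(M + "apply")):
        children = list(apply)
        if not children or children[0].tag != M + "plus":
            continue
        operands = children[1:]
        if not all(op.tag == M + "cn" for op in operands):
            continue  # nested applications are left untouched in this sketch
        parent = parents.get(apply)
        if parent is None:
            continue  # the apply is the document element; nothing to replace it in
        result = ET.Element(M + "cn")
        result.text = str(sum(int(op.text) for op in operands))
        result.tail = apply.tail
        parent[list(parent).index(apply)] = result
    return root


doc = ET.fromstring(
    '<math xmlns="%s"><apply><plus/><cn>1</cn><cn>2</cn></apply></math>' % MATHML
)
evaluate_sums(doc)
value = doc.find(M + "cn").text
```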

(source: Alex Milowski)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.28 Document Schema Definition Languages (DSDL) - Part 10: Validation Management

This document provides a test scenario that will be used to create validation management scripts using a range of existing techniques, including those used for program compilation, etc.

The steps required to validate our sample document are:

  1. Use ISO 19757-4 Namespace-based Validation Dispatching Language (NVDL) to split out the parts of the document that are encoded using HTML, SVG and MathML from the bulk of the document, whose tags are defined using a user-defined set of markup tags.

  2. Validate the HTML elements and attributes using the HTML 4.0 DTD (W3C XML DTD).

  3. Use a set of Schematron rules stored in check-metadata.xml to ensure that the metadata of the HTML elements defined using Dublin Core semantics conform to the information in the document about the document's title and subtitle, author, encoding type, etc.

  4. Validate the SVG components of the file using the standard W3C schema provided in the SVG 1.2 specification.

  5. Use the Schematron rules defined in SVG-subset.xml to ensure that the SVG file only uses those features of SVG that are valid for the particular SVG viewer available to the system.

  6. Validate the MathML components using the latest version of the MathML schema (defined in RELAX-NG) to ensure that all maths fragments are valid. The schema will make use of the datatype definitions in check-maths.xml to validate the contents of specific elements.

  7. Use MathML-SVG.xslt to transform the MathML segments to displayable SVG and replace each MathML fragment with its SVG equivalent.

  8. Use the ISO 19757-8 Document Schema Renaming Language (DSRL) definitions in convert-mynames.xml to convert the tags in the local nameset to the form that can be used to validate the remaining part of the document using docbook.dtd.

  9. Use the ISO 19757-7 Character Repertoire Definition Language (CRDL) rules defined in mycharacter-checks.xml to validate that the correct character sets have been used for text identified as being Greek and Cyrillic.

  10. Convert the DocBook tags to HTML, using the docbook-html.xslt transformation rules, so that they can be displayed in a web browser.

Each validation script should allow the four streams produced by step 1 to be run in parallel without requiring the other validations to be carried out if there is an error in another stream. This means that steps 2 and 3 should be carried out in parallel with steps 4 and 5, and/or steps 6 and 7, and/or steps 8 and 9. After completion of step 10, the HTML (both streams) and SVG (both streams) should be recombined to produce a single stream that can be fed to a web browser. The flow is illustrated in the following diagram:

DSDL use case graphic
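
The requirement that the four streams run in parallel, with an error in one stream leaving the others unaffected, can be sketched in Python as below. The stream names and the trivial string functions standing in for the validation and transformation steps are hypothetical; only the isolation of failures between streams is the point of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stream(steps, fragment):
    """Run one stream's steps in order; the stream stops at its first failure."""
    for step in steps:
        fragment = step(fragment)
    return fragment


def run_streams(streams, fragments):
    """Run every stream concurrently; an error in one stream does not cancel the others."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=len(streams)) as pool:
        futures = {
            name: pool.submit(run_stream, steps, fragments[name])
            for name, steps in streams.items()
        }
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # record the failure, keep the other streams
                errors[name] = exc
    return results, errors


def reject(fragment):
    """Hypothetical validator that always fails."""
    raise ValueError("invalid fragment: " + fragment)


# Hypothetical stand-ins for the per-stream validation and transformation steps.
streams = {
    "html": [str.strip],
    "svg": [str.strip],
    "mathml": [reject],
    "local": [str.strip, str.upper],
}
fragments = {"html": " <p/> ", "svg": " <svg/> ", "mathml": "<math>", "local": " <x/> "}
results, errors = run_streams(streams, fragments)
```

Recombining the surviving streams into a single document is a separate step that this sketch does not attempt.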

(source: Martin Bryan)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.29 Large-Document Subtree Iteration

Running XSLT on a very large document is often impractical. Frequently, though, only a particular element, which may be repeated many times, needs to be transformed. Conceptually, a pipeline could limit the transformation to a subtree by:

  1. Limiting the transform to a subtree of the document identified by an XPath.

  2. For each subtree, cache the subtree and build a whole document with the identified element as the document element and then run a transform to replace that subtree in the original document.

  3. For any non-matches, the document remains the same and "streams" around the transform.

This allows the transform and the tree building to be limited to a small subtree and the rest of the process to stream. As such, an arbitrarily large document can be processed in a bounded amount of memory.
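
One way to picture this is the following Python sketch, which uses a SAX filter rather than XSLT. It assumes the subtree is selected by element name instead of a general XPath, and the shout function is a hypothetical stand-in for the transform. Events outside a matching subtree are written straight to the output; only the matching subtree is built as a tree, transformed, serialized, and then discarded.

```python
import io
import xml.etree.ElementTree as ET
import xml.sax
from xml.sax.saxutils import XMLGenerator


class SubtreeTransformer(XMLGenerator):
    """Stream a document through unchanged, except for subtrees rooted at `target`."""

    def __init__(self, sink, target, transform):
        super().__init__(sink, encoding="utf-8")
        self._sink = sink
        self._target = target
        self._transform = transform
        self._builder = None  # set only while a matching subtree is being buffered
        self._depth = 0

    def startElement(self, name, attrs):
        if self._builder is None and name == self._target:
            self._builder = ET.TreeBuilder()
        if self._builder is None:
            super().startElement(name, attrs)  # non-match: stream straight through
        else:
            self._depth += 1
            self._builder.start(name, dict(attrs))

    def characters(self, content):
        if self._builder is None:
            super().characters(content)
        else:
            self._builder.data(content)

    def endElement(self, name):
        if self._builder is None:
            super().endElement(name)
            return
        self._builder.end(name)
        self._depth -= 1
        if self._depth == 0:
            subtree = self._builder.close()
            self._builder = None  # drop the buffer so memory stays bounded
            self._sink.write(ET.tostring(self._transform(subtree), encoding="unicode"))


def shout(entry):
    """Hypothetical per-subtree transform: upper-case the element's text."""
    entry.text = (entry.text or "").upper()
    return entry


sink = io.StringIO()
source = b"<log><entry>a</entry><other>x</other><entry>b</entry></log>"
xml.sax.parseString(source, SubtreeTransformer(sink, "entry", shout))
output = sink.getvalue()
```

Memory use in this sketch is bounded by the largest matching subtree, not by the size of the whole document.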

(source: Alex Milowski)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.30 Adding Navigation to an Arbitrarily Large Document

For a particular website, every XHTML document needs to have navigation elements added to the document. The navigation is static text that surrounds the body of the document. This navigation is added by:

  1. Matching the head and body elements using an XPath expression that can be streamed.

  2. Inserting a stub for a transformation for including the style and surrounding navigation of the site.

  3. For each of the stubs, transformations insert the markup using a subtree expansion that allows the rest of the document to stream.

In the end, the pipeline allows an arbitrarily large XHTML document to be processed with a near-constant cost.

(source: Alex Milowski)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.31 Fallback to Choice of XSLT Processor

A step in a pipeline produces multiple output documents. In XSLT 2.0, this is a standard feature supported by all processors. In XSLT 1.0, it is not.

A pipeline author wants to write a pipeline for which the implementation, at compile time, chooses XSLT 2.0 when possible and degrades to XSLT 1.0 when XSLT 2.0 is not supported. In the XSLT 1.0 case, the step will use XSLT extensions to support the multiple output documents, which again may fail. Fortunately, the XSLT 1.0 transformation can be written to test for this.
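
The compile-time choice can be sketched in Python as a lookup over an ordered list of candidates. The step-type names and the two implementation inventories are hypothetical, and StaticError is simply this sketch's name for an error raised before any step runs.

```python
class StaticError(Exception):
    """Raised while the pipeline is being compiled, before any step runs."""


def choose_step(candidates, available):
    """Return the first candidate step type that the implementation supports."""
    for step_type in candidates:
        if step_type in available:
            return step_type
    raise StaticError("none of %s is available" % ", ".join(candidates))


# Hypothetical step-type names and implementation inventories.
full = {"xslt-2.0", "xslt-1.0", "xquery"}
minimal = {"xslt-1.0"}

preferred = choose_step(["xslt-2.0", "xslt-1.0"], full)
degraded = choose_step(["xslt-2.0", "xslt-1.0"], minimal)
```

A candidate list with a single entry gives the no-fallback behaviour: compilation fails if that one step type is missing.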

(source: Alex Milowski)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

5.32 No Fallback for XQuery Causes Error

The final step in a pipeline is required to run XQuery. If the XQuery step is not available, compilation of the pipeline needs to fail. Here the pipeline author has decided that the pipeline must not run if XQuery is not available.

(source: Alex Milowski)

Editorial note: TBD
This use case is [not] satisfied by XProc 1.0 as exemplified in TBD

A References

A.1 Reference Documents

xproc
(See http://www.w3.org/TR/xproc/.)
xproc-req
(See http://www.w3.org/TR/xproc-requirements/.)
RFC-2119
[RFC 2119] Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. Network Working Group, IETF, Mar 1997. (See http://www.ietf.org/rfc/rfc2119.txt.)
xml-proc-profiles
XML processor profiles. Henry S. Thompson, Norman Walsh, James Fuller. W3C Working Draft 24 January 2012. (See XML processor profiles.)
xml-core-wg
XML Processing Model Requirements. Dmitry Lenkov, Norman Walsh, editors. W3C Working Group Note 05 April 2004 (See http://www.w3.org/TR/proc-model-req/.)

A.2 Core XML Specifications

XML1.0
[XML-1.0] Extensible Markup Language (XML) 1.0 (Fifth Edition). Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, et. al. editors. W3C Recommendation 26 November 2008. (See http://www.w3.org/TR/REC-xml/.)
Namespaces1.0
[Namespaces-1.0] Namespaces in XML 1.0 (Third Edition). Tim Bray, Dave Hollander, Andrew Layman, et. al., editors. W3C Recommendation 8 December 2009. (See http://www.w3.org/TR/REC-xml-names/.)
XML1.1
[XML-1.1] Extensible Markup Language (XML) 1.1 (Second Edition). Tim Bray, Jean Paoli, C. M. Sperberg-McQueen, et. al. editors. W3C Recommendation 16 August 2006. (See http://www.w3.org/TR/xml11/.)
Namespaces1.1
[Namespaces-1.1] Namespaces in XML 1.1 (Second Edition). Tim Bray, Dave Hollander, Andrew Layman, et. al., editors. W3C Recommendation 16 August 2006. (See http://www.w3.org/TR/xml-names11/.)
XMLBase
[XML Base] XML Base (Second Edition). Jonathan Marsh and Richard Tobin, editors. W3C Recommendation. 28 January 2009. (See http://www.w3.org/TR/xmlbase/.)
xml-id
[xml:id] xml:id Version 1.0. Jonathan Marsh, Daniel Veillard, and Norman Walsh, editors. W3C Recommendation. 9 September 2005. (See http://www.w3.org/TR/xml-id/.)
XInclude
[XInclude] XML Inclusions (XInclude) Version 1.0 (Second Edition). Jonathan Marsh, David Orchard, and Daniel Veillard, editors. W3C Recommendation. 15 November 2006. (See http://www.w3.org/TR/xinclude/.)

A.3 XML Data Model and XML Information Set

XDM-1.0
[XQuery 1.0 and XPath 2.0 Data Model (XDM)] XQuery 1.0 and XPath 2.0 Data Model (XDM). Mary Fernández, Ashok Malhotra, Jonathan Marsh, et. al., editors. W3C Recommendation. 23 January 2007. (See http://www.w3.org/TR/xpath-datamodel/.)
xml-infoset-rec
XML Information Set (Second Edition) John Cowan, Richard Tobin, editors. W3C Working Group Note 04 February 2004 (See http://www.w3.org/TR/xml-infoset/.)

A.4 XPath and XQuery

XPath1.0
[XPath-1.0] XML Path Language (XPath) Version 1.0. James Clark and Steve DeRose, editors. W3C Recommendation. 16 November 1999. (See http://www.w3.org/TR/xpath.)
XPath-2.0
[XPath 2.0] XML Path Language (XPath) 2.0. Anders Berglund, Scott Boag, Don Chamberlin, et. al., editors. W3C Recommendation. 23 January 2007. (See http://www.w3.org/TR/xpath20/.)
XQuery-1.0
[XQuery-1.0] XQuery 1.0: An XML Query Language. Scott Boag, Don Chamberlin, Mary Fernández, et. al., editors. W3C Recommendation. 23 January 2007. (See http://www.w3.org/TR/xquery/.)
XPath-Functions
[XPath-Functions] XQuery 1.0 and XPath 2.0 Functions and Operators. Ashok Malhotra, Jim Melton, and Norman Walsh, editors. W3C Recommendation. 23 January 2007. (See http://www.w3.org/TR/xpath-functions/.)

A.5 Style, Transform, Serialize

xml-stylesheet
Associating Style Sheets with XML documents 1.0 (Second Edition). James Clark, Simon Pieters, Henry S. Thompson. W3C Recommendation 28 October 2010 (See http://www.w3.org/TR/2010/REC-xml-stylesheet-20101028/.)
xsl-1.1
[XSL 1.1] Extensible Stylesheet Language (XSL) Version 1.1. Anders Berglund, editor. W3C Recommendation. 5 December 2006. (See http://www.w3.org/TR/xsl/.)
XSLT-1.0
[XSLT-1.0] XSL Transformations (XSLT) Version 1.0. James Clark, editor. W3C Recommendation. 16 November 1999. (See http://www.w3.org/TR/xslt.)
XSLT-2.0
[XSLT 2.0] XSL Transformations (XSLT) Version 2.0. Michael Kay, editor. W3C Recommendation. 23 January 2007. (See http://www.w3.org/TR/xslt20/.)
Serialization
[Serialization] XSLT 2.0 and XQuery 1.0 Serialization. Scott Boag, Michael Kay, Joanne Tong, Norman Walsh, and Henry Zongaro, editors. W3C Recommendation. 23 January 2007. (See http://www.w3.org/TR/xslt-xquery-serialization/.)

A.6 XML Schema Languages

XMLSchema1
[W3C XML Schema: Part 1] XML Schema Part 1: Structures Second Edition. Henry S. Thompson, David Beech, Murray Maloney, et. al., editors. World Wide Web Consortium, 28 October 2004. (See http://www.w3.org/TR/xmlschema-1/.)
XMLSchema2
[W3C XML Schema: Part 2] XML Schema Part 2: Datatypes Second Edition. Paul V. Biron and Ashok Malhotra, editors. World Wide Web Consortium, 28 October 2004. (See http://www.w3.org/TR/xmlschema-2/.)
RELAX-NG
[RELAX NG] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-2:2008(E) Document Schema Definition Language (DSDL) -- Part 2: Regular-grammar-based validation -- RELAX NG 2008. (See http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=52348.)
RELAX-NG-Compact-Syntax
[RELAX NG Compact Syntax] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-2:2003/Amd 1:2006 Document Schema Definition Languages (DSDL) — Part 2: Grammar-based validation — RELAX NG AMENDMENT 1 Compact Syntax 2006.
RELAX-NG-DTD-Compatibility
[RELAX NG DTD Compatibility] RELAX NG DTD Compatibility. OASIS Committee Specification. 3 December 2001.
Schematron
[Schematron] ISO/IEC JTC 1/SC 34. ISO/IEC 19757-3:2006(E) Document Schema Definition Languages (DSDL) — Part 3: Rule-based validation — Schematron 2006.

A.7 Identifiers and Names

Editorial note: Identifiers 
Identifier: "In metadata, an identifier is a language-independent label, sign or token that uniquely identifies an object within an identification scheme. The suffix identifier is also used as a representation term when naming a data element. [...] In computer science, identifiers (IDs) are lexical tokens that name entities. The concept is analogous to that of a "name." Identifiers are used extensively in virtually all information processing systems. Naming entities makes it possible to refer to them, which is essential for any kind of symbolic processing. [...] In computer languages, identifiers are tokens (also called symbols) which name language entities. Some of the kinds of entities an identifier might denote include variables, types, labels, subroutines, and packages. In most languages, some character sequences have the lexical form of an identifier but are known as keywords." -- WikiPedia 10 Apr 2012
RFC-2396
[RFC 2396] Uniform Resource Identifiers (URI): Generic Syntax. T. Berners-Lee, R. Fielding, and L. Masinter. Network Working Group, Internet Engineering Task Force. Aug 1998. (See http://www.ietf.org/rfc/rfc2396.txt.)
RFC-3986
[RFC 3986] RFC 3986: Uniform Resource Identifier (URI): General Syntax. T. Berners-Lee, R. Fielding, and L. Masinter, editors. Internet Engineering Task Force. January, 2005. (See http://www.ietf.org/rfc/rfc3986.txt.)
RFC-3987
[RFC 3987] RFC 3987: Internationalized Resource Identifiers (IRIs). M. Duerst and M. Suignard, editors. Internet Engineering Task Force. January, 2005. (See http://www.ietf.org/rfc/rfc3987.txt.)
LEIRI
Legacy extended IRIs for XML resource identification. Henry S. Thompson, Richard Tobin, Norman Walsh. W3C Working Group Note 3 November 2008 (See http://www.w3.org/TR/2008/NOTE-leiri-20081103/.)
URN-syntax
URN Syntax. R. Moats. IETF Request for Comments: 2141. PROPOSED STANDARD. Internet Engineering Task Force. May 1997. (See http://tools.ietf.org/html/rfc2141.)
URN-function
Functional Requirements for Uniform Resource Names. Informational Request for Comments: 1737. K. Sollins, L. Masinter. Internet Engineering Task Force. December 1994 (See http://tools.ietf.org/html/rfc1737.)
RFC-3187
Using International Standard Book Numbers as Uniform Resource Names. J. Hakala, H. Walravens. Informational Request for Comments: 3187. Internet Engineering Task Force. Oct 2001. (See http://tools.ietf.org/html/rfc3187.)
RFC-4122
[RFC 4122] RFC 4122: A Universally Unique IDentifier (UUID) URN Namespace. P. Leach and M. Mealling, editors. Internet Engineering Task Force. July, 2005. (See http://www.ietf.org/rfc/rfc4122.txt.)
UUID
[UUID] ITU X.667: Information technology - Open Systems Interconnection - Procedures for the operation of OSI Registration Authorities: Generation and registration of Universally Unique Identifiers (UUIDs) and their use as ASN.1 Object Identifier components. 2004. (See http://www.itu.int/ITU-T/studygroups/com17/oid.html.)

A.8 HTTP Request & Authentication

RFC-2616
[RFC 2616] RFC 2616: Hypertext Transfer Protocol — HTTP/1.1. R. Fielding, J. Gettys, J. Mogul, et. al., editors. Internet Engineering Task Force. June, 1999. (See http://www.ietf.org/rfc/rfc2616.txt.)
RFC-2617
[RFC 2617] RFC 2617: HTTP Authentication: Basic and Digest Access Authentication. J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, L. Stewart. June, 1999. (See http://www.ietf.org/rfc/rfc2617.txt.)

A.9 Character Encodings

RFC-3548
[RFC 3548] RFC 3548: The Base16, Base32, and Base64 Data Encodings. S. Josefsson, Editor. Internet Engineering Task Force. July, 2003. (See http://www.ietf.org/rfc/rfc3548.txt.)
Unicode-TR17
[Unicode TR#17] Unicode Technical Report #17: Character Encoding Model. Ken Whistler, Mark Davis, and Asmus Freytag, authors. The Unicode Consortium. 11 November 2008. (See http://unicode.org/reports/tr17/.)

A.10 Media Types

IANA-MIME-Types
[IANA MIME Media Types] IANA MIME Media Types. Internet Engineering Task Force. (See http://www.iana.org/assignments/media-types/.)
RFC-1521
[RFC 1521] RFC 1521: MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies. N. Borenstein, N. Freed, editors. Internet Engineering Task Force. September, 1993. (See http://www.ietf.org/rfc/rfc1521.txt.)
RFC-3023
[RFC 3023] RFC 3023: XML Media Types. M. Murata, S. St. Laurent, and D. Kohn, editors. Internet Engineering Task Force. January, 2001. (See http://www.ietf.org/rfc/rfc3023.txt.)

A.11 Digital Signatures

The following are listed in XProc 1.0. Should the list broaden? Are there candidates? Could one plug in their own encoding? How would a recipient know how to decode it? Just wondering.

CRC32
[CRC32] “32-Bit Cyclic Redundancy Codes for Internet Applications”, The International Conference on Dependable Systems and Networks: 459. 10.1109/DSN.2002.1028931. P. Koopman. June 2002.
MD5
[MD5] RFC 1321: The MD5 Message-Digest Algorithm. R. Rivest. Network Working Group, IETF, April 1992. (See http://www.ietf.org/rfc/rfc1321.txt.)
SHA1
[SHA1] Federal Information Processing Standards Publication 180-1: Secure Hash Standard. 1995. (See http://www.itl.nist.gov/fipspubs/fip180-1.htm.)

A.12 Candidate Specifications: XPointers

The following are listed but not referenced in XProc 1.0.

xptr-framework
[xptr-framework] XPointer Framework. Paul Grosso, Eve Maler, Jonathan Marsh, et. al., editors. W3C Recommendation. 25 March 2003. (See http://www.w3.org/TR/xptr-framework/.)
xptr-element
[xptr-element() Scheme] XPointer element() Scheme. Paul Grosso, Eve Maler, Jonathan Marsh, et. al., editors. W3C Recommendation. 25 March 2003. (See http://www.w3.org/TR/xptr-element/.)

A.13 Candidate Specification: XLink

The following are not listed in XProc 1.0.

XLink
[XLink] XML Linking Language (XLink) Version 1.1. Steve DeRose, Eve Maler, David Orchard, Norman Walsh. W3C Recommendation 06 May 2010. (See http://www.w3.org/TR/2010/REC-xlink11-20100506/.)

A.14 Candidate Specification: Mathematics

The following are not listed in XProc 1.0.

MathML
Mathematical Markup Language (MathML) Version 3.0. Ron Ausbrooks, Stephen Buswell, David Carlisle, Giorgi Chavchanidze, Stéphane Dalmas, Stan Devitt, Angel Diaz, Sam Dooley, Roger Hunter, Patrick Ion, Michael Kohlhase, Azzeddine Lazrek, Paul Libbrecht, Bruce Miller, Robert Miner, Chris Rowley, Murray Sargent, Bruce Smith, Neil Soiffer, Robert Sutor, Stephen Watt. W3C Recommendation 21 October 2010. (See http://www.w3.org/TR/MathML3/.)

A.15 Candidate Specification: XForms

The following are not listed in XProc 1.0.

XForms-1.1
XForms 1.1. John M. Boyer. W3C Recommendation 20 October 2009 (See http://www.w3.org/TR/2009/REC-xforms-20091020/.)

A.16 Candidate Specification: EXI

The following are not listed in XProc 1.0.

The following are other XML-related specifications for which some form of processing support may be desirable.

EXI
Efficient XML Interchange (EXI) Format 1.0. John Schneider, Takuki Kamiya. W3C Recommendation 10 March 2011. (See http://www.w3.org/TR/2011/REC-exi-20110310/.)

A.17 Candidate Specifications: HTML-ish

The following are not listed in XProc 1.0.

HTML5
A vocabulary and associated APIs for HTML and XHTML. Ian Hickson. W3C Working Draft 29 March 2012. (See http://www.w3.org/TR/html5/.)
ISO-HTML
Information technology — Document description and processing languages — HyperText Markup Language (HTML) ISO/IEC 15445:2000(E) First edition 2000-05-15. (See http://www.scss.tcd.ie/misc/15445/15445.HTML.)
XHTML-Basic
XHTML™ Basic 1.1. Shane McCarron, Masayasu Ishikawa, Mark Baker, Masayasu Ishikawa, Shinichi Matsui, Peter Stark, Ted Wugofski, Toshihiko Yamakami. W3C Recommendation 29 July 2008 (See http://www.w3.org/TR/2008/REC-xhtml-basic-20080729/.)
XHTML-Modularization
XHTML™ Modularization 1.1 - Second Edition. W3C Recommendation 29 July 2010. (See http://www.w3.org/TR/xhtml-modularization/.)

A.18 Candidate Specifications: Popular Publishing Profiles

The following are not listed in XProc 1.0.

DITA
Darwin Information Typing Architecture (DITA) Version 1.2. Kristen James Eberlein, Robert D. Anderson, Gershon Joseph. OASIS Standard 1 December 2010 (See http://docs.oasis-open.org/dita/v1.2/os/spec/DITA1.2-spec.html.)
DocBook
The DocBook Document Type. Norman Walsh. OASIS Standard V4.5, 01 October 2006 (See http://docbook.org/specs/docbook-4.5-spec.pdf.)
OpenDocument
Open Document Format for Office Applications (OpenDocument) v1.0. Michael Brauer, Patrick Durusau, Gary Edwards, David Faure, Tom Magliery, Daniel Vogelheim. OASIS Standard, 1 May 2005. (See http://www.oasis-open.org/committees/download.php/12572/OpenDocument-v1.0-os.pdf.)
OOXML
Office Open XML File Formats. Standard ECMA-376. 1st edition (December 2006), 2nd edition (December 2008) and 3rd edition (June 2011) (See http://www.ecma-international.org/publications/standards/Ecma-376.htm.)
TEI
P5: Guidelines for Electronic Text Encoding and Interchange. Lou Burnard and Syd Bauman. Text Encoding Initiative Release. 2007. (See http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index.html.)

A.19 B2B Transaction Language Profile

The following are not listed in XProc 1.0.

One example of a B2B Transaction Language for which a nominal XProc profile could be developed.

UBL
Universal Business Language v2.0. Jon Bosak, Tim McGrath, Ken Holman. OASIS Standard, 12 December 2006 (See http://docs.oasis-open.org/ubl/os-UBL-2.0/UBL-2.0.html.)

A.20 Candidate Specifications: Digital Signatures and Encryption

The following are not listed in XProc 1.0.

Canonical-xml
Canonical XML Version 2.0. John Boyer, Glenn Marcy, Pratik Datta, Frederick Hirsch. W3C Editor's Draft 16 December 2011 (See http://www.w3.org/2008/xmlsec/Drafts/c14n-20/.)
XML-Sign-2.0
XML Signature Syntax and Processing Version 2.0. Editors: Donald Eastlake, Joseph Reagle, David Solo, Frederick Hirsch, Thomas Roessler, Kelvin Yiu, Pratik Datta, Scott Cantor, Authors: Mark Bartel, John Boyer, Barb Fox, Brian LaMacchia, Ed Simon. W3C Editor's Draft 18 January 2012 (See http://www.w3.org/2008/xmlsec/Drafts/xmldsig-core-20/.)
XML-Sign-XPath
XML Signature Streaming Profile of XPath 1.0. Pratik Datta, Frederick Hirsch, Meiko Jensen. W3C Editor's Draft 13 December 2011. (See http://www.w3.org/2008/xmlsec/Drafts/xmldsig-xpath/.)
XML-Sign-Transforms
XML Encryption 1.1 CipherReference Processing using 2.0 Transforms. Frederick Hirsch. W3C Editor's Draft 05 January 2012 (See http://www.w3.org/2008/xmlsec/Drafts/xmlenc-transform20/.)

A.21 Candidate Specifications: Semantic Web

The following are not listed in XProc 1.0.

The following are Semantic Web-related specifications for which some form of processing support may be desirable.

rdf-syntax
RDF/XML Syntax Specification (Revised). Dave Beckett, Brian McBride. W3C Recommendation 10 February 2004. (See http://www.w3.org/TR/rdf-syntax-grammar/.)
rdf-schema
RDF Vocabulary Description Language 1.0: RDF Schema. Dan Brickley, R.V. Guha, Brian McBride. W3C Recommendation 10 February 2004. (See http://www.w3.org/TR/rdf-schema/.)
GRDDL
Gleaning Resource Descriptions from Dialects of Languages (GRDDL). Dan Connolly W3C Recommendation 11 September 2007. (See http://www.w3.org/TR/grddl/.)
RDFa in XHTML: Syntax and Processing. A collection of attributes and processing rules for extending XHTML to support RDF. Ben Adida, Mark Birbeck, Shane McCarron, Steven Pemberton. W3C Recommendation 14 October 2008 (See http://www.w3.org/TR/rdfa-syntax/.)
rif-in-rdf
RIF In RDF. Sandro Hawke, Axel Polleres. W3C Working Group Note 12 May 2011 (See http://www.w3.org/TR/rif-in-rdf/.)
rif-fld
RIF Framework for Logic Dialects. Harold Boley, Michael Kifer. W3C Recommendation 22 June 2010 (See http://www.w3.org/TR/rif-fld/.)
rif-bld
RIF Basic Logic Dialect. Harold Boley, Michael Kifer. W3C Recommendation 22 June 2010. (See http://www.w3.org/TR/rif-bld/.)
skos-reference
SKOS Simple Knowledge Organization System Reference. Alistair Miles, Sean Bechhofer. W3C Recommendation 18 August 2009 (See http://www.w3.org/TR/skos-reference/.)
rdf-sparql-query
SPARQL Query Language for RDF. Eric Prud'hommeaux, Andy Seaborne. W3C Recommendation 15 January 2008 (See http://www.w3.org/TR/rdf-sparql-query/.)

A.22 Candidate Non-XML Data Format Specifications

The following are not listed in XProc 1.0.

CSV
Common Format and MIME Type for Comma-Separated Values (CSV) Files. Y. Shafranovich. Internet Engineering Task Force. Request for Comments. October 2005. (See http://tools.ietf.org/rfc/rfc4180.txt.)
JSON
The application/json Media Type for JavaScript Object Notation (JSON). D. Crockford. Internet Engineering Task Force. July 2006 (See http://www.ietf.org/rfc/rfc4627.txt.)
SGML
Standard Generalized Markup Language (ISO 8879:1986 SGML). Charles Goldfarb, et al. 1986
TeX
The TeX Book (Computers and Typesetting, Volume A). Donald Knuth. Reading, Massachusetts: Addison-Wesley, 1984. ISBN 0-201-13448-9.
TROFF
Text Formatting: Technical Reference. Murray Maloney. SoftQuad Press. 1987. ISBN 0-88910-326-7.

A.23 Reference Processors?

The following are listed in XProc 1.0 but not normatively.

A list of reference processors? Optional implementation through a published signature?

HTML-Tidy
[HTML Tidy] HTML Tidy Library Project. SourceForge project. (See http://tidy.sourceforge.net/.)
TagSoup
[TagSoup] TagSoup - Just Keep On Truckin'. John Cowan. (See http://ccil.org/~cowan/XML/tagsoup/.)

B Unsatisfied V1 CR Issues

The following are taken from the XProc Candidate Issues Document as determined at the working group's October 31 f2f (minutes). Issue numbers refer to numbers given in the issues document. The editors intend to expand these notes and migrate them to later sections as and when appropriate.

C Unsatisfied V1 Requirements and Use Cases

Sections 2-5 of the V1 XML Processing Model Requirements and Use Cases are included herein, annotated for review of requirements and use cases that have been left unsatisfied in V1. The editors hope to record which requirements and use cases have been satisfied by XProc: An XML Pipeline Language, and to note which have not been satisfied. This should assist the working group in determining which requirements and use cases should be addressed in XProc V.next.

To aid navigation, the requirements can be mapped to the use cases of this section as follows:

Requirement    Use Case    Satisfied By
4.1 Standard Names for Component Inventory, TBD
4.2 Allow Defining New Components and Steps, TBD
4.3 Minimal Component Support for Interoperability TBD
4.4 Allow Pipeline Composition TBD
4.5 Iteration of Documents and Elements TBD
4.6 Conditional Processing of Inputs, TBD
4.7 Error Handling and Fall-back TBD
4.8 Support for the XPath 2.0 Data Model TBD
4.9 Allow Optimization TBD
4.10 Streaming XML Pipelines TBD
Unspecified TBD
Editorial note 
The above table is known to be incomplete and will be completed in a later draft. We note that many Use Cases are not associated with a Requirement, and welcome suggestions as to the correspondence to Requirements for Use Cases 5.4-8, 5.12-14, 5.17-21, 5.22, 5.25-26 and 5.28.

D Collected Input

Ideas that have been collected....

  • Support for 'depends-on' (or some mechanism for asserting dependencies that are not manifest in the data flow)

  • Simplify the task of passing "optional options" through a pipeline?

  • Explore using maps to simplify the parameters story

  • A way to merge the context defined by the elements p:xpath-context, p:viewport-source, and p:iteration-source?

  • Make it easier to create cross-platform pipelines, e.g. file.separator in file paths

D.1 Architecture

D.1.1 What Flows?

Should we open up the pipeline architecture to allow more than XML documents to flow through it? With respect to other media types (see below for some possibilities), there are a number of possibilities in general:

  1. Allow statically, only at the whole-pipeline margins

  2. Allow statically, at the step level (i.e. step signatures include media types for all inputs and outputs)

    1. Reject any pipeline where the output media type doesn't match the media type of the input to which it's connected

      1. and any non-XML output must immediately be converted to XML

      2. and foo--foo connections are allowed

    2. And auto-shim for every possible pair

    3. And auto-shim only for other-XML and XML-other, so other1→other2 requires two shims

  3. Allow dynamically (e.g. from p:http-request)

    1. With a static declaration of the alternatives you expect, and anything else is an error

    2. With a pipeline fallback if all else fails, getting <c:data media-type=...>...</c:data>
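
To make option 2.3 concrete, the Python sketch below plans the shims needed for a single connection under the assumption that shims exist only between XML and each other media type. The function name and the use of application/xml as the XML media type are assumptions of this sketch.

```python
XML = "application/xml"


def plan_connection(output_type, input_type):
    """Return the ordered list of shims needed to connect an output to an input."""
    if output_type == input_type:
        return []  # includes foo-to-foo connections, which need no conversion
    shims = []
    if output_type != XML:
        shims.append((output_type, XML))  # other-to-XML shim
    if input_type != XML:
        shims.append((XML, input_type))  # XML-to-other shim
    return shims


same = plan_connection("text/csv", "text/csv")
to_xml = plan_connection("text/csv", XML)
two_shims = plan_connection("text/csv", "application/json")
```

Under option 2.1 the same function would instead reject any connection for which the returned list is not empty.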

Any shim-to-XML can be (?) configured with respect to the target vocabulary (how?). We could identify shim tactics with QNames, similar to the way serialization methods are already done in XProc.

Allow non-XML (text/binary) to flow through a pipeline. The implementation would hex-encode non-XML whenever XML was expected. This would, for example, allow xsl-formatter to produce its output on a port that could then be serialized by the pipeline.
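
A small Python sketch of the hex-encoding idea follows. It wraps raw bytes in a c:data element in the XProc step namespace; the encoding value "hex" and the helper names are assumptions made for illustration, not something defined by XProc 1.0.

```python
import binascii
import xml.etree.ElementTree as ET

C_NS = "http://www.w3.org/ns/xproc-step"
ET.register_namespace("c", C_NS)


def wrap_non_xml(payload, media_type):
    """Wrap raw bytes in a c:data element, hex-encoded so they can flow as XML."""
    data = ET.Element("{%s}data" % C_NS, {"content-type": media_type, "encoding": "hex"})
    data.text = binascii.hexlify(payload).decode("ascii")
    return data


def unwrap_non_xml(data):
    """Recover the original bytes, e.g. when the port is finally serialized."""
    return binascii.unhexlify(data.text)


wrapped = wrap_non_xml(b"%PDF-1.4", "application/pdf")
restored = unwrap_non_xml(wrapped)
```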

D.1.1.1 Sequences

Allow the same thing XQuery/XSLT allow as values for variables.

D.1.1.2 Sets of Documents

Allow unbounded number of outputs from some steps? MZ says we need this for the NVDL use case [cross-reference needed]. Markup pipeline allowed this, subsequent steps need to access by name, where default naming is with the integers. . . p:pack could have more than two inputs, so you could do column-major packing . . .

D.1.1.3 MetaData, HTML5, JSON, Plain Text

See the non-XML data formats listed in Appendix A (in particular A.22).

D.1.2 Events

Support a more event-driven processing model?

Can we suspend a pipeline while it waits for something to happen? Some examples: waiting for an HTTP POST from GitHub (notifications), a JMS queue listener, a TCP socket listener.

Can we dump a partially evaluated pipeline instance for subsequent resumption?

D.1.3 Synchronization & Concurrency

Related but different, with pipeline-internal events, as it were. Philip Fennel has done some work on XProc+SMIL. Does this relate to XML Choreography?

D.2 Resource Management

D.2.1 Add a Resource Manager

  • Local store and retrieve. Build it, store it, get it back later, all under your control

  • On-demand construction. Associate a pipeline with a URI into the manager, which will run if the URI is not there. Or not current -- you need to know what all the dependencies are, and check them

  • Give URIs to step outputs. So you could point xinclude at a step output. Would you have to include a local catalog facility to make this really useful?

  • Cache intermediate URIs
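
The store, retrieve, and on-demand construction ideas above could look roughly like the following Python sketch. The class, its method names, and the version counters used to decide whether a cached result is still current are all assumptions; a callable stands in for the pipeline associated with a URI.

```python
class ResourceManager:
    """Toy resource manager: explicit store/retrieve plus on-demand construction."""

    def __init__(self):
        self._cache = {}     # uri -> (value, dependency versions seen at build time)
        self._builders = {}  # uri -> (build function, dependency uris)
        self._versions = {}  # uri -> counter bumped on every store or rebuild

    def store(self, uri, value):
        self._cache[uri] = (value, {})
        self._versions[uri] = self._versions.get(uri, 0) + 1

    def register(self, uri, build, dependencies=()):
        """Associate a pipeline (here: any callable) with a URI."""
        self._builders[uri] = (build, tuple(dependencies))

    def retrieve(self, uri):
        if uri not in self._builders:
            return self._cache[uri][0]  # plain stored resource (KeyError if absent)
        build, deps = self._builders[uri]
        inputs = [self.retrieve(dep) for dep in deps]  # may rebuild stale dependencies
        current = {dep: self._versions[dep] for dep in deps}
        if uri in self._cache and self._cache[uri][1] == current:
            return self._cache[uri][0]  # present and still current
        value = build(*inputs)
        self._cache[uri] = (value, current)
        self._versions[uri] = self._versions.get(uri, 0) + 1
        return value


calls = []


def build_report(source):
    """Hypothetical pipeline: records each run and upper-cases its input."""
    calls.append(source)
    return source.upper()


manager = ResourceManager()
manager.store("urn:example:source", "draft")
manager.register("urn:example:report", build_report, ["urn:example:source"])
first = manager.retrieve("urn:example:report")   # built on demand
second = manager.retrieve("urn:example:report")  # served from the cache
manager.store("urn:example:source", "final")
third = manager.retrieve("urn:example:report")   # dependency changed, so rebuilt
```

The sketch requires every dependency to be declared up front, which is exactly the difficulty the second bullet points out.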

D.2.2 Dynamic pipeline execution

Run a pipeline whose XML representation is input

  • Dynamic evaluation

  • Dynamic attribute values:

  • Attribute value templates

  • Steps with varying numbers of inputs/outputs with dynamic names
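
As an illustration of what attribute value templates might do for dynamic attribute values, the Python sketch below expands variable references in braces and treats doubled braces as escapes, in the style of XSLT. It is an assumption of this sketch that only simple variable references are supported; a real design would presumably allow arbitrary XPath expressions inside the braces.

```python
import re

# Matches an escaped brace pair or a {$name} variable reference.
_AVT_TOKEN = re.compile(r"\{\{|\}\}|\{\$([A-Za-z_][\w.-]*)\}")


def expand_avt(template, variables):
    """Expand {$name} references; '{{' and '}}' are escapes for literal braces."""

    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        return str(variables[match.group(1)])

    return _AVT_TOKEN.sub(replace, template)


href = expand_avt("{$base}/out/{$name}.xml", {"base": "file:///tmp", "name": "report"})
literal = expand_avt("{{not-a-template}}", {})
```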

D.2.3 Information caches

Should we give access to Memcached and ElastiCache?

Already possible from an extension step [reference needed], do we need more?

Already possible using p:http-request?

D.2.4 Environment

Should we have a way of accessing environment information more generally?

D.2.5 Datatypes

No information available.

D.3 Integration

D.3.1 XML Choreography

The orchestration of XSLT/XQuery/.... XProc as the controller. Support for playing a useful standardised role in XRX. LQ.

D.3.2 Authentication

Can we add some kind of authentication management which is out-of-band but available?

Does this need to be in the language, or can it be implementation-defined? If it was in the language how would steps get at it?

D.3.3 Debugging

How can XProc development be made more amenable to debugging?

D.3.4 Fall-back Mechanism

How can XProc development be made more amenable to error recovery?

D.3.5 Clustering

Do we need support for clustering?

D.4 Usability

D.4.1 Verbosity

Can we simplify the markup? Is there a compact syntax alternative?

Some suggestions for syntax simplification

D.4.1.1 p:data

No information available.

<p:data href=... />

D.4.1.2 p:input

<p:input port="source" href="…"/>
<p:input port="source" step="name" step-port="secondary"/>
<p:input port="source" step="name"/>

<p:input select= .... />

D.4.1.3 p:option

No information available.

<p:option ... />

D.4.1.4 p:pipe

<p:pipe step="name"/>

D.4.1.5 p:serialization

<p:serialization ... />

D.4.1.6 p:store

An option on p:store to save decoded/binary data.

<p:store ... />

D.4.1.7 p:string-replace

Improve p:string-replace quoting ugliness

<p:string-replace ... />

D.4.1.8 p:template

Empty source on p:template. If you're fabricating from whole cloth, you have to waste space with a pointless <foo/>. What would be the downside of having the empty sequence as the default input in most/all cases? AM suggests that we allow this on a step-by-step basis.

<p:template ... />

D.4.1.9 p:try

p:group within p:try -- Could we remove this requirement? Is this a case of making life easier for implementors which confuses users? Or is it actually simpler to have the group/catch as the only top-level children?
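For reference, a minimal sketch of the shape XProc 1.0 currently requires, with the guarded subpipeline wrapped in p:group. The file name and the fallback element are illustrative assumptions, not taken from any real pipeline.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>

  <p:try>
    <!-- XProc 1.0: the guarded steps must be wrapped in p:group -->
    <p:group>
      <!-- hypothetical file name -->
      <p:load href="input.xml"/>
    </p:group>
    <p:catch>
      <!-- illustrative fallback document produced when the load fails -->
      <p:identity>
        <p:input port="source">
          <p:inline><load-failed/></p:inline>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```

If the requirement were dropped, the p:load could presumably sit directly inside p:try; whether that is actually simpler for users is the open question above.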

D.4.1.10 p:variable

Can we remove the restriction on variables/options/params being bound only to strings? What would be allowed:

  • binaries - This would allow not only binary resource files; it would also make it possible to pass maps, which is where I think the real value-add comes in.

  • sequences - Not just for strings, but for nodes and binaries as well.

Related...

  • p:variable templates

  • Allow variables to be visible in nested pipelines

  • Should we allow p:variable anywhere in groups?

  • Adding a p:variable requires adding p:group…feels odd

Explanation: Allow p:xpath-context/@select? Library-level (“global”) variables? And/or pipeline-level variables that would be visible also in nested pipelines? Not really a variable, but a p:option or p:parameter that’s visible across multiple pipelines.

Example: A directory path shared by several steps that the pipeline user might want to override. A simple mechanism for constructing XML fragments using local context. (A single template? XQuery style curly braces?)

Here’s a constructive example... Make p:rename/@new-name optional, so that it’s possible to move elements from namespace X that match a certain condition to namespace Y. This is currently quite difficult to do. Could you achieve this using @use-when?

D.4.1.11 p:viewport

  • Long-form viewport with an intrinsic switch built in

  • p:viewport/@match

    p:viewport-source

D.4.2 Parameter Rules

Now that we have a bunch of real pipelines, can we simplify the rules by limiting the allowed usage patterns? At least, get rid of the necessity for p:empty as the parameter input [when it's now required: someone to fill in]

  • Data types for options and parameters

  • Arbitrary data model fragments for parameters/options/variables

Here's the hard case that has to be handled:

<p:pipeline> 
    <p:xslt> 
        <p:input port="stylesheet"> 
            <p:document href="docbook.xsl"/> 
        </p:input> 
    </p:xslt> 
</p:pipeline> 

Pass parameters to the pipeline and have those parameters available inside the stylesheet without enumerating all of them in the pipeline. How do I easily create a c:param-set for a hypothetical 'parameters' option without invoking even more magic than we currently have?

D.4.3 Choose-style binding

Suppose you have a pipeline with a step X, and depending on some dynamic condition, you want X to process documents (or entire sequences of documents) A, B, or C. Currently, the only way to do this is to use a p:choose to duplicate the step X with different input bindings in each branch. This not only looks silly, it is also painful to write.

One solution to this would be a choose-style binding (a wrapper around the existing bindings) that would dynamically select the bindings to use.
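The duplication described above looks roughly like the following in XProc 1.0. This is a sketch only: the port names, the use-alternate option, and the stylesheet name are assumptions made for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0" name="main">
  <p:input port="source" primary="true"/>
  <p:input port="alternate"/>          <!-- hypothetical second input -->
  <p:output port="result"/>
  <p:option name="use-alternate" select="'false'"/>  <!-- hypothetical option -->

  <p:choose>
    <p:when test="$use-alternate = 'true'">
      <!-- step X, bound to the alternate input -->
      <p:xslt>
        <p:input port="source">
          <p:pipe step="main" port="alternate"/>
        </p:input>
        <p:input port="stylesheet">
          <p:document href="style.xsl"/>
        </p:input>
        <p:input port="parameters"><p:empty/></p:input>
      </p:xslt>
    </p:when>
    <p:otherwise>
      <!-- the same step X again; only the source binding differs -->
      <p:xslt>
        <p:input port="source">
          <p:pipe step="main" port="source"/>
        </p:input>
        <p:input port="stylesheet">
          <p:document href="style.xsl"/>
        </p:input>
        <p:input port="parameters"><p:empty/></p:input>
      </p:xslt>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```

A choose-style binding would let the condition live inside the single p:input of one p:xslt, so the step itself is written once.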

D.4.4 Output signatures for compound steps

The existing magic is not consistent or easily understandable

D.4.5 Loading computed URIs

Lots of workarounds, but shouldn't need them. Attribute-value templates would solve this.
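One workaround in XProc 1.0 is to compute the href option with p:with-option, sketched below. The /book/@current vocabulary and the chapters/ path are hypothetical, and the attribute value template form in the comment is a possible future syntax, not part of XProc 1.0.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Today: the URI is assembled by an XPath expression evaluated
       against the document on the default readable port -->
  <p:load>
    <p:with-option name="href"
                   select="concat('chapters/', /book/@current, '.xml')"/>
  </p:load>

  <!-- With attribute value templates this might shrink to something like
         <p:load href="chapters/{/book/@current}.xml"/>
       (hypothetical syntax, shown only for comparison) -->
</p:declare-step>
```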

D.4.6 Optional options for declared steps

AM to complete. Something that works from the command line but not internally to a library step???

D.4.7 XPath

  • XPath 2.0 only?

  • Custom XPath functions (à la xsl:function) using “simplified XProc steps” (whatever that means)

    A way of re-using pipelines. Or allowing pipelines to be imported into XQuery or XSLT

D.4.8 Simplify Use of File Sets

Some mechanism for loading sets of documents. XProc, as currently defined, feels somewhat awkward:

  • consider a p:documents element which roughly emulates apache ant filesets

  • consider reusable file path structures

  • consider providing conventions for making XProc scripts more cross-platform, e.g. file separators

<p:document href="/path/to/directory" include="*.xml"/>
<p:data href="/path/to/directory" include="*.xml"/>
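For comparison, loading every XML file in a directory with XProc 1.0 as currently defined takes roughly the following. This is a sketch: the directory path is the placeholder used above, and the concat-based URI construction is one of several possible approaches.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
  <p:output port="result" sequence="true"/>

  <!-- List the directory; include-filter is a regular expression, not a glob -->
  <p:directory-list path="/path/to/directory" include-filter=".*\.xml"/>

  <!-- Visit each c:file entry and load the file it names -->
  <p:for-each>
    <p:iteration-source select="/c:directory/c:file"/>
    <p:load>
      <p:with-option name="href"
                     select="concat('/path/to/directory/', /c:file/@name)"/>
    </p:load>
  </p:for-each>
</p:declare-step>
```

The include="*.xml" forms suggested above would collapse the listing, the iteration and the URI construction into a single binding.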

D.4.9 Required Primary Port

Editorial note: Candidate Use Case 20120405
Required Primary Port

(source: Alex Milowski)

I find myself always frustrated when I have to use steps that have no primary output port defined. I usually have to do some sort of "fixup" in the pipeline just to make what I believe should be the minimum. I'm often using p:store or ml:insert-document (marklogic) and, while there is an output, it just isn't defined as primary. While you can say that is just a bad step definition, I think it is more than that.

I think it would have been better to say that if your step produces any output, one of the ports must be defined as primary. This would also avoid pipeline re-arrangements after edits due to unconnected output ports.

For example, consider these two snippets, which are not interchangeable in that the first has a single non-primary output and the second has a single primary output.

<p:store .../>

<p:viewport match="/doc/section">
    <p:store href="..."/>
</p:viewport>

My contention is that if every step that produces output were required to designate one port as primary, pipelines could be rearranged with less additional surgery. In my recent case, I had the following step structure:

<p:store .../>
<p:xslt>
    <p:input port="source">
        <p:pipe step="somewhere" port="result"/>
    </p:input>
</p:xslt>

I then wrapped it with a viewport:

<p:viewport>
    <p:store .../>
</p:viewport>
<p:xslt>
    <p:input port="source">
        <p:pipe step="somewhere" port="result"/>
    </p:input>
</p:xslt>

and got errors as the primary output port isn't connected. I had to do this to fix it:

<p:viewport>
    <p:store .../>
</p:viewport>
<p:sink/>
<p:xslt>
    <p:input port="source">
        <p:pipe step="somewhere" port="result"/>
    </p:input>
</p:xslt>

With my proposal, I would have originally been required to write:

<p:store .../>
<p:sink/>
<p:xslt>
    <p:input port="source">
        <p:pipe step="somewhere" port="result"/>
    </p:input>
</p:xslt>

D.4.10 Documentation Conventions

Add a Note or another spec for documentation conventions. Parallel to Javadoc? Add an xml:lang attribute to p:documentation and recommend its use. See https://community.emc.com/docs/DOC-8657 for an example.

D.5 New Steps

D.5.1 Various Suggestions

  • p:apply -- Run a static, known step, whose type is computed dynamically.

  • p:documents

  • p:evaluate -- Compile a pipeline and run it.

  • p:log

  • p:nvdl

  • p:sax-filter

  • p:sort

D.5.2 Iterate until condition

Repeat a step/group? until some XPath expression is satisfied, feeding its output back as its input after the first go-around

  • Special built-in support for iterate to fixed-point?

  • Compound step like xsl:iterate (XSLT 3.0)

  • Iterate-to-fixed-point already implemented as an extension step in Calabash: ex:until-unchanged

  • p:iteration-source
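Without built-in support, one way to approximate this in XProc 1.0 is a recursive pipeline, since a declared step's own type is in scope inside its declaration. The sketch below assumes a hypothetical stylesheet one-pass.xsl that performs one iteration and marks the root element with done="true" when no further pass is needed; the my: namespace and step name are likewise illustrative.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:my="http://example.org/ns/my"
                type="my:until-done" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- One iteration: hypothetical stylesheet that sets /*/@done when finished -->
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="one-pass.xsl"/>
    </p:input>
    <p:input port="parameters"><p:empty/></p:input>
  </p:xslt>

  <p:choose>
    <!-- Condition satisfied: pass the result through unchanged -->
    <p:when test="/*/@done = 'true'">
      <p:identity/>
    </p:when>
    <!-- Otherwise feed the output back in as the next input -->
    <p:otherwise>
      <my:until-done/>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```

This works, but the termination condition has to be encoded in the document itself, and the recursion depth grows with the number of iterations, which is part of the case for a dedicated iteration construct.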

D.5.3 p:send-mail

One or more steps to handle SMTP and send e-mail messages.

Editorial note: SMTP 
What about receiving messages? What about other messaging protocols? SMS?

E FYI: Categorized Steps

Here is my first cut of the step inventory categorization for my action item. I've taken this from information that was sent to me, source code, and online documentation [1]. I did not include the general categories we had on the wiki [2]. Those categories were "Sorting", "Validation with Error", "Map-reduce", "Iterate until condition", "Dynamic Pipeline Execution", "Long-form Viewport", and "e-mail." -- AM.

These lists will be annotated and re-formatted later. -- MM.

E.1 Micro-operations

7.1.1 p:add-attribute
7.1.2 p:add-xml-base
7.1.5 p:delete
7.1.12 p:insert
7.1.13 p:label-elements
7.1.15 p:make-absolute-uris
7.1.16 p:namespace-rename
7.1.19 p:rename
7.1.20 p:replace
7.1.21 p:set-attributes
7.1.25 p:string-replace
7.1.27 p:unwrap
7.1.28 p:wrap

 - - -   cx:namespace-delete (calabash)

E.2 Transformation

7.1.30 p:xinclude
7.1.31 p:xslt

 - - -   p:template (note)

E.3 Query

7.2.9 p:xquery

 - - -   ml:adhoc-query (calabash)
 - - -   ml:insert-document (calabash)
 - - -   ml:invoke-module (calabash)

E.4 Validation

7.2.4 p:validate-with-relax-ng
7.2.5 p:validate-with-schematron
7.2.6 p:validate-with-xml-schema

 - - -  cx:nvdl (calabash)

E.5 Document Operations

7.1.3  p:compare
7.1.4  p:count
7.1.9  p:filter
7.1.11 p:identity
7.2.2  p:hash
7.2.10 p:xsl-formatter

 - - -   cx:delta-xml (calabash)
 - - -   cx:pretty-print (calabash)
 - - -   cx:css-formatter (calabash)
 - - -   cxu:compare (calabash)
 - - -   emx:get-base-uri (emc)

E.6 File & Directory Operations

7.1.6 p:directory-list

 - - -   cx:zip (calabash)
 - - -   cx:unzip (calabash)
 - - -   cxf:info (calabash)
 - - -   cxf:delete (calabash)
 - - -   cxf:mkdir (calabash)
 - - -   cxf:copy (calabash)
 - - -   cxf:move (calabash)
 - - -   cxf:touch (calabash)
 - - -   cxf:tempfile (calabash)
 - - -   cxf:head (calabash)
 - - -   cxf:tail (calabash)

E.7 Image Operations

 - - -   cx:metadata-extractor (calabash)

E.8 Error / Message Handling

7.1.7 p:error

 - - -   cx:message (calabash)
 - - -   emx:message (emc)

E.9 Sequence Operations

7.1.17 p:pack
7.1.23 p:split-sequence
7.1.29 p:wrap-sequence

E.10 Input / Output

7.1.10 p:http-request
7.1.14 p:load
7.1.22 p:sink
7.1.24 p:store

 - - -   cx:uri-info (calabash)
 - - -   emx:fetch (emc)

E.11 XProc Operations

7.1.18 p:parameters
 - - -   p:in-scope-names (note)

 - - -   cx:eval (calabash)
 - - -   cx:report-errors (calabash)
 - - -   emx:eval (emc)

E.12 Encoding

7.1.8 p:escape-markup
7.1.26 p:unescape-markup
7.2.7 p:www-form-urldecode
7.2.8 p:www-form-urlencode

E.13 Execution Control

7.2.1 p:exec

E.14 Resource / Collection Management

 - - -   cx:collection-manager (calabash)

E.15 Environment

 - - -   cx:java-properties (calabash)
 - - -   cxo:info (calabash)
 - - -   cxo:cwd (calabash)
 - - -   cxo:env (calabash)

E.16 Miscellaneous

7.2.3 p:uuid

 - - -   cx:get-cookies (calabash)
 - - -   cx:set-cookies (calabash)
 - - -   cx:send-mail (calabash)

F Contributors

Members of the Working Group contributed to this specification as noted throughout.

  • Erik Bruchez

  • Alex Milowski

  • Henry Thompson

  • Norm Walsh

  • Mohamed Zergaoui