XProc: An XML Pipeline Language (Combined test)

W3C Working Draft 20 December 2006

This Version:: http://www.w3.org/XML/XProc/docs/WD-xproc-20061219/
Latest Version:: http://www.w3.org/XML/XProc/docs/langspec.html
Previous versions:: http://www.w3.org/TR/2006/WD-xproc-20061117/ http://www.w3.org/TR/2006/WD-xproc-20060928/
Editors:: Norman Walsh, Sun Microsystems, Inc. <Norman.Walsh@Sun.COM>; Alex Milowski, Invited expert <alex@milowski.org>

This document is also available in these non-normative formats: XML, Revision markup

Abstract

This specification describes the syntax and semantics of XProc: An XML Pipeline Language, a language for describing operations to be performed on XML documents.

An XML Pipeline specifies a sequence of operations to be performed on one or more XML documents, producing one or more XML documents as output. Steps in the pipeline may read or write non-XML resources as well.

…

3.2 For-Each

A for-each construct processes a sequence of documents, applying its subpipeline to each document in turn.

Inputs: source, a sequence of documents.
Outputs: As declared.
Parameters: As declared.
Contained components: As declared.

The context of a for-each is its inherited context modified as follows:

All of the declared inputs of the for-each are added to the outputs in the context.
The union of all the declared outputs of the contained components is added to the outputs in the context.
All of the declared parameters of the for-each are added to the parameters in the context.

This is the context used by the for-each and inherited by its contained components.

The for-each construct can be used in cases where a component requires a single document input but a pipeline needs to process a sequence of documents with that component.

The result of the for-each is a sequence of documents produced by processing each individual document in the input sequence. If the subpipeline is connected to one or more output ports on the for-each, what appears on each of those ports is the sequence of documents produced by each iteration of the loop.

For example, a for-each might accept a sequence of DocBook chapters as its input, process each chapter in turn with XSLT, and produce a sequence of formatted chapters as its output.

The p:for-each element represents a for-each.

<p:for-each name = QName select? = xpath expression> (p:input, p:output*, p:parameter*, subpipeline) </p:for-each>

The p:for-each component has exactly one input named “source” for which a binding must be specified. If outputs are declared, they must also include a binding.

The source input behaves exactly like an input on any other component: it provides a sequence of documents to the for-each construct. A portion of each input document can be selected using the select attribute on p:for-each. If no additional selection is specified, the document node of each input document is selected.

Each group of nodes selected by the p:for-each from each of the inputs that appear on source is wrapped in a document node and provided to the subpipeline.

The processor provides each document to the subpipeline represented by the children of the p:for-each, one at a time, on a port named current.

For each declared output, the processor collects all the documents that are produced for that output from all the iterations, in order, into a sequence. The result of the p:for-each is that set of document sequences.
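The collection rule above can be illustrated with a minimal Python sketch. It is not part of this specification: the function name `for_each`, the use of ElementTree path expressions in place of full XPath, and the modelling of a subpipeline as a function that returns a mapping from output port name to document are all assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET


def for_each(source_docs, subpipeline, select=None):
    """Run `subpipeline` once per selected node and collect outputs per port.

    `source_docs` is the sequence on the "source" port (ElementTree elements).
    `subpipeline` is a hypothetical callable taking the "current" document and
    returning a dict that maps output port names to one result each.
    """
    results = {}
    for doc in source_docs:
        # With no select, the whole document is the single selection.
        selected = [doc] if select is None else doc.findall(select)
        for node in selected:
            # The deep copy stands in for "wrapped in a document node".
            current = copy.deepcopy(node)
            for port, out_doc in subpipeline(current).items():
                # Documents are appended in iteration order, per port.
                results.setdefault(port, []).append(out_doc)
    return results
```

Under these assumptions, a call such as `for_each([book], transform, select=".//chapter")` returns one sequence per declared output port, each holding the results of the iterations in order.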

Example 1, “A Sample For-Each” shows an example of a p:for-each in action.

Example 1. A Sample For-Each

<p:for-each name="chapters" select="//chapter">
  <p:input port="source">
    <p:document href="http://example.org/docbook.xml"/>
  </p:input>
  <p:output port="html">
    <p:pipe step="xform-to-html" port="result"/>
  </p:output>
  <p:output port="fo">
    <p:pipe step="xform-to-fo" port="result"/>
  </p:output>
  <p:step name="xform-to-fo" type="p:xslt">
    <p:input port="source">
      <p:pipe step="chapters" port="current"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="fo/docbook.xsl"/>
    </p:input>
  </p:step>
  <p:step name="xform-to-html" type="p:xslt">
    <p:input port="source">
      <p:pipe step="chapters" port="current"/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="html/docbook.xsl"/>
    </p:input>
  </p:step>
</p:for-each>

The chapter elements of the DocBook document are selected by the expression //chapter. Each chapter is transformed into HTML and into XSL Formatting Objects using an XSLT step. The resulting HTML and FO documents are aggregated and appear on the html and fo ports, respectively, of the chapters construct itself.

It is a static error if any declared output does not specify a binding.

3.3 Viewport

A viewport construct processes a single document, applying its subpipeline to one or more subsections of the document.

Inputs: source, a single document.
Outputs: result, a single document.
Parameters: As declared.
Contained components: As declared.

The context of a viewport is its inherited context modified as follows:

All of the declared inputs of the viewport are added to the outputs in the context.
The union of all the declared outputs of the contained components is added to the outputs in the context.
All of the declared parameters of the viewport are added to the parameters in the context.

This is the context used by the viewport and inherited by its contained components.

The result of the viewport is a copy of the original document with the selected subsections replaced by the results of applying the subpipeline to them.

For example, a viewport might accept an XHTML document as its input, apply encryption to selected div elements within that document, and return an XHTML document that is the same as the original except that each selected div has been replaced by its encrypted result.

The p:viewport element represents a viewport.

<p:viewport name = QName match = xpath expression> (p:input, p:output, p:parameter*, subpipeline) </p:viewport>

The p:viewport component has exactly one input named “source” and exactly one output named “result”. A binding must be specified for each port.

The match attribute specifies an XPath expression that is a Pattern in ???. Each matching node in the source document is wrapped in a document node and provided to the viewport's subpipeline.

The processor provides each document to the subpipeline represented by the children of the p:viewport on a port named current.

What appears on the output from the p:viewport will be a copy of the input document except that each matching node is replaced by the result of applying the subpipeline to that node.

It is a dynamic error if the input source is a sequence of more than one document or if the output from any iteration is a sequence of more than one document.
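The replacement behaviour can be sketched in Python as follows. This is an illustration only, not part of this specification: the function name `viewport` is hypothetical, ElementTree path expressions stand in for match patterns, the subpipeline is modelled as a function returning a list of result documents, and treating an empty result as an error is an assumption of the sketch. The sketch also cannot match the root element and does not handle nested matches specially.

```python
import copy
import xml.etree.ElementTree as ET


def viewport(source, match, subpipeline):
    """Return a copy of `source` with each matching node replaced."""
    result = copy.deepcopy(source)  # the input document itself is left untouched
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in result.iter() for child in parent}
    for node in result.findall(match):
        # Each matching node is handed to the subpipeline on "current".
        outputs = subpipeline(copy.deepcopy(node))
        if len(outputs) != 1:
            # More than one document is the dynamic error described above;
            # rejecting zero documents is an assumption of this sketch.
            raise ValueError("viewport iteration must produce one document")
        replacement = outputs[0]
        replacement.tail = node.tail  # keep text that followed the original
        parent = parent_of[node]
        parent[list(parent).index(node)] = replacement
    return result
```

With a hypothetical `encrypt` function as the subpipeline, `viewport(doc, ".//div[@class='enc']", encrypt)` returns a copy of `doc` in which only the matching div elements differ.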

Example 2, “A Sample Viewport” shows an example of a p:viewport in action.

Example 2. A Sample Viewport

<p:viewport name="encdivs" match="h:div[@class='enc']">
  <p:input port="source">
    <p:pipe step="step" port="port"/>
  </p:input>
  <p:output port="result">
    <p:pipe step="encrypt" port="result"/>
  </p:output>
  <p:step name="encrypt" type="p:encrypt-document">
    <p:input port="source">
      <p:pipe step="encdivs" port="current"/>
    </p:input>
  </p:step>
</p:viewport>

The nodes which match h:div[@class='enc'] (according to the rules of ???) in the input document are selected. Each selected h:div is encrypted and the resulting encrypted version replaces the original h:div. The result of the whole construct is a copy of the input document with each selected h:div encrypted.

It is a static error if either the source port or the result port does not specify a binding.

3.4 Choose

A choose construct selects exactly one of a list of alternative subpipelines based on the evaluation of XPath expressions.

Inputs: As declared on the subpipelines.
Outputs: As declared on the subpipelines.
Parameters: As declared.
Contained components: As declared. A choose construct contains several alternate subpipelines, exactly one of which will be evaluated.

The context of a choose is its inherited context modified as follows:

All of the declared inputs of the choose are added to the outputs in the context.
The declared outputs of (any one of) the subpipelines are added to the outputs in the context.
All of the declared parameters of the choose are added to the parameters in the context.

This is the context used by the choose and inherited by its subpipelines.

The list of alternative subpipelines consists of zero or more subpipelines, each guarded by an XPath expression (with an associated context document), followed optionally by a single default subpipeline.

The choose considers each subpipeline in turn and selects the first (and only the first) subpipeline for which the guard expression evaluates to true in the context of its context document. If there are no subpipelines for which the expression evaluates to true, the default subpipeline, if it was specified, is selected.

After a subpipeline is selected, it is evaluated as if only it had been present.

The context of the contained components in the selected subpipeline is the context of the choose with the union of all the declared outputs of the contained components of the selected subpipeline added to the outputs in the context.

The result of the choose is the result of the selected subpipeline.

For example, a choose might test a schema and apply XML Schema validation to an input document if the schema is an XML Schema document, apply RELAX NG validation if the schema is a RELAX NG grammar, or perform no validation otherwise.

In order to ensure that the result of the choose is consistent irrespective of the subpipeline chosen, each subpipeline must declare the same number of inputs and outputs with the same names. It is a static error if two subpipelines in a choose declare different inputs or outputs.
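The consistency requirement lends itself to a simple check before any subpipeline runs. The following Python sketch is illustrative only and is not defined by this specification; the function name `check_choose_ports` and the representation of each branch as a pair of name lists are assumptions.

```python
def check_choose_ports(branches):
    """Raise ValueError unless every branch declares the same port names.

    `branches` maps a branch label to an (inputs, outputs) pair of name lists.
    """
    signatures = {
        label: (frozenset(inputs), frozenset(outputs))
        for label, (inputs, outputs) in branches.items()
    }
    # More than one distinct signature means at least two branches disagree.
    if len(set(signatures.values())) > 1:
        raise ValueError("choose branches declare different ports: %r" % signatures)
```

Comparing sets of names covers both halves of the rule: two branches with the same names necessarily declare the same number of ports.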

The p:choose element represents a choose.

<p:choose name = QName> (p:input?, p:when*, p:otherwise?) </p:choose>

Each p:when branch of the p:choose has a test attribute which must contain an XPath expression. That XPath expression's effective boolean value is the guard expression for the subpipeline contained within that p:when.

The p:choose can specify the context node against which the XPath expressions that occur on each branch are evaluated. The context node is specified as a binding for the input port named “source”.

Each conditional subpipeline is represented by a p:when element.

<p:when test = expression> (p:input?, p:output*, p:parameter*, subpipeline) </p:when>

The p:when can specify a context node against which its test expression is to be evaluated. That context node is specified as a binding for the input port “source”.

If no context is specified on the p:when, the context specified on the p:choose is used. It is a static error if no context is specified in either place.

The default branch is represented by a p:otherwise element.

<p:otherwise> (p:output*, p:parameter*, subpipeline) </p:otherwise>

All of the p:when branches and the p:otherwise must declare the same number of output ports with the same names. It is a static error if they do not.

The result of the p:choose is the result of the selected subpipeline. It is a dynamic error if no p:when is selected and no p:otherwise is specified.
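The selection rule can be summarised with a short Python sketch. It is a model under stated assumptions rather than anything defined by this specification: the names `When` and `choose` are hypothetical, guards are ordinary functions instead of XPath expressions, and each subpipeline is a zero-argument function returning its result. The missing-context case, which is a static error in this specification, is only detected at run time here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class When:
    test: Callable[[Any], bool]    # guard, evaluated against a context document
    subpipeline: Callable[[], Any]
    context: Optional[Any] = None  # overrides the choose-level context if set


def choose(whens, otherwise=None, context=None):
    """Evaluate the first branch whose guard is true, else the default."""
    for when in whens:
        ctx = when.context if when.context is not None else context
        if ctx is None:
            # A static error in the specification; this sketch detects it late.
            raise ValueError("no context specified for guard expression")
        if when.test(ctx):
            # Only the first matching branch runs; later guards are skipped.
            return when.subpipeline()
    if otherwise is None:
        # The dynamic error: no p:when selected and no p:otherwise specified.
        raise RuntimeError("no branch selected and no default branch")
    return otherwise()
```

Because the loop returns on the first true guard, a later branch whose guard would also be true is never evaluated, which matches the "first (and only the first)" wording above.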

Example 3, “A Sample Choose” shows an example of a p:choose in action.

Example 3. A Sample Choose

<p:choose name="version">
  <p:input port="source">
    <p:pipe step="prevstep" port="result"/>
  </p:input>

  <p:when test="/*[@version = 2]">
    <p:output port="result">
      <p:pipe step="v2valid" port="result"/>
    </p:output>

    <p:step type="p:validate" name="v2valid">
      <p:input port="source">
        <p:pipe step="prevstep" port="result"/>
      </p:input>
      <p:input port="schema">
        <p:document href="v2schema.xsd"/>
      </p:input>
    </p:step>
  </p:when>

  <p:when test="/*[@version = 1]">
    <p:output port="result">
      <p:pipe step="v1valid" port="result"/>
    </p:output>

    <p:step type="p:validate" name="v1valid">
      <p:input port="source">
        <p:pipe step="prevstep" port="result"/>
      </p:input>
      <p:input port="schema">
        <p:document href="v1schema.xsd"/>
      </p:input>
    </p:step>
  </p:when>

  <p:otherwise>
    <p:output port="result">
      <p:pipe step="ident" port="result"/>
    </p:output>

    <p:step type="p:identity" name="ident">
      <p:input port="source">
        <p:pipe step="prevstep" port="result"/>
      </p:input>
    </p:step>
  </p:otherwise>
</p:choose>