The presentation of this document has been augmented to identify changes from a previous version. Three kinds of changes are highlighted: new, added text, changed text, and deleted text.
This document is also available in these non-normative formats: XML and with differences marked.
Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document is being built to articulate requirements for the development of a subsequent version of XProc: An XML Pipeline Language.
This document is an editors' copy that has no official standing.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This Editor's Working Draft has been produced by the authors listed above, at the will of the chair, and with the consent of the members of the W3C XML Processing Model Working Group as part of the XML Activity, following the procedures set out for the W3C Process. The goals of the XML Processing Model Working Group are discussed in its charter.
Comments on this document should be sent to the W3C mailing list public-xml-processing-model-comments@w3.org (archive).
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. This document is informative only. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
1 Introduction
1.1 XProc V.next Goals
1.2 Editorial Process
2 Terminology
3 Design Principles
3.1 Technology Neutral
3.2 Platform Neutral
3.3 Small and Simple
3.4 Infoset Processing
3.5 Straightforward Core Implementation
3.6 Address Practical Interoperability
3.7 Validation of XML Pipeline Documents by a Schema
3.8 Reuse and Support for Existing Specifications
3.9 Arbitrary Components
3.10 Control of Inputs and Outputs
3.11 Control of Flow and Errors
4 Requirements
4.1 Standard Names in Step Inventory
4.2 Allow Defining New Components and Steps
4.3 Minimal Component Support for Interoperability
4.4 Allow Pipeline Composition
4.5 Iteration of Documents and Elements
4.6 Conditional Processing of Inputs
4.7 Error Handling and Fall-back
4.8 Support for the XPath 2.0 Data Model
4.9 Allow Optimizations
4.10 Streaming XML Pipelines
5 Use cases
5.1 Apply a Sequence of Operations
5.2 XInclude Processing
5.3 Parse/Validate/Transform
5.4 Document Aggregation
5.5 Single-file Command-line Document Processing
5.6 Multiple-file Command-line Document Generation
5.7 Extracting MathML
5.8 Style an XML Document in a Browser
5.9 Run a Custom Program
5.10 XInclude and Sign
5.11 Make Absolute URLs
5.12 A Simple Transformation Service
5.13 Service Request/Response Handling on a Handheld
5.14 Interact with Web Service (Tide Information)
5.15 Parse and/or Serialize RSS descriptions
5.16 XQuery and XSLT 2.0 Collections
5.17 An AJAX Server
5.18 Dynamic XQuery
5.19 Read/Write Non-XML File
5.20 Update/Insert Document in Database
5.21 Content-Dependent Transformations
5.22 Configuration-Dependent Transformations
5.23 Response to XML-RPC Request
5.24 Database Import/Ingestion
5.25 Metadata Retrieval
5.26 Non-XML Document Production
5.27 Integrate Computation Components (MathML)
5.28 Document Schema Definition Languages (DSDL) - Part 10: Validation Management
5.29 Large-Document Subtree Iteration
5.30 Adding Navigation to an Arbitrarily Large Document
5.31 Fallback to Choice of XSLT Processor
5.32 No Fallback for XQuery Causes Error
A Normative References
A.1 Reference Documents
A.2 Core XML Specifications
A.3 XML Data Model and XML Information Set
A.4 XPath and XQuery
A.5 Style, Transform, Serialize
A.6 XML Schema Languages
A.7 Identifiers and Names
A.8 HTTP Request & Authentication
A.9 Character Encodings
A.10 Media Types
A.11 Digital Signatures
B Non-Normative References
B.1 Candidate Specifications: XPointers
B.2 Candidate Specification: XLink
B.3 Candidate Specification: Mathematics
B.4 Candidate Specification: XForms
B.5 Candidate Specification: EXI
B.6 Candidate Specifications: HTML
B.7 Candidate Specifications: CSS
B.8 Candidate Specifications: Popular Publishing Profiles
B.9 Candidate Specification: B2B Transaction Language
B.10 Candidate Specifications: Digital Signatures and Encryption
B.11 Candidate Specifications: Semantic Web
B.12 Candidate Specification: SMIL
B.13 Candidate Specification: Mail Messages
B.14 Candidate Non-XML Data Format Specifications
B.15 Candidate Specifications: Web Distributed Authoring and Versioning (WebDAV)
B.16 Reference Processors?
C Unsatisfied V1 CR Issues
C.1 Issue 001: p:template extension
C.2 Issue 004: attribute value templates
C.3 Issue 006: p:data/p:load harmonization
C.4 Issue 010: document base URI
C.5 Issue 015: JSON hack
C.6 Issue 016: conditional output port
C.7 Issue 017: p:store
D Unsatisfied V1 Requirements and Use Cases
E FYI: Categorized Steps
E.1 Library and Pipeline Construction
E.2 Core Pipeline Operations
E.3 Input Sources
E.4 Output Targets
E.5 Variables, Options and Parameters
E.6 Micro-operations
E.7 Transformation
E.8 Query
E.9 Validation
E.10 Document Operations
E.11 File & Directory Operations
E.12 Image Operations
E.13 Sequence Operations
E.14 Input / Output
E.15 Encoding
E.16 Execution Control
E.17 Resource / Collection Management
E.18 Miscellaneous
E.19 XProc Operations
E.20 Environment
E.21 Error / Message Handling
E.22 Debugging
F Collected Input
F.1 Architecture
F.1.1 What Flows?
F.1.1.1 Sequences
F.1.1.2 Sets of Documents
F.1.1.3 MetaData, HTML5, JSON, Plain Text
F.1.2 Events
F.1.3 Synchronization & Concurrency
F.2 Resource Management
F.2.1 Add a Resource Manager
F.2.2 Dynamic pipeline execution
F.2.2.1 Dynamic Manifolds
F.2.3 Information caches
F.2.4 Environment
F.2.5 Datatypes
F.3 Integration
F.3.1 XML Choreography
F.3.2 Authentication
F.3.3 Clustering
F.3.4 Debugging
F.3.5 Fall-back Mechanism
F.3.6 Test Suite
F.4 Usability
F.4.1 Cross platform pipelines
F.4.2 Documentation Conventions
F.4.2.1 p:documentation
F.4.3 Verbosity
F.4.3.1 p:data
F.4.3.2 p:input
F.4.3.3 p:load
F.4.3.4 p:option
F.4.3.5 p:pipe
F.4.3.6 p:serialization
F.4.3.7 p:store
F.4.3.8 p:string-replace
F.4.3.9 p:template
F.4.3.10 p:try
F.4.3.11 p:variable
F.4.3.12 p:viewport
F.4.4 Parameter Rules
F.4.5 Choose-style binding
F.4.6 Remove Restriction on variables/options/params
F.4.7 Attribute Value Templates
F.4.8 Loading computed URIs
F.4.9 Optional options for declared steps
F.4.10 Output signatures for compound steps
F.4.11 XPath
F.4.12 Simplify Use of File Sets
F.4.13 Streaming and Parallel Processing
F.4.14 Required Primary Port
F.5 New Steps
F.5.1 Various Suggestions
F.5.2 OS Operations
F.5.2.1 pos:cwd
F.5.2.2 pos:env
F.5.2.3 pos:info
F.5.3 Directory Operations
F.5.3.1 pxf:copy
F.5.3.2 pxf:chdir
F.5.3.3 pxf:delete
F.5.3.4 pxf:head
F.5.3.5 pxf:info
F.5.3.6 pxf:mkdir
F.5.3.7 pxf:move
F.5.3.8 pxf:tail
F.5.3.9 pxf:tempfile
F.5.3.10 pxf:touch
F.5.4 Zip Operations
F.5.4.1 pxp:unzip
F.5.4.1.1 Thoughts from Vojtech
F.5.4.2 pxp:zip
F.5.4.2.1 Thoughts from Vojtech
F.5.5 Cookie Operations
F.5.5.1 cx:get-cookies
F.5.5.2 cx:set-cookies
F.5.6 Dynamic pipeline evaluation
F.5.6.1 xyz:apply
F.5.6.2 cx:eval
F.5.7 Validation Operations
F.5.7.1 pxp:nvdl
F.5.8 Messaging Operation
F.5.8.1 cx:send-mail
F.5.9 Digital Signatures
F.5.9.1 xyz:sign
F.5.10 File Sets
F.5.10.1 xyz:documents
F.5.11 Iteration
F.5.11.1 xyz:iterate
F.5.11.2 p:iteration-source
F.5.11.3 xyz:until-unchanged
F.5.12 Debugging Operations
F.5.12.1 dbxml:breakpoint
F.5.12.2 dbxml:comment
F.5.12.3 dbxml:debug
F.5.12.4 dbxml:message
F.5.12.5 dbxml:trace
F.5.12.6 dbxml:tracediff
G Contributors
A large and growing set of specifications describes processes operating on XML documents. Many applications depend on more than one of the interrelated specifications in the XML family. How implementations of these specifications interact affects interoperability. XProc: An XML Pipeline Language is designed for describing operations to be performed on XML documents.
"An XML Pipeline specifies a sequence of operations to be performed on a collection of XML input documents. Pipelines take zero or more XML documents as their input and produce zero or more XML documents as their output. A pipeline consists of steps. Like pipelines, steps take zero or more XML documents as their inputs and produce zero or more XML documents as their outputs. The inputs of a step come from the web, from the pipeline document, from the inputs to the pipeline itself, or from the outputs of other steps in the pipeline. The outputs from a step are consumed by other steps, are outputs of the pipeline as a whole, or are discarded. There are three kinds of steps: atomic steps, compound steps, and multi-container steps. Atomic steps carry out single operations and have no substructure as far as the pipeline is concerned. Compound steps and multi-container steps control the execution of other steps, which they include in the form of one or more subpipelines." -- XProc: An XML Pipeline Language
This specification contains requirements for an anticipated XProc V.next. This specification is concerned with the conceptual model of XML process interactions, the language for the description of these interactions, and the inputs and outputs of the overall process. This specification is not generally concerned with the implementations of actual XML processes participating in these interactions.
Editorial note | |
The editors intend to enumerate the Working Group's goals for XProc V.next to guide our efforts, and these may ultimately inform 3 Design Principles. |
Improving ease of use (syntactic improvements)
Improving ease of use (increasing the scope: non XML content, for example)
Addressing known shortcomings in the language
Improving the relationship with streaming and parallel processing
The following is a strawman list; it has no standing with the Working Group and is likely to be replaced and/or expanded daily until further notice.
Iterate until ready to declare success. (<p:iterate-until value="success" />)
Review 2 Terminology.
Review 3 Design Principles.
Review 4 Requirements.
Review 5 Use cases
Gather and review A Normative References
Gather and review C Unsatisfied V1 CR Issues
Audit existing D Unsatisfied V1 Requirements and Use Cases
Gather and review E FYI: Categorized Steps
Gather and review input from stakeholders.
Discuss.
Update existing definitions, design principles, requirements and use cases.
Enumerate new definitions, design principles, requirements and use cases.
Review.
Approve.
Publish.
Note:
The Working Group should review the definitions included here to determine whether changes are warranted in light of the publication of XProc: An XML Pipeline Language. Additional term definitions may be warranted and will be added as needed.
An XML Information Set or "Infoset" is the name we give to any implementation of a data model for XML which supports the vocabulary as defined by the XML Information Set recommendation [xml-infoset-rec].
An XML Pipeline is a conceptualization of a flow of a configuration of steps and their parameters. The XML Pipeline defines a process in terms of order, dependencies, or iteration of steps over XML information sets.
"[A pipeline is a set of connected steps, with outputs of one step flowing into inputs of another.]" -- XProc: An XML Pipeline Language
A pipeline specification document is an XML document that describes an XML pipeline.
This definition does not seem to be helpful any longer. XProc 1.0 refers to an XML pipeline, or simply a pipeline.
A step is a specification of how a component is used in a pipeline that includes inputs, outputs, and parameters.
"[A step is the basic computational unit of a pipeline.] A typical step has zero or more inputs, from which it receives XML documents to process, zero or more outputs, to which it sends XML document results, and can have options and/or parameters. There are three kinds of steps: atomic, compound, and multi-container. A pipeline is itself a step and must satisfy the constraints on steps. Connections between steps occur where the input of one step is connected to the output of another."-- XProc: An XML Pipeline Language
A component is a particular XML technology (e.g. XInclude, XML Schema Validity Assessment, XSLT, XQuery, etc.).
An XML infoset that is an input to an XML Pipeline or Step.
Relates to F.1.1 What Flows?
The result of processing by an XML Pipeline or Step.
"[The output ports declared on a step are its declared outputs.] When a step is used in a pipeline, it is connected to other steps through its inputs and outputs." -- XProc: An XML Pipeline Language
A parameter is input to a Step or an XML Pipeline in addition to the Input and Output Document(s) that it may access. Parameters are most often simple, scalar values such as integers, booleans, and URIs, and they are most often named, but neither of these conditions is mandatory. That is, we do not (at this time) constrain the range of values a parameter may hold, nor do we (at this time) forbid a Step from accepting anonymous parameters.
"Some steps accept parameters. Parameters are name/value pairs, like variables and options. Unlike variables and options, which have names known in advance to the pipeline, parameters are not declared and their names may be unknown to the pipeline author. Pipelines can dynamically construct sets of parameters. Steps can read dynamically constructed sets on parameter input ports. [...] A parameter input port is a distinguished kind of input port which accepts (only) dynamically constructed parameter name/value pairs".-- XProc: An XML Pipeline Language
Relates to F.4.4 Parameter Rules and C.2 Issue 004: attribute value templates
The technology or platform environment in which the XML Pipeline is used (e.g. command-line, web servers, editors, browsers, embedded applications, etc.).
"[The environment is a context-dependent collection of information available within subpipelines.] Most of the information in the environment is static and can be computed for each subpipeline before evaluation of the pipeline as a whole begins. The in-scope bindings have to be calculated as the pipeline is being evaluated." -- XProc: An XML Pipeline Language
Relates to proposed steps: F.5.2.2 pos:env and F.5.3.5 pxf:info
The ability to parse an XML document and pass infoitems between components without building a full document information set.
This editor has not discovered corresponding language in XProc: An XML Pipeline Language. Relates to Usability: F.4.13 Streaming and Parallel Processing. -- MM
Note:
The Working Group should review the design principles included here to determine whether changes are warranted in light of the publication of XProc: An XML Pipeline Language. Additional design principles may be warranted and will be added as needed.
Please note that section numbering has been added to facilitate hypertextual references to the individual design principles.
The design principles described in this document are requirements; compliance with them is an overall goal for the specification. A specific feature need not meet each requirement on its own. Instead, the whole set of specifications related to this requirements document should meet the overall goal stated in the design principle.
Applications should be free to implement XML processing using appropriate technologies such as SAX, DOM, or other infoset representations.
Application computing platforms should not be limited to any particular class of platforms such as clients, servers, distributed computing infrastructures, etc. In addition, the resulting specifications should not be swayed by the specifics of use on those platforms.
The language should be as small and simple as practical. It should be "small" in the sense that simple processing should be able to be stated in a compact way, and "simple" in the sense that the specification of more complex processing steps does not require arduous specification steps in the XML Pipeline Specification Document.
At a minimum, an XML document is represented and manipulated as an XML Information Set. The use of supersets, augmented information sets, or data models that can be represented or conceptualized as information sets should be allowed, and in some instances, encouraged (e.g. for the XPath 2.0 Data Model).
Relates to F.4.11 XPath
It should be relatively easy to implement a conforming implementation of the language but it should also be possible to build a sophisticated implementation that implements its own optimizations and integrates with other technologies.
An XML Pipeline must be able to be exchanged between different software systems with a minimum expectation of the same result for the pipeline given that the XML Pipeline Environment is the same. A reasonable resolution to platform differences for binding or serialization of resulting infosets should be expected to be addressed by this specification or by re-use of existing specifications.
The XML Pipeline Specification Document should be able to be validated by both W3C XML Schema and RELAX NG.
Probably should add Schematron. [Schematron]
XML Pipelines need to support existing XML specifications and reuse common design patterns from within them. In addition, there must be support for the use of future specifications as much as possible.
The specification should allow use of any component technology that can consume or produce XML Information Sets.
An XML Pipeline must allow control over specifying both the inputs and outputs of any process within the pipeline. This applies to the inputs and outputs of both the XML Pipeline and its containing steps. It should also allow for the case where there might be multiple inputs and outputs.
An XML Pipeline must allow control of the explicit and implicit handling of the flow of documents between steps. When errors occur, these must be able to be handled explicitly to allow alternate courses of action within the XML Pipeline.
Note:
In this section, Editor's Notes appended to each sub-section provide commentary on the status of each requirement. In particular, the editors have made note of whether a requirement has been demonstrably "Satisfied" or whether it remains "Unsatisfied". In the case of requirements that remain Unsatisfied, the editors intend to record potential solutions, in the form of proposals for new steps or changes to existing steps. In the case of demonstrably Satisfied requirements, the editors intend to provide examples, or links to examples, especially those in XProc: An XML Pipeline Language.
XProc must have standard names for atomic steps that correspond to, but are not limited to, the following specifications [xml-core-wg]:
XML Base [XMLBase]
XInclude [XInclude]
XSLT [XSLT-1.0], [XSLT-2.0]
XSL FO [Serialization]
XQuery [XQuery-1.0]
XPath and Functions [XPath1.0], [XPath-2.0][XPath-Functions]
XML Schema [XMLSchema1][XMLSchema2]
RELAX NG [RELAX-NG]
Schematron [Schematron]
HTTP Request and Authentication [RFC-2616] [RFC-2616]
Editorial note: Satisfied | 20120407 |
This requirement is satisfied. |
An XML Pipeline must allow applications to define and share new steps that use new or existing components. [xml-core-wg]
Editorial note: Satisfied | 20120407 |
The ability to define additional step types is Implementation-defined. |
There must be a minimal inventory of components defined by the specification that are required to be supported to facilitate interoperability of XML Pipelines.
XProc identifies its Standard Step Library and subdivides it into Required Steps and Optional Steps.
Editorial note: Satisfied | 20120407 |
Minimal Component Support has been defined. |
Mechanisms for XML Pipeline composition for re-use or re-purposing must be provided within the XML Pipeline Specification Document.
Editorial note: Satisfied | 20120407 |
See Example 1. A simple, linear XInclude/Validate pipeline |
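As a minimal sketch of composition for re-use (not an example taken from XProc: An XML Pipeline Language), a pipeline can declare a step type once and then invoke it like any other step. The `ex` namespace, the step type name, and the schema file name are hypothetical placeholders.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://example.org/ns/steps"
            version="1.0">

  <!-- Reusable step type; the ex: namespace and the name are hypothetical. -->
  <p:declare-step type="ex:xinclude-and-validate" name="xv">
    <p:input port="source" primary="true"/>
    <p:input port="schema"/>
    <p:output port="result"/>

    <p:xinclude/>
    <p:validate-with-xml-schema>
      <p:input port="schema">
        <!-- Forward the schema supplied by the caller. -->
        <p:pipe step="xv" port="schema"/>
      </p:input>
    </p:validate-with-xml-schema>
  </p:declare-step>

  <!-- Invoke the declared step; its source port reads the pipeline input. -->
  <ex:xinclude-and-validate>
    <p:input port="schema">
      <p:document href="schema.xsd"/>
    </p:input>
  </ex:xinclude-and-validate>
</p:pipeline>
```

The same declaration could instead live in a separate library and be brought in with p:import; it is nested here only to keep the sketch self-contained.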
XML Pipelines should allow iteration of a specific set of steps over a collection of documents and/or elements within a document.
Editorial note: Satisfied | 20120407 |
This requirement is satisfied. Both p:for-each and p:viewport process a sequence of documents. Relates to F.5.11 Iteration |
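A minimal sketch of iteration with p:for-each follows. It assumes an input document that contains `chapter` elements; the element names and the `processed` attribute are illustrative only.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Run the subpipeline once for each chapter element (hypothetical name). -->
  <p:for-each>
    <p:iteration-source select="//chapter"/>
    <p:add-attribute match="/*"
                     attribute-name="processed"
                     attribute-value="true"/>
  </p:for-each>

  <!-- The loop yields a sequence; wrap it so the pipeline returns one document. -->
  <p:wrap-sequence wrapper="chapters"/>
</p:pipeline>
```

p:viewport could be used in place of p:for-each when the processed elements should be written back into the original document rather than emitted as a separate sequence.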
To allow run-time selection of steps, XML Pipelines should provide mechanisms for conditional processing of documents or elements within documents based on expression evaluation. [xml-core-wg]
Editorial note: Satisfied | 20120407 |
This requirement is satisfied. See Figure 2, “A validate and transform pipeline”. |
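Conditional processing can be sketched with p:choose, which selects a subpipeline by evaluating XPath expressions against the input. The `version` attribute and the schema file names below are assumptions made for illustration.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:choose>
    <!-- Older documents are validated against the older schema. -->
    <p:when test="/*[@version &lt; 2.0]">
      <p:validate-with-xml-schema>
        <p:input port="schema">
          <p:document href="v1schema.xsd"/>
        </p:input>
      </p:validate-with-xml-schema>
    </p:when>
    <!-- Everything else is validated against the current schema. -->
    <p:otherwise>
      <p:validate-with-xml-schema>
        <p:input port="schema">
          <p:document href="v2schema.xsd"/>
        </p:input>
      </p:validate-with-xml-schema>
    </p:otherwise>
  </p:choose>
</p:pipeline>
```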
XML Pipelines must provide mechanisms for addressing error handling and fall-back behaviors. [xml-core-wg]
Editorial note: Satisfied | 20120407 |
This requirement is at least partially satisfied by XProc: All steps have an implicit output port for reporting errors; error handling and fallback are manageable through use of p:try and p:catch. Relates to F.3.5 Fall-back Mechanism |
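A minimal sketch of fall-back with p:try and p:catch is shown below. If validation fails, the catch branch emits a placeholder document instead of aborting the pipeline. The schema file name and the `validation-failed` element are hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:try>
    <!-- Normal course of action: validate the input. -->
    <p:group>
      <p:validate-with-xml-schema>
        <p:input port="schema">
          <p:document href="schema.xsd"/>
        </p:input>
      </p:validate-with-xml-schema>
    </p:group>
    <!-- Alternate course of action when any step in the group fails. -->
    <p:catch>
      <p:identity>
        <p:input port="source">
          <p:inline>
            <validation-failed/>
          </p:inline>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:pipeline>
```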
Relates to F.4.11 XPath
XML Pipelines must support the XPath 2.0 Data Model to allow support for XPath 2.0, XSLT 2.0, and XQuery as steps.
Note:
At this point, there is no consensus in the working group that minimal conforming implementations are required to support the XPath 2.0 Data Model.
Editorial note: Satisfied | 20120407 |
This requirement is satisfied, with the caveats noted in 2.6 XPaths in XProc. There have been suggestions that support for the XPath 2.0 Data Model should be required. |
An XML Pipeline should not inhibit a sophisticated implementation from performing parallel operations, lazy or greedy processing, and other optimizations. [xml-core-wg]
Editorial note: Partially Satisfied | 20120407 |
This requirement is partially satisfied, with the caveats noted in H Sequential steps, parallelism, and side-effects. That is, XProc does not inhibit sophisticated implementations; pipelines which take advantage of implementation features may be less interoperable. See Editors' Notes under 4.10 Streaming XML Pipelines |
An XML Pipeline should allow for the existence of streaming pipelines in certain instances as an optional optimization. [xml-core-wg]
Editorial note: Unsatisfied | 20120407 |
This requirement is neither explicitly satisfied nor unsatisfied by anything in the language or an existence proof, except as noted in 7.1.23 p:split-sequence. |
Editorial note | |
We observe that streaming, parallel processing and clustering are optimizations that may impose requirements on various aspects of processor implementation and pipeline design. Notably, any step, atomic or otherwise, that buffers its input or output can be an impediment to streaming. Documentation about the streamability of each atomic step may be warranted. Pipeline design can also affect the ability to process a stream, or to process parallel streams. The editors note the absence of streaming XProc processors or exemplars of parallel pipelines from which to interpolate requirements. The editors therefore request that the WG engage in a targeted discussion of the design principles and requirements incumbent upon XProc in support of Streaming, Parallel Processing and Clustering. See Usability: F.4.13 Streaming and Parallel Processing. See also Use Cases: 5.29 Large-Document Subtree Iteration and 5.30 Adding Navigation to an Arbitrarily Large Document. |
This section contains a set of use cases that supported our requirements and informed our design. Although the Working Group wanted to address every use case listed in this document, the first version may not have solved all of them. Unsolved use cases may be migrated into XProc V.next.
Note:
In this section, Editor's Notes appended to each sub-section provide commentary on the status of each Use Case. In particular, the editors have made note of whether a Use Case has been demonstrably "Satisfied" or whether it remains "Unsatisfied". A "TBD" annotation indicates that the status has yet to be ascertained. Some use cases may be only partially satisfied.
In the case of requirements that remain Unsatisfied, the editors intend to record potential solutions, in the form of proposals for new steps or changes to existing steps. In the case of demonstrably Satisfied requirements, the editors intend to provide illustrative examples, or links to examples, especially those in XProc: An XML Pipeline Language.
Note that these determinations of status are subject to change, especially in the early stages of the development of this document. -- MM
Apply a sequence of operations such as XInclude, validation, and transformation to a document, aborting if the result or an intermediate stage is not valid.
(source: [xml-core-wg])
Editorial note: Satisfied | 20120407 |
This use case is satisfied by Example 1. A simple, linear XInclude/Validate pipeline in the Introduction |
Retrieve a document containing XInclude instructions.
Locate documents to be included.
Perform XInclude inclusion.
Return a single XML document.
Editorial note: Satisfied | 20120407 |
This use case is satisfied by Example 1. A simple, linear XInclude/Validate pipeline in the Introduction |
Parse the XML.
Perform XInclude.
Validate with Relax NG, possibly aborting if not valid.
Validate with W3C XML Schema, possibly aborting if not valid.
Transform.
Editorial note: Satisfied | 20120407 |
This use case is almost satisfied by Examples 1-3 in the Introduction. The example does not include Relax NG validation, but it could have, and Schematron as well. |
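The five steps above, including the RELAX NG validation, can be sketched as a single linear pipeline. The schema and stylesheet file names are placeholders. Both validation steps are optional steps in XProc 1.0, so their availability depends on the processor.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Parsing happens when the document arrives on the source port. -->
  <p:xinclude/>

  <!-- assert-valid defaults to true, so an invalid document aborts the pipeline. -->
  <p:validate-with-relax-ng>
    <p:input port="schema">
      <p:document href="schema.rng"/>
    </p:input>
  </p:validate-with-relax-ng>

  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:document href="schema.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>

  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="transform.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
```

Setting `assert-valid="false"` on either validation step would let the pipeline continue with an invalid document, which corresponds to the "possibly aborting" wording in the use case.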
Locate a collection of documents to aggregate.
Perform aggregation under a new document element.
Return a single XML document.
Editorial note: Satisfied | 20120407 |
This use case is satisfied, as exemplified in p:for-each |
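One possible sketch of aggregation uses p:wrap-sequence to place every document arriving on the source port under a new document element. The wrapper name `aggregate` is hypothetical, and how the collection is located and supplied to the source port is left to the caller.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Accept any number of documents. -->
  <p:input port="source" sequence="true"/>
  <!-- Return exactly one document. -->
  <p:output port="result"/>

  <p:wrap-sequence wrapper="aggregate"/>
</p:declare-step>
```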
Read a DocBook document.
Validate the document.
Process it with XSLT.
Validate the resulting XHTML.
Save the HTML file using HTML serialization.
Editorial note: Satisfied | 20120407 |
Although the processing scenario described above is exemplified in p:for-each, the command-line requirement is considered to be implementation defined. ["How outside values are specified for pipeline parameters on the pipeline initially invoked by the processor is implementation-defined. In other words, the command line options, APIs, or other mechanisms available to specify such parameter values are outside the scope of [XProc 1.0]."] |
Read a list of source documents.
For each document in the list:
Read the document.
Perform a series of XSLT transformations.
Serialize each result.
Alternatively, aggregate the resulting documents and serialize a single result.
Editorial note: Satisfied | 20120407 |
Although the processing scenario described above is exemplified in p:for-each, the command-line requirement is considered to be implementation defined. ["How outside values are specified for pipeline parameters on the pipeline initially invoked by the processor is implementation-defined. In other words, the command line options, APIs, or other mechanisms available to specify such parameter values are outside the scope of [XProc 1.0]."] |
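The per-document variant of this scenario can be sketched as follows. The stylesheet name and the output location are assumptions, and the way the list of source documents reaches the source port remains implementation-defined, as noted above.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source" sequence="true"/>

  <p:for-each>
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="style.xsl"/>
      </p:input>
      <!-- No stylesheet parameters are passed in this sketch. -->
      <p:input port="parameters">
        <p:empty/>
      </p:input>
    </p:xslt>

    <!-- Serialize each result to a numbered file (hypothetical location). -->
    <p:store>
      <p:with-option name="href"
                     select="concat('out/', p:iteration-position(), '.xml')"/>
    </p:store>
  </p:for-each>
</p:declare-step>
```

For the alternative in the last step, the p:store could be dropped and a p:wrap-sequence placed after the loop so that a single aggregated result is serialized.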
Extract MathML fragments from an XHTML document and render them as images. Employ an SVG renderer for SVG glyphs embedded in the MathML.
(source: [xml-core-wg])
Editorial note: TBD | 20120407 |
This use case is [not] satisfied. Describe a step that performs these steps. |
We could refactor this use case, using p:viewport to extract MathML. We could model the rendering steps, but the existence of implementations is beyond the scope of XProc itself. That is, step 2 is a black box to us; we simply don't care whether it works, so long as we can model it.
Extract MathML fragments from an XHTML document
Transform each MathML element into one or more substitutes:
Apply a computation (e.g. compute the kernel of a matrix).
Render extracted fragments as JPEG images.
Employ an SVG renderer for SVG glyphs embedded in the MathML.
Render using TeX
Render using eqn/troff
Replace MathML fragments with computed and/or rendered equivalents.
Please provide an example of a step that responds to this use case.
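As a starting point only, the extract-and-replace part of the refactored use case might be modeled with p:viewport, treating the rendering or computation as a black box. The stylesheet `render-math.xsl` is a hypothetical stand-in for that black box; nothing here asserts that such a renderer exists.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:mml="http://www.w3.org/1998/Math/MathML"
            version="1.0">
  <!-- Each mml:math element is processed in isolation and then replaced
       in the XHTML document by whatever the subpipeline produces. -->
  <p:viewport match="mml:math">
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="render-math.xsl"/>
      </p:input>
    </p:xslt>
  </p:viewport>
</p:pipeline>
```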
Style an XML document in a browser with one of several different stylesheets without having multiple copies of the document containing different xml-stylesheet directives.
(source: [xml-core-wg])
Editorial note: Satisfied | 20120407 |
This use case is satisfied, as exemplified in p:for-each |
Run a program of your own, with some parameters, on an XML file and display the result in a browser.
(source: [xml-core-wg])
Editorial note: Satisfied | 20120412 |
This use case is satisfied, as exemplified in the following example. |
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="1.0"
                exclude-inline-prefixes="cx c p">
  <p:input port="source">
    <p:inline>
      <test/>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <p:exec command="/bin/cat" result-is-xml="true"/>
</p:declare-step>
will generate
<c:result xmlns:c="http://www.w3.org/ns/xproc-step">
  <test/>
</c:result>
Process an XML document through XInclude.
Transform the result with XSLT using a fixed transformation.
Digitally sign the result with XML Signatures.
Editorial note: XInclude Satisfied | |
This use case is satisfied. |
Editorial note: XML Signatures Unsatisfied | |
This use case is not satisfied. The editors note that this Use Case cannot be
satisfied unless/until a new sign step is created. Accordingly, a
sign step has been included in the list of proposed new steps. The chair
has noted that design and implementation of a sign step could prove
difficult and that another group is likely better equipped to produce a solution.
Discuss. |
Process an XML document through XInclude.
Remove any xml:base attributes anywhere in the resulting document.
Schema validate the document with a fixed schema.
For all elements or attributes whose type is xs:anyURI, resolve the value against the base URI to create an absolute URI. Replace the value in the document with the resulting absolute URI.
This example assumes preservation of infoset ([base URI]) and PSVI ([type definition]) properties from step to step. Also, there is no way to reorder these steps as the schema doesn't accept xml:base attributes but the expansion requires xs:anyURI typed values.
Editorial note: Satisfied | 20120407 |
This use case is satisfied, as exemplified in 7.2.10 p:xsl-formatter |
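A rough sketch of the four steps is given below. It departs from the use case in one respect: it selects `href` attributes by name, as a stand-in for selecting every element or attribute typed as xs:anyURI. The schema file name and the attribute name are assumptions.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:xinclude/>

  <!-- Remove every xml:base attribute so the document fits the schema. -->
  <p:delete match="@xml:base"/>

  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:document href="schema.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>

  <!-- Resolve matched values against the base URI; href is a hypothetical
       stand-in for type-based selection of xs:anyURI values. -->
  <p:make-absolute-uris match="@href"/>
</p:pipeline>
```

Whether the base URI survives from step to step, as the use case assumes, should be checked against the processor in use.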
Alex to refactor these use cases:
Extract XML document (XForms instance) from an HTTP request body
Execute XSLT transformation on that document.
Call a persistence service with resulting document
Return the XML document from persistence service (new XForms instance) as the HTTP response body.
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. Alex to write example and description of persistence. |
Allow an application on a handheld device to construct a pipeline, send the pipeline and some data to the server, allow the server to process the pipeline and send the result back.
(source: [xml-core-wg])
Editorial note: TBD | 20120419 |
This use case is UNSATISFIED and deemed not worth the effort to prove. The Use Case is underspecified and we estimate that it would cost up to 1/2 day to create an example of a working pipeline in a mobile browser, as suggested by VT. |
Parse the incoming XML request.
Construct a URL to a REST-style web service at the NOAA (see website).
Parse the resulting invalid HTML document by translating and fixing the HTML to make it XHTML (e.g. use TagSoup or tidy).
Extract the tide information from a plain-text table of data in the document by applying a regular expression and creating markup from the matches.
Use XQuery to select the high and low tides.
Formulate an XML response from that tide information.
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. Alex to write a pipeline to demonstrate. |
Parse descriptions:
Iterate over the RSS description elements and do the following:
Gather the text children of the 'description' element.
Parse the contents with a simulated document element in the XHTML namespace.
Send the resulting children as the children of the 'description' element.
Apply rest of pipeline steps.
Serialize descriptions
Iterate over the RSS description elements and do the following:
Serialize the children elements.
Generate a new text child containing the serialized contents (escaped text).
Apply rest of pipeline steps.
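A minimal sketch of the parse/serialize round trip using standard XProc 1.0 steps. It assumes RSS 2.0 input (description elements in no namespace) and escaped content that is well-formed XML, since p:unescape-markup parses as application/xml by default and support for other content types such as text/html is implementation-defined. The p:identity step is only a placeholder for the rest of the pipeline.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Parse descriptions: each description element is processed as its own
       document; its escaped text is parsed and the resulting nodes, placed
       in the XHTML namespace, replace the text -->
  <p:viewport match="description">
    <p:unescape-markup namespace="http://www.w3.org/1999/xhtml"/>
  </p:viewport>

  <!-- Placeholder for the rest of the pipeline steps -->
  <p:identity/>

  <!-- Serialize descriptions: the element children of each description
       are turned back into escaped text -->
  <p:viewport match="description">
    <p:escape-markup/>
  </p:viewport>
</p:pipeline>
```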
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. Alex to write pipeline to demonstrate. |
XQuery and XSLT 2.0 have the notion of input and output collections. A pipeline must be able to consume and produce collections of documents, both as the inputs and outputs of individual steps and as the inputs and outputs of whole pipelines.
For example, for input collections:
Accept a collection of documents.
Apply a single XSLT 2.0 transformation that processes the collection and produces another collection.
Serialize the collection to files or URIs.
For example, for output collections:
Accept a single document as input.
Apply an XQuery that produces a sequence of documents (a collection).
Serialize the collection to files or URIs.
Editorial note: Satisfied | 20120425 |
This use case is satisfied, as exemplified by the sample pipeline. |
<p:pipeline name="main" version="1.0"
            xmlns:cx="http://xmlcalabash.com/ns/extensions"
            xmlns:p="http://www.w3.org/ns/xproc">

  <p:declare-step type="cx:collection-manager">
    <p:input port="source" sequence="true"/>
    <p:output port="result" sequence="true" primary="false"/>
    <p:option name="href" required="true"/>
  </p:declare-step>

  <cx:collection-manager name="cxmgr" href="http://example.org/collection">
    <p:input port="source">
      <p:inline><doc1/></p:inline>
      <p:inline><doc2/></p:inline>
      <p:inline><doc3/></p:inline>
    </p:input>
  </cx:collection-manager>

  <p:xslt>
    <p:input port="source">
      <p:pipe step="cxmgr" port="result"/>
    </p:input>
    <p:input port="stylesheet">
      <p:inline>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
          <xsl:output method="xml" encoding="utf-8" indent="no" omit-xml-declaration="yes"/>
          <xsl:param name="collection" select="'http://example.org/collection'"/>
          <xsl:template match="/">
            <collection uri="{$collection}">
              <xsl:value-of select="count(collection($collection))"/>
            </collection>
          </xsl:template>
        </xsl:stylesheet>
      </p:inline>
    </p:input>
  </p:xslt>
</p:pipeline>
Receive XML request with word to complete.
Call a sub-pipeline that retrieves list of completions for that word.
Format resulting document with XSLT.
Serialize response to XML.
Editorial note: REFACTOR | |
This use case to be refactored into 5.12 A Simple Transformation Service |
Dynamically create an XQuery query using XSLT, based on input XML document.
Execute the XQuery against a database.
Construct an XHTML result page using XSLT from the result of the query.
Serialize response to HTML.
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. Norm to write a pipeline to demonstrate. |
This pipeline accepts a "uri" document on the source port, uses that URI to construct a (brain-dead simple) query against a database, runs that query, and styles the result.
<p:declare-step name="main" version="1.0"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:ml="http://xmlcalabash.com/ns/extensions/marklogic"
                xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="source">
    <p:inline>
      <uri>/2003/08/20/fungus</uri>
    </p:inline>
  </p:input>
  <p:output port="result"/>
  <p:input port="parameters" kind="parameter"/>

  <p:declare-step type="ml:adhoc-query">
    <p:input port="source"/>
    <p:input port="parameters" kind="parameter"/>
    <p:output port="result" sequence="true"/>
    <p:option name="host"/>
    <p:option name="port"/>
    <p:option name="user"/>
    <p:option name="password"/>
    <p:option name="content-base"/>
    <p:option name="wrapper"/>
  </p:declare-step>

  <p:template>
    <p:input port="template">
      <p:inline>
        <c:xquery> doc("/production{string(/uri)}.xml") </c:xquery>
      </p:inline>
    </p:input>
    <p:input port="source">
      <p:pipe step="main" port="source"/>
    </p:input>
  </p:template>

  <ml:adhoc-query host="localhost" port="8404" user="admin" password="password"/>

  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="essay.xsl"/>
    </p:input>
  </p:xslt>
</p:declare-step>
Relates to F.1.1 What Flows? and 5.19 Read/Write Non-XML File and 5.26 Non-XML Document Production
Read a CSV [CSV] file and convert it to XML.
Process the document with XSLT.
Convert the result to a CSV format using text serialization.
Editorial note: UNSATISFIED | 20120407 |
This use case is UNSATISFIED. Relates to XProc Architecture: F.1.1 What Flows?. An example, possibly relying on a shell command for conversion to/from CSV format could be constructed, but that would miss the point; XProc could/should have native ability to convert to/from trivial data formats such as CSV and JSON. Presumably there are algorithmic transforms. Vojtech to provide reference to XML Prague paper. |
The specific use case described in 5.19 (converting a CSV file to XML) can be solved by using XSLT 2.0 to tokenize the CSV data and turn it into XML. The example below uses the stylesheet developed by Andrew Welch (http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.html):
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>
  <p:option name="pathToCSV" required="true"/>

  <p:xslt template-name="main">
    <p:input port="source">
      <p:empty/>
    </p:input>
    <p:input port="stylesheet">
      <p:document href="http://andrewjwelch.com/code/xslt/csv/csv-to-xml_v2.xslt"/>
    </p:input>
    <!-- note that relative paths are resolved against the stylesheet's base URI -->
    <p:with-param name="pathToCSV" select="$pathToCSV"/>
  </p:xslt>
</p:declare-step>
In this solution, the stylesheet loads the CSV file. I think it should be straightforward to modify the pipeline/stylesheet so that the pipeline itself loads the CSV file (using p:data or p:http-request) and passes the c:data-wrapped representation to the stylesheet.
Receive an XML document to save.
Check the database to see if the document exists.
If the document exists, update the document.
If the document does not exist, add the document.
Editorial note: TBD | 20120419 |
This use case is TBD. There is no specific language in XProc: An XML Pipeline Language to suggest communication with a database engine. Certainly, there are no references to SQL or other database languages. Examples available from Norm, Vojtech. Requirement for new atomic steps for database access? This is an open issue. |
Need an example showing a step accessing a DB.
Receive an XML document to format.
If the document is XHTML, apply a theme via XSLT and serialize as HTML.
If the document is XSL-FO, apply an XSL FO processor to produce PDF.
Otherwise, serialize the document as XML.
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. Vojtech to write pipeline to demonstrate. |
This one is a little tricky as XProc does not support specifying serialization options on output ports dynamically. Because of that, it is not possible to write a pipeline with a single "result" output port that uses different serialization options that depend on the (dynamic) data content type. One solution is to have multiple output ports ("result-html", "result-xml", ...) with different serialization options, but that's probably silly and too inconvenient to work with (plus it does not work with non-XML data). Another solution is not to have any output ports at all and use p:store instead. The drawback of this is that p:store writes the data to an external location and therefore breaks the pipeline flow, but you can have multiple p:store steps with different serialization options, or you can even set the serialization options on p:store dynamically. Because the p:xsl-formatter renders the XSL-FO document to an external location, I went for the p:store solution:
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0"
                xmlns:html="http://www.w3.org/1999/xhtml"
                xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <p:input port="source"/>
  <p:option name="output" required="true"/>

  <p:choose>
    <p:when test="/html:html">
      <!-- apply a theme using XSLT and serialize as HTML -->
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="style.xsl"/>
        </p:input>
        <p:input port="parameters">
          <p:empty/>
        </p:input>
      </p:xslt>
      <p:store method="html">
        <p:with-option name="href" select="$output"/>
      </p:store>
    </p:when>
    <p:when test="/fo:root">
      <!-- apply an XSL-FO processor -->
      <p:xsl-formatter>
        <p:with-option name="href" select="$output"/>
        <p:input port="parameters">
          <p:empty/>
        </p:input>
      </p:xsl-formatter>
    </p:when>
    <p:otherwise>
      <!-- serialize as XML -->
      <p:store>
        <p:with-option name="href" select="$output"/>
      </p:store>
    </p:otherwise>
  </p:choose>
</p:declare-step>
Mobile example:
Receive an XML document to format.
If the configuration is "desktop browser", apply desktop XSLT and serialize as HTML.
If the configuration is "mobile browser", apply mobile XSLT and serialize as XHTML.
News feed example:
Receive an XML document in Atom format.
If the configuration is "RSS 1.0", apply "Atom to RSS 1.0" XSLT.
If the configuration is "RSS 2.0", apply "Atom to RSS 2.0" XSLT.
Serialize the document as XML.
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. Vojtech to write pipeline to demonstrate. |
The newsfeed example (the mobile example is just a combination of the newsfeed example and 5.21):
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:option name="configuration" required="true"/>

  <p:choose>
    <p:when test="$configuration='RSS 1.0'">
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="atom-to-rss-10.xsl"/>
        </p:input>
      </p:xslt>
    </p:when>
    <p:when test="$configuration='RSS 2.0'">
      <p:xslt>
        <p:input port="stylesheet">
          <p:document href="atom-to-rss-20.xsl"/>
        </p:input>
      </p:xslt>
    </p:when>
  </p:choose>
</p:pipeline>
Receive an XML-RPC request.
Validate the XML-RPC request with a RelaxNG schema.
Dispatch to different sub-pipelines depending on the content of /methodCall/methodName.
Format the sub-pipeline response to XML-RPC format via XSLT.
Validate the XML-RPC response with a W3C XML Schema.
Return the XML-RPC response.
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. Duplicates 5.12 A Simple Transformation Service and 5.17 An AJAX Server. Some conditional sub-pipeline. Vojtech to write pipeline to demonstrate. |
This pipeline takes an XML-RPC request document and invokes a method (an XProc pipeline) based on the value of /methodCall/methodName. Because there is no standard p:eval step for dynamic evaluation of XProc pipelines, we have to use p:choose which lists all possible pipelines statically.
The pipeline below is rather simplistic in the sense that it does not try to interpret XMLRPC's "int", "string", "struct", etc. elements. The input data is passed in the original XMLRPC format to the invoked pipelines, and likewise, the pipelines are expected to represent their results in XMLRPC format.
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0"
            xmlns:ex="http://www.example.org">

  <!-- Defines various 'method' pipelines in the "http://www.example.org" namespace.
       Pipeline interface contract:
       - a single (primary) input port
       - a single (primary) output port
       - expect a single <params> input document
       - produce a single <params> or <fault> output document -->
  <p:import href="method-library.xpl"/>

  <p:pipeline type="ex:invoke-method">
    <p:variable name="method" select="/methodCall/methodName"/>

    <p:identity>
      <p:input port="source" select="/methodCall/params"/>
    </p:identity>

    <p:try>
      <p:group>
        <!-- Note: the p:choose could be replaced with a single call to p:eval if we had such a step -->
        <p:choose>
          <p:when test="$method = 'method1'">
            <ex:method1/>
          </p:when>
          <p:when test="$method = 'method2'">
            <ex:method2/>
          </p:when>
          <p:otherwise>
            <p:template name="error-message">
              <p:input port="template">
                <p:inline>
                  <message>Unsupported method: {$method}</message>
                </p:inline>
              </p:input>
              <p:with-param name="method" select="$method"/>
            </p:template>
            <p:error code="ex:error">
              <p:input port="source">
                <p:pipe step="error-message" port="result"/>
              </p:input>
            </p:error>
          </p:otherwise>
        </p:choose>
      </p:group>
      <p:catch name="catch">
        <p:template>
          <p:input port="source">
            <p:pipe step="catch" port="error"/>
          </p:input>
          <p:input port="template">
            <p:inline>
              <fault>
                <value>
                  <struct>
                    <member>
                      <name>faultCode</name>
                      <value><int>-1</int></value>
                    </member>
                    <member>
                      <name>faultString</name>
                      <value><string>{string(/*)}</string></value>
                    </member>
                  </struct>
                </value>
              </fault>
            </p:inline>
          </p:input>
        </p:template>
      </p:catch>
    </p:try>

    <p:wrap-sequence wrapper="methodResponse"/>
  </p:pipeline>

  <p:validate-with-relax-ng>
    <p:input port="schema">
      <p:data href="xmlrpc.rnc" content-type="text/plain"/>
    </p:input>
  </p:validate-with-relax-ng>

  <ex:invoke-method/>

  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:document href="xmlrpc-response.xsd"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:pipeline>
Import example:
Read a list of source documents.
For each document in the list:
Validate the document.
Call a sub-pipeline to insert content into a relational or XML database.
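The import example can be sketched as follows. XProc 1.0 has no standard database step, so ex:insert-into-database, the library database-library.xpl that is assumed to declare it (one "source" input port, no output ports), the schema import.xsd and the shape of the file list are all hypothetical. The hrefs in the list are assumed to be absolute URIs.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://www.example.org" version="1.0">
  <!-- Assumed list format: <files><file href="..."/>...</files> -->
  <p:input port="source"/>

  <!-- Hypothetical library providing ex:insert-into-database -->
  <p:import href="database-library.xpl"/>

  <!-- One iteration per file element in the list -->
  <p:for-each>
    <p:iteration-source select="/files/file"/>

    <!-- Load the document the current file element points to -->
    <p:load>
      <p:with-option name="href" select="/file/@href"/>
    </p:load>

    <!-- Validate it; an invalid document makes the step fail -->
    <p:validate-with-xml-schema>
      <p:input port="schema">
        <p:document href="import.xsd"/>
      </p:input>
    </p:validate-with-xml-schema>

    <!-- Hand the validated document to the (hypothetical) sub-pipeline -->
    <ex:insert-into-database/>
  </p:for-each>
</p:declare-step>
```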
Ingestion example:
Receive a directory name.
Produce a list of files in the directory as an XML document.
For each element representing a file:
Create an iTQL query using XSLT.
Query the repository to check if the file has been uploaded.
Upload if necessary.
Inspect the file to check the metadata type.
Transform the document with XSLT.
Make a SOAP call to ingest the document.
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. A db Access Requirement emerges from combining 5.20 Update/Insert Document in Database with 5.24 Database Import/Ingestion. Someone to write up a proposal. MM to create related topics. |
Relates to F.1.1 What Flows?
Call a SOAP service with metadata format as a parameter.
Create an iTQL query with XSLT.
Query a repository for the XML document.
Load a list of XSLT transformations from a configuration.
Iteratively execute the XSLT transformations.
Serialize the result to XML.
Editorial note: REFACTOR | |
This use case to be refactored into 5.12 A Simple Transformation Service |
Relates to F.1.1 What Flows? and 5.19 Read/Write Non-XML File and 5.26 Non-XML Document Production
A non-XML document is fed into the process.
That input is converted into a well-formed XML document.
A table of contents is extracted.
Pagination is performed.
Each page is transformed into some output language.
Read a non-XML document.
Transform.
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. See new steps. |
Select a MathML content element.
For that element, apply a computation (e.g. compute the kernel of a matrix).
Replace the input MathML with the output of the computation.
Editorial note: TBD | |
This use case is [not] satisfied. Alex to write. See instead: refactoring at end of 5.7 Extracting MathML |
This document provides a test scenario that will be used to create validation management scripts using a range of existing techniques, including those used for program compilation, etc.
The steps required to validate our sample document are:
Use ISO 19757-4 Namespace-based Validation Dispatching Language (NVDL) to split out the parts of the document that are encoded using HTML, SVG and MathML from the bulk of the document, whose tags are defined using a user-defined set of markup tags.
Validate the HTML elements and attributes using the HTML 4.0 DTD (W3C XML DTD).
Use a set of Schematron rules stored in check-metadata.xml to ensure that the metadata of the HTML elements defined using Dublin Core semantics conform to the information in the document about the document's title and subtitle, author, encoding type, etc.
Validate the SVG components of the file using the standard W3C schema provided in the SVG 1.2 specification.
Use the Schematron rules defined in SVG-subset.xml to ensure that the SVG file only uses those features of SVG that are valid for the particular SVG viewer available to the system.
Validate the MathML components using the latest version of the MathML schema (defined in RELAX-NG) to ensure that all maths fragments are valid. The schema will make use of the datatype definitions in check-maths.xml to validate the contents of specific elements.
Use MathML-SVG.xslt to transform the MathML segments to displayable SVG and replace each MathML fragment with its SVG equivalent.
Use the ISO 19757-8 Document Schema Renaming Language (DSRL) definitions in convert-mynames.xml to convert the tags in the local nameset to the form that can be used to validate the remaining part of the document using docbook.dtd.
Use the ISO 19757-7 Character Repertoire Definition Language (CRDL) rules defined in mycharacter-checks.xml to validate that the correct character sets have been used for text identified as being Greek and Cyrillic.
Convert the Docbook tags to HTML so that they can be displayed in a web browser using the docbook-html.xslt transformation rules.
Each validation script should allow the four streams produced by step 1 to be run in parallel without requiring the other validations to be carried out if there is an error in another stream. This means that steps 2 and 3 should be carried out in parallel to steps 4 and 5, and/or steps 6 and 7 and/or steps 8 and 9. After completion of step 10 the HTML (both streams), and SVG (both streams) should be recombined to produce a single stream that can be fed to a web browser. The flow is illustrated in the following diagram:
Editorial note: TBD | |
This use case is not satisfied. Proposed new step F.5.7.1 pxp:nvdl relates to this Use Case. Should we allow steps to have a varying number of outputs? Norm to write up an explanation of related problems. p:manifold? Henry doubts it's worth the effort. |
Relates to F.5.11 Iteration
Relates to F.4.11 XPath
Running XSLT on a very large document isn't typically practical. In these cases, it is often the case that a particular element, that may be repeated over-and-over again, needs to be transformed. Conceptually, a pipeline could limit the transformation to a subtree by:
Limiting the transform to a subtree of the document identified by an XPath.
For each subtree, cache the subtree and build a whole document with the identified element as the document element and then run a transform to replace that subtree in the original document.
For any non-matches, the document remains the same and "streams" around the transform.
This allows the transform and the tree building to be limited to a small subtree and the rest of the process to stream. As such, an arbitrarily large document can be processed in a bounded amount of memory.
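In XProc 1.0 terms, the approach above corresponds to p:viewport, sketched below. The match pattern record and the stylesheet record.xsl are hypothetical. Each matched subtree is passed to the transform as a document of its own and replaced by the result, while unmatched content passes through unchanged. Whether an implementation actually streams the unmatched content, and so stays within bounded memory, is implementation-defined.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Limit the transform to the subtrees selected by the match pattern -->
  <p:viewport match="record">
    <!-- Inside the viewport, each record element is the document element -->
    <p:xslt>
      <p:input port="stylesheet">
        <p:document href="record.xsl"/>
      </p:input>
    </p:xslt>
  </p:viewport>
</p:pipeline>
```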
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. Alex to write up. Streaming is implementation defined. Merge with 5.30 |
Relates to F.4.11 XPath
For a particular website, every XHTML document needs to have navigation elements added to the document. The navigation is static text that surrounds the body of the document. This navigation is added by:
Matching the head and body elements using an XPath expression that can be streamed.
Inserting a stub for a transformation for including the style and surrounding navigation of the site.
For each of the stubs, transformations insert the markup using a subtree expansion that allows the rest of the document to stream.
In the end, the pipeline allows arbitrarily large XHTML documents to be processed with a near-constant cost.
(source: Alex Milowski)
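A minimal sketch of adding static navigation with the standard p:insert step. Note that it inserts the markup directly rather than using the stub-and-transform technique described above, and it says nothing about streaming, which remains implementation-defined. The three inserted documents (site-style.xml, site-header.xml, site-footer.xml) are hypothetical.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:h="http://www.w3.org/1999/xhtml" version="1.0">
  <!-- Append the site style markup to the head element -->
  <p:insert match="/h:html/h:head" position="last-child">
    <p:input port="insertion">
      <p:document href="site-style.xml"/>
    </p:input>
  </p:insert>

  <!-- Put the header navigation before the existing body content -->
  <p:insert match="/h:html/h:body" position="first-child">
    <p:input port="insertion">
      <p:document href="site-header.xml"/>
    </p:input>
  </p:insert>

  <!-- Put the footer navigation after the existing body content -->
  <p:insert match="/h:html/h:body" position="last-child">
    <p:input port="insertion">
      <p:document href="site-footer.xml"/>
    </p:input>
  </p:insert>
</p:pipeline>
```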
Editorial note: TBD | |
This use case is [not] satisfied, as exemplified in TBD. Alex to write up. This is an example of previous use case. Streaming. |
Relates to F.3.5 Fall-back Mechanism
A step in a pipeline produces multiple output documents. In XSLT 2.0, this is a standard feature of all XSLT 2.0 processors. In XSLT 1.0, this is not standard.
A pipeline author wants to write a pipeline in which, at compile time, the implementation chooses XSLT 2.0 when possible and degrades to XSLT 1.0 when XSLT 2.0 is not supported.
In the case of XSLT 1.0, the step will use XSLT extensions to support the multiple output documents--which again may fail. Fortunately, the XSLT 1.0 transformation can be written to test for this.
(source: Alex Milowski)
Editorial note: UNSATISFIED | |
This use case is [not] satisfied, as exemplified in TBD. Try/catch no good. Vojtech & Norm. XSLT 1.0 processor does not handle. |
The pipeline below does the following:
Checks if XSLT 2.0 is supported.
If XSLT 2.0 is available, it applies an XSLT 2.0 stylesheet to the input XML document. The stylesheet uses xsl:result-document to generate secondary output documents.
If XSLT 2.0 is not available, it applies an XSLT 1.0 stylesheet. The stylesheet uses either the exsl:document or redirect:write extension (whichever is available) to generate secondary output documents.
The pipeline has two output ports: the "result" output port for the primary result of the XSLT transformation, and "secondary" for the secondary documents.
...the pipeline almost works. The problem is with the XSLT 1.0 transformation, because the secondary documents do not appear on the "secondary" port of the p:xslt step. This is actually a requirement made by the XProc specification: "If XSLT 1.0 is used, an empty sequence of documents must appear on the secondary port." The exact behavior of exsl:document and redirect:write in the XProc context is implementation-defined; in most cases, the generated documents will simply be written to the specified external location.
<p:pipeline name="main" version="1.0"
            xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://www.example.org">
  <p:output port="secondary" sequence="true">
    <p:pipe step="process" port="secondary"/>
  </p:output>

  <p:declare-step type="ex:is-xslt20-supported">
    <p:output port="result"/>
    <p:try>
      <p:group>
        <p:xslt version="2.0">
          <p:input port="source"><p:inline><foo/></p:inline></p:input>
          <p:input port="stylesheet">
            <p:inline>
              <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
                <xsl:template match="/">
                  <true><xsl:value-of select="1 to 2"/></true>
                </xsl:template>
              </xsl:stylesheet>
            </p:inline>
          </p:input>
          <p:input port="parameters"><p:empty/></p:input>
        </p:xslt>
      </p:group>
      <p:catch>
        <p:identity>
          <p:input port="source">
            <p:inline><false/></p:inline>
          </p:input>
        </p:identity>
      </p:catch>
    </p:try>
  </p:declare-step>

  <ex:is-xslt20-supported/>

  <p:choose name="process">
    <p:when test="/true">
      <p:output port="result" primary="true"/>
      <p:output port="secondary" sequence="true">
        <p:pipe step="xslt" port="secondary"/>
      </p:output>
      <p:xslt name="xslt" version="2.0">
        <p:input port="source"><p:pipe step="main" port="source"/></p:input>
        <p:input port="stylesheet">
          <p:inline>
            <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
              <xsl:template name="generate-secondary-content">
                <doc>Hello world!</doc>
              </xsl:template>
              <xsl:template match="/">
                <xsl:result-document href="foo.xml">
                  <xsl:call-template name="generate-secondary-content"/>
                </xsl:result-document>
                <ignored/>
              </xsl:template>
            </xsl:stylesheet>
          </p:inline>
        </p:input>
      </p:xslt>
    </p:when>
    <p:otherwise>
      <p:output port="result" primary="true"/>
      <p:output port="secondary" sequence="true">
        <p:pipe step="xslt" port="secondary"/>
      </p:output>
      <p:xslt name="xslt" version="1.0">
        <p:input port="source"><p:pipe step="main" port="source"/></p:input>
        <p:input port="stylesheet">
          <p:inline>
            <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                            xmlns:exsl="http://exslt.org/common"
                            xmlns:redirect="http://xml.apache.org/xalan/redirect"
                            extension-element-prefixes="exsl redirect" version="1.0">
              <xsl:template name="generate-secondary-content">
                <doc>Hello world!</doc>
              </xsl:template>
              <xsl:template match="/">
                <exsl:document href="foo.xml">
                  <xsl:call-template name="generate-secondary-content"/>
                  <xsl:fallback>
                    <redirect:write file="foo.xml">
                      <xsl:call-template name="generate-secondary-content"/>
                    </redirect:write>
                  </xsl:fallback>
                </exsl:document>
                <ignored/>
              </xsl:template>
            </xsl:stylesheet>
          </p:inline>
        </p:input>
      </p:xslt>
    </p:otherwise>
  </p:choose>
</p:pipeline>
Relates to F.3.5 Fall-back Mechanism
As the final step in a pipeline, XQuery is required to be run. If the XQuery step is not available, the compilation of the pipeline needs to fail. Here the pipeline author has chosen that the pipeline must not run if XQuery is not available.
(source: Alex Milowski)
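One way to approximate this in XProc 1.0 is to guard the step with the p:step-available function and raise an explicit error when it is missing, as sketched below. The error code ex:xquery-unavailable, the message text and the query document query.xml are hypothetical. The check is still evaluated when the pipeline runs, so this gives a deliberate, well-described failure rather than a true compile-time failure.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
            xmlns:ex="http://www.example.org" version="1.0">
  <p:choose>
    <!-- Run the query only if the processor implements p:xquery -->
    <p:when test="p:step-available('p:xquery')">
      <p:xquery>
        <p:input port="query">
          <p:document href="query.xml"/>
        </p:input>
      </p:xquery>
    </p:when>
    <!-- Otherwise stop the pipeline with an explicit error -->
    <p:otherwise>
      <p:error code="ex:xquery-unavailable">
        <p:input port="source">
          <p:inline>
            <message>This pipeline requires the p:xquery step.</message>
          </p:inline>
        </p:input>
      </p:error>
    </p:otherwise>
  </p:choose>
</p:pipeline>
```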
Editorial note: SATISFIED | |
This use case is [not] satisfied, as exemplified in TBD. Alex to write. Step available. Fails at run time rather than compile time. Obviously the pipeline should test for required steps before it starts running. Is there a step that tests for all of the required and optional steps and reports what's what? |
Editorial note: Identifiers | |
Identifier: "In metadata, an identifier is a language-independent label, sign or token that uniquely identifies an object within an identification scheme. The suffix identifier is also used as a representation term when naming a data element. [...] In computer science, identifiers (IDs) are lexical tokens that name entities. The concept is analogous to that of a "name." Identifiers are used extensively in virtually all information processing systems. Naming entities makes it possible to refer to them, which is essential for any kind of symbolic processing. [...] In computer languages, identifiers are tokens (also called symbols) which name language entities. Some of the kinds of entities an identifier might denote include variables, types, labels, subroutines, and packages. In most languages, some character sequences have the lexical form of an identifier but are known as keywords." -- Wikipedia 10 Apr 2012 |
The following are listed in XProc: An XML Pipeline Language. Should the list broaden?
The following are listed but not referenced in XProc 1.0.
The following are not listed in XProc: An XML Pipeline Language
The following are not listed in XProc: An XML Pipeline Language.
The following are not listed in XProc: An XML Pipeline Language.
The following are not listed in XProc: An XML Pipeline Language
The following are other XML-related specifications for which some form of processing support might be considered.
The following are not listed in XProc: An XML Pipeline Language.
The following are listed but not referenced in XProc: An XML Pipeline Language.
The following are not listed in XProc: An XML Pipeline Language.
The following are not listed in XProc: An XML Pipeline Language.
One example of a B2B Transaction Language for which a nominal XProc profile could be developed.
The following are not listed in XProc: An XML Pipeline Language.
The following are not listed in XProc: An XML Pipeline Language.
The following are Semantic Web-related specifications for which some form of processing support might be considered.
The following are not listed in XProc: An XML Pipeline Language.
Editorial note | |
An editor offers, without prejudice, this set of specifications for consideration, not only as a potential step, but also as a response to the need for a F.2 Resource Management. There are other resource management protocols extant. |
The following are not listed in XProc: An XML Pipeline Language.
The following are listed in XProc: An XML Pipeline Language but not normatively.
A list of reference processors?
The following are not listed in XProc: An XML Pipeline Language but not normatively.
The following are taken from the XProc Candidate Issues Document as determined at the working group's October 31 f2f (minutes). Issue numbers refer to numbers given in the issues document. The editors intend to expand these notes and migrate them to later sections as and when appropriate.
Issue 001: extend our current p:template in order to have some higher level construct without going into FULL XSLT
Relates to F.4.3.9 p:template
Issue 004: allow attribute value templates within xproc elements
Issue 006: harmonize p:data and p:load
Relates to F.4.3.1 p:data
Sections 2-5 of the V1 XML Processing Model Requirements and Use Cases are included herein, annotated for review of requirements and use cases that have been left unsatisfied in V1. The editors hope to record which requirements and use cases have been satisfied by XProc: An XML Pipeline Language, and to note which have not been satisfied. This should assist the working group in determining which requirements and use cases should be addressed in XProc V.next.
To aid navigation, the requirements can be mapped to the use cases of this section as follows:
Editorial note | |
The above table is known to be incomplete and will be completed in a later draft. We note that many Use Cases are not associated with a Requirement, and welcome suggestions as to the correspondence to Requirements for Use Cases 5.4-8, 5.12-14, 5.17-21, 5.22, 5.25-26 and 5.28. |
Here is my first cut of the step inventory categorization for my action item. I've taken this from information that was sent to me, source code, and documentation online [1]. I did not include the general categories we had on the wiki [2]. Those categories were "Sorting", "Validation with Error", "Map-reduce", "Iterate until condition", "Dynamic Pipeline Execution", "Long-form Viewport", and "e-mail." -- AM.
Second cut. Completed list. Annotated. Minor reorganization coming. -- MM.
These lists will be annotated and re-formatted later. -- MM.
5.9 p:library
5.16 p:documentation
5.17 p:pipeinfo
4.1 p:pipeline
5.8 p:declare-step
5.10 p:import
5.11 p:pipe
Relates to F.4.3.5 p:pipe
5.12 p:inline
5.13 p:document
5.14 p:data
Relates to F.4.3.1 p:data
5.15 p:empty
4.2 p:for-each
4.3 p:viewport
Relates to F.4.3.12 p:viewport
4.4 p:choose
4.4.1 p:xpath-context
4.4.2 p:when
4.4.3 p:otherwise
4.5 p:group
4.6 p:try
Relates to F.4.3.10 p:try
5.1 p:input
Relates to F.4.3.2 p:input
5.2 p:iteration-source
Relates to F.5.11.2 p:iteration-source
5.3 p:viewport-source
5.4 p:output
5.6 p:serialization
Relates to F.4.3.6 p:serialization
5.7.1 p:variable
Relates to F.4.3.11 p:variable
5.7.2 p:option
Relates to F.4.3.4 p:option
5.7.3 p:with-option
5.7.4 p:with-param
7.1.1 p:add-attribute
7.1.2 p:add-xml-base
7.1.5 p:delete
7.1.12 p:insert
7.1.13 p:label-elements
7.1.15 p:make-absolute-uris
- - - cx:namespace-delete
7.1.16 p:namespace-rename
7.1.19 p:rename
7.1.20 p:replace
7.1.21 p:set-attributes
7.1.25 p:string-replace
Relates to F.4.3.8 p:string-replace
7.1.27 p:unwrap
7.1.28 p:wrap
7.1.30 p:xinclude
7.1.31 p:xslt
- - - p:template
Relates to F.4.3.9 p:template
7.2.9 p:xquery
Relates to F.4.3.9 p:template
- - - ml:adhoc-query
- - - ml:insert-document
- - - ml:invoke-module
7.2.4 p:validate-with-relax-ng
7.2.5 p:validate-with-schematron
7.2.6 p:validate-with-xml-schema
- - - cx:nvdl
Relates to F.5.7.1 pxp:nvdl
7.1.3 p:compare
7.1.4 p:count
7.1.11 p:identity
7.1.9 p:filter
7.2.2 p:hash
7.2.10 p:xsl-formatter
- - - cx:delta-xml
- - - cxu:pretty-print
- - - cxu:css-formatter
- - - emx:get-base-uri
Relates to F.5.3 Directory Operations
- - - cxf:copy
- - - cxf:delete
7.1.6 p:directory-list
- - - cxf:head
- - - cxf:info
- - - cxf:mkdir
- - - cxf:move
- - - cxf:tail
- - - cxf:tempfile
- - - cxf:touch
- - - cx:unzip
- - - cx:zip
7.1.10 p:http-request
7.1.14 p:load
Relates to F.4.3.3 p:load
7.1.22 p:sink
Relates to F.4.14 Required Primary Port
7.1.24 p:store
Relates to F.4.3.7 p:store
- - - cx:uri-info
- - - emx:fetch
7.1.8 p:escape-markup
7.1.26 p:unescape-markup
7.2.7 p:www-form-urldecode
7.2.8 p:www-form-urlencode
Relates to F.5.5 Cookie Operations and F.5.8 Messaging Operation
7.2.3 p:uuid
- - - cx:get-cookies
- - - cx:set-cookies
- - - cx:send-mail
7.1.18 p:parameters
Relates to F.4.4 Parameter Rules
- - - p:in-scope-names
Relates to ???
Relates to F.5.2 OS Operations and Environment
- - - cx:java-properties
- - - cxo:info
- - - cxo:cwd
- - - cxo:env
7.1.7 p:error
- - - cx:eval
Relates to F.5.6.2 cx:eval
- - - emx:eval
5.5 p:log
Relates to F.5.12 Debugging Operations
- - - cx:message
Relates to F.5.12.4 dbxml:message
- - - emc:message
- - - cx:report-errors
Entirely speculative. Relates to F.5.12 Debugging Operations
- - - dbxml:breakpoint
- - - dbxml:comment
- - - dbxml:debug
- - - cx:message
- - - dbxml:trace
Relates to the Principle: 3.11 Control of Flow and Errors and B.14 Candidate Non-XML Data Format Specifications
Should we open up the pipeline architecture to allow more than XML documents to flow through it? With respect to other media types (see below for some possibilities), there are a number of possibilities in general:
Allow statically, only at the whole-pipeline margins
Allow statically, at the step level (i.e. step signatures include media types for all inputs and outputs)
Reject any pipeline where the output media type doesn't match the media type of the input to which it's connected
and any non-XML output must immediately be converted to XML
and foo--foo connections are allowed
And auto-shim for every possible pair
And auto-shim only for other-XML and XML-other, so other1→other2 requires two shims
Allow dynamically (e.g. from p:http-request)
With a static declaration of the alternatives you expect, and anything else is an error
With a pipeline fallback if all else fails, getting <c:data media-type=...>...</c:data>
Any shim-to-XML can be (?) configured wrt the target vocabulary (how?). We could identify shim tactics with QNames, similar to the way serialization methods are done in XProc already.
Allow non-XML (text/binary) to flow through a pipeline. The implementation would hex-encode non-XML whenever XML was expected. This would, for example, allow xsl-formatter to produce the output on a port that could then be serialized by the pipeline.
Allow an unbounded number of outputs from some steps? MZ says we need this for the NVDL use case [cross-reference needed]. Markup pipeline allowed this; subsequent steps need to access the outputs by name, where the default naming is with the integers. . . p:pack could have more than two inputs, so you could do column-major packing . . .
Relates to F.5.7.1 pxp:nvdl.
Relates to B.14 Candidate Non-XML Data Format Specifications and B.6 Candidate Specifications: HTML and B.7 Candidate Specifications: CSS and B.11 Candidate Specifications: Semantic Web
From Vojtech Toman: In my XML Prague paper "XProc: Beyond application/xml" I looked at one possible way of extending XProc to support non-XML media types. The basic idea is that XProc steps declare which media types they accept on their input ports and which media types they produce on their output ports. If it happens that data with a media type A (for instance, text/csv) arrives on an input port that expects media type B (for instance, application/xml), the XProc processor will try to convert the data to the expected media type. What kinds of conversions are supported, and what they look like, is not covered in the paper, because that is an issue of its own. I was focusing just on the implications of this for the XProc processing model (which, it turns out, are actually not that big).
You can find the conference proceedings here (my article is on page 27): http://www.xmlprague.cz/2012/files/xmlprague-2012-proceedings.pdf
Support a more event-driven processing model?
Can we suspend a pipeline waiting for something to happen? Some examples: wait for an HTTP POST from GitHub (notifications), a JMS queue listener, a TCP socket listener.
Can we dump a partially evaluated pipeline instance for subsequent resumption?
Does this relate to the proposed step F.5.6.2 cx:eval?
Related but different, with pipeline-internal events, as it were: Philip Fennell has done some work on XProc+SMIL. [SMIL]
Does this relate to F.3.1 XML Choreography?
Relates to B.15 Candidate Specifications: Web Distributed Authoring and Versioning (WebDAV)
Relates to F.5.3 Directory Operations, and F.5.2 OS Operations
Local store and retrieve. Build it, store it, get it back later, all under your control
On-demand construction. Associate a pipeline with a URI into the manager, which will run if the URI is not there. Or not current -- you need to know what all the dependencies are, and check them
Give URIs to step outputs. So you could point xinclude at a step output. Would you have to include a local catalog facility to make this really useful?
Cache intermediate URIs
Refactoring:
Local store and retrieve is facilitated by F.5.3 Directory Operations
Assigning output to a URI can be accommodated by local/remote store and retrieve with http: and file: methods.
XInclude relates markup to resources, not ports. In my understanding, using XInclude to point at a step output port via a contrived URI that is fronting for an application-defined 'resource manager' is not coherent. Steps have input and output ports. Some steps are capable of local/remote storage and retrieval of resources. Resources have URIs.
The canonical resource manager use case, to my mind, is the XInclude case. Consider this slightly contrived example.
<doc>Today's weather is <xi:include href="todays-weather.xml"/> </doc>
pipeline.xpl:
<p:pipeline>
  ...
  <ext:get-weather-based-on-params-or-locale-or-whatever base-uri="todays-weather.xml"/>
  <p:xinclude>
    <p:input port="source">
      <p:document href="input.xml"/>
    </p:input>
  </p:xinclude>
  ...
</p:pipeline>
The idea is that the get-weather... step produces a document with the appropriate base URI and then when XInclude goes off to get that document, the pipeline provides the document generated by some other step in the pipeline.
It's possible, for any given case, to imagine ways to rewrite the pipeline, but the general case remains: processing some documents will appeal to URIs and it would be useful to be able to generate the documents that should satisfy those URIs in other steps in the pipeline (consider synthesized stylesheets and schemas, for example).
Relates to F.4.4 Parameter Rules and F.4.5 Choose-style binding and F.5.6.2 cx:eval
Run a pipeline whose XML representation is input
Dynamic evaluation. See F.5.6.2 cx:eval
Dynamic attribute values. Meaning?
Support for 'depends-on' (or some mechanism for asserting dependencies that are not manifest in the data flow)
Steps with varying numbers of inputs/outputs with dynamic names.
On the face of it, the need is obvious. Dynamically defined pipelines that conceptually resemble manifolds for processing row-/column-major data. Most scripting languages can accommodate themselves to dynamically changing data structures, so why not XProc? It turns out that there are performance penalties associated with late binding. First of all, there is a front-end cost associated with constructing the logical model of each manifold; that is why it pays to design your most commonly used manifolds carefully, test them rigorously, and compile them statically, to ensure optimal performance. Dynamic computation of manifold structure and dynamic composition of port names actually impede streaming pipeline execution by shifting the burden into the execution layer, where it can be more fragile because various resources may not have been pre-arranged. -- MM
Should we give access to MemCache and elasticache?
Already possible from an extension step [reference needed], do we need more?
Already possible using p:http-request?
Should we have a way of accessing environment information more generally?
Relates to F.5.3 Directory Operations, F.5.2 OS Operations, and F.3.4 Debugging
The following is a list of steps and functions that generate environment information.
p:base-uri
pos:env
pos:cwd
pos:info
pxf:info
p:iteration-position
p:iteration-size
p:pipeinfo
p:resolve-uri
p:step-available
p:value-available
p:system-property
p:episode
p:language
p:product-name
p:product-version
p:vendor
p:vendor-uri
p:version
p:xpath-version
p:psvi-supported
Data types for options and parameters
Also, as I'm binding certain typed values to options (e.g. pulling a start time off the query parameters), I'd really like an easy way to say: "This option is typed as xs:dateTime. If the value does not cast properly, run this other part of the pipeline." One simple way we could accomplish this is to allow type errors within a certain portion of the pipeline to be caught and processed somehow. -- AM
Suggested by MZ:
<?xml version="1.0" encoding="UTF-8"?>
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                exclude-inline-prefixes="c xs"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="1.0" xpath-version="2.0">
  <p:output port="result"/>
  <p:group>
    <p:choose>
      <p:variable name="a" select="'2012-02-30'"/>
      <p:when test="$a castable as xs:date">
        <p:identity>
          <p:input port="source">
            <p:inline>
              <the-variable-is-castable-as-date/>
            </p:inline>
          </p:input>
        </p:identity>
      </p:when>
      <p:otherwise>
        <p:identity>
          <p:input port="source">
            <p:inline>
              <the-variable-is-NOT-castable-as-date/>
            </p:inline>
          </p:input>
        </p:identity>
      </p:otherwise>
    </p:choose>
  </p:group>
</p:declare-step>
Hmm... maybe. I had thought of more of a try/catch operation that would catch type errors. Using p:choose a lot can make simple pipelines very complicated. -- AM
My initial thought was that we could state all the type pre-conditions, and then the catch only executes when the typing fails. This would be a lot less complicated than trying to write all that into a test expression. Of course, not everything can be expressed as a simple type cast check. For example, range value checks would still need to be expressions. -- AM
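The try/catch alternative AM describes can be approximated in XProc 1.0 along the following lines. This is a minimal sketch, not a proposed syntax: the option name start and the result element names are hypothetical, and it assumes that a failed xs:dateTime cast is reported as a dynamic error that p:catch intercepts.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="1.0" xpath-version="2.0">
  <!-- Hypothetical option; imagine it is bound from a query parameter. -->
  <p:option name="start" select="'not-a-date'"/>
  <p:output port="result"/>
  <p:try>
    <p:group>
      <!-- The cast is the typed pre-condition. If $start is not a valid
           xs:dateTime, evaluating this select is assumed to raise a
           dynamic error inside the p:group. -->
      <p:add-attribute match="/start" attribute-name="value">
        <p:input port="source">
          <p:inline><start/></p:inline>
        </p:input>
        <p:with-option name="attribute-value"
                       select="string(xs:dateTime($start))"/>
      </p:add-attribute>
    </p:group>
    <p:catch>
      <!-- The "other part of the pipeline" that runs when the cast fails. -->
      <p:identity>
        <p:input port="source">
          <p:inline><start-is-not-a-dateTime/></p:inline>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```

The type check stays in one select expression, and the fallback lives in p:catch rather than in an ever-growing p:choose test. Range checks would still need explicit expressions, as noted above.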
The orchestration of XSLT/XQuery/.... XProc as the controller. Support for playing a useful standardised role in XRX. LQ.
Can we add some kind of authentication management which is out-of-band but available?
Does this need to be in the language, or can it be implementation-defined? If it was in the language how would steps get at it?
Presumably authentication can and should happen out-of-band. Perhaps in a layer that surrounds the processor and/or the data store.
Does this relate to F.5.8.1 cx:send-mail or to B.13 Candidate Specification: Mail Messages?
Relates to Principle: 3.2 Platform Neutral.
Relates to Requirement: 4.9 Allow Optimizations
Do we need support for clustering?
"Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters. Clustering is a main task of explorative data mining, and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics." -- Wikipedia
Relates to the Principle: 3.11 Control of Flow and Errors
How can we make XProc development more amenable to debugging?
Relates to F.5.12 Debugging Operations
pos:env
pos:cwd
p:pipeinfo
pos:info
pxf:info
cx:eval
p:error
p:log
cx:message
Relates to the Principles: 3.6 Address Practical Interoperability and 3.11 Control of Flow and Errors
Relates to Requirements: 4.7 Error Handling and Fall-back
How can we make XProc development more amenable to error recovery?
Relates to the Principles: 3.6 Address Practical Interoperability and 3.11 Control of Flow and Errors
Relates to Requirements: 4.7 Error Handling and Fall-back and 4.2 Allow Defining New Components and Steps
Presumably we require a test suite. Luckily, one exists. Let's set our goal and immediately claim victory.
Relates to Principle: 3.2 Platform Neutral, and proposed Steps: F.5.2 OS Operations
Make it easier to create cross-platform pipelines, e.g. file.separator in file paths
Add a Note or another spec for documentation conventions. Parallel to Javadoc? Add an xml:lang attribute to p:documentation and recommend its use.
See https://community.emc.com/docs/DOC-8657 for an example.
<p:documentation>
  <div><head>This is my documentation</head>
    <p>I can explain my pipeline here.</p>
  </div>
[...]
  <div><head>Extract metadata from image files</head>
    <p>I can explain how I extract metadata from various image files. I probably have some details that need explanation.</p>
  </div>
Can we simplify the markup? Is there a compact syntax alternative?
The following subsections represent steps whose usability is deemed to be affected by superfluous verbosity, based upon comments gathered in mailing lists and elsewhere. The details need to be filled in with a description of the problem, a suggested revision and a justification.
<p:input port="source" href="…"/>
<p:input port="source" step="name" step-port="secondary"/>
<p:input port="source" step="name"/>
<p:input select= .... />
Relates to F.1.1 What Flows?
An option on p:store to save decoded/binary data.
<p:store ... />
Empty source on p:template. If you're fabricating from whole cloth, you have to waste space with a pointless <foo/>. What would be the downside of having the empty sequence as the default input in most/all cases? AM suggests that we allow this on a step-by-step basis.
<p:template ... />
Relates to Principle: 3.3 Small and Simple
p:group within p:try: could we remove this requirement? Is this a case of making life easier for implementors which confuses users? Or is it actually simpler to have the group/catch as the only top-level children?
p:variable templates
Should we allow p:variable anywhere in groups?
Adding a p:variable requires adding p:group…feels odd
Allow variables to be visible in nested pipelines
Explanation: Allow p:xpath-context/@select? Library-level (“global”) variables? And/or pipeline-level variables that would be visible also in nested pipelines? Not really a variable, but a p:option or p:parameter that’s visible across multiple pipelines.
Example: A directory path shared by several steps that the pipeline user might want to override. A simple mechanism for constructing XML fragments using local context. (A single template? XQuery style curly braces?)
Here’s a constructive example... Make p:rename/@new-name optional, so that it’s possible to move elements from namespace X that match a certain condition to namespace Y. This is currently quite difficult to do. Could you achieve this using @use-when?
Now that we have a bunch of real pipelines, can we simplify the rules by limiting the allowed usage patterns? At least, get rid of the necessity for p:empty as the parameter input [when it's now required: someone to fill in]
Data types for options and parameters
Arbitrary data model fragments for parameters/options/variables
Explore using maps to simplify the parameters story
Here's the hard case that has to be handled:
<p:pipeline>
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="docbook.xsl"/>
    </p:input>
  </p:xslt>
</p:pipeline>
Pass parameters to the pipeline and have those parameters available inside the stylesheet without enumerating all of them in the pipeline. How do I easily create a c:param-set for a hypothetical 'parameters' option without invoking even more magic than we currently have?
Suppose you have a pipeline with a step X, and depending on some dynamic condition, you want X to process documents (or entire sequences of documents) A, B, or C. Currently, the only way to do this is to use a p:choose to duplicate the step X with different input bindings in each branch. This not only looks silly, but it is painful to write.
One solution to this would be a choose-style binding (a wrapper around the existing bindings) that would dynamically select the bindings to use.
An example would help.
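The duplication described above can be sketched as follows in XProc 1.0, with p:count standing in for step X. The option name which and the document names a.xml, b.xml, and c.xml are hypothetical and chosen only for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Hypothetical option that carries the dynamic condition. -->
  <p:option name="which" select="'A'"/>
  <p:output port="result"/>
  <p:choose>
    <p:when test="$which = 'A'">
      <!-- Step X, first copy -->
      <p:count>
        <p:input port="source">
          <p:document href="a.xml"/>
        </p:input>
      </p:count>
    </p:when>
    <p:when test="$which = 'B'">
      <!-- Step X, second copy: only the input binding differs -->
      <p:count>
        <p:input port="source">
          <p:document href="b.xml"/>
        </p:input>
      </p:count>
    </p:when>
    <p:otherwise>
      <!-- Step X, third copy -->
      <p:count>
        <p:input port="source">
          <p:document href="c.xml"/>
        </p:input>
      </p:count>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```

Every option or extra input on step X has to be repeated in each branch. A choose-style binding would instead keep a single copy of X and move the selection inside its p:input.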
Does this relate to F.2.2 Dynamic pipeline execution?
Relates to F.1.1 What Flows?
Can we remove the restriction on variables/options/params being bound only to strings? What would be allowed:
binaries - This would allow not only the possibility of binary resource files, but would also enable passing maps, which is where I think the real value-add comes in.
sequences - Not just for strings, but for nodes and binaries as well.
Relates to C.2 Issue 004: attribute value templates
An example would help.
Lots of workarounds, but shouldn't need them. Attribute-value templates would solve this.
An example would help.
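One workaround of this kind, assuming the problem in view is an option whose value must be computed, is shown below. The p:with-option form is XProc 1.0; the attribute value template form appears only in a comment because it is hypothetical syntax, not something XProc 1.0 defines. The option name dir and the file name result.xml are illustrative.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <!-- Hypothetical option naming an output directory. -->
  <p:option name="dir" select="'out'"/>

  <!-- XProc 1.0: a computed option value needs a p:with-option child. -->
  <p:store>
    <p:with-option name="href" select="concat($dir, '/result.xml')"/>
  </p:store>

  <!-- Hypothetical attribute value template form (not valid XProc 1.0):
       <p:store href="{$dir}/result.xml"/> -->
</p:declare-step>
```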
Does this relate to F.2.2 Dynamic pipeline execution?
AM to complete. Simplify the task of passing "optional options" through a pipeline? Something that works from the command line but not internally to a library step???
An example would help.
Relates to 4.4 Allow Pipeline Composition
The existing magic is not consistent or easily understandable
An example would help.
XPath Required?
XPath 2.0 only?
Custom XPath functions (ala xsl:function) using “simplified XProc steps” (whatever that means)
A way of re-using pipelines. Or allowing pipelines to be imported into XQuery or XSLT
Some mechanism for loading sets of documents. XProc, as currently defined, feels somewhat awkward:
consider an xyz:documents element which roughly emulates Apache Ant filesets
consider reusable file path structures
consider providing conventions for making XProc scripts more cross-platform, e.g. file separators
<p:document href="/path/to/directory" include="*.xml"/>
<p:data href="/path/to/directory" include="*.xml"/>
Does this relate to F.2 Resource Management?
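For comparison, one way to load a set of documents in XProc 1.0 today is to combine p:directory-list, p:for-each, and p:load, which is the awkwardness the proposal above is trying to remove. This is a sketch under assumptions: the option name dir is hypothetical, and the concatenated path is assumed to resolve to a usable URI on the platform in question.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                version="1.0">
  <!-- Hypothetical option naming the directory to read. -->
  <p:option name="dir" select="'/path/to/directory'"/>
  <p:output port="result" sequence="true"/>

  <!-- List the directory; include-filter is a regular expression, not a glob. -->
  <p:directory-list include-filter=".*\.xml">
    <p:with-option name="path" select="$dir"/>
  </p:directory-list>

  <!-- Iterate over each c:file entry and load the document it names. -->
  <p:for-each>
    <p:iteration-source select="/c:directory/c:file"/>
    <p:load>
      <p:with-option name="href" select="concat($dir, '/', /c:file/@name)"/>
    </p:load>
  </p:for-each>
</p:declare-step>
```

Three steps and a hand-built path are needed to express what the include="*.xml" shorthand above would say in one element.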
Streaming is inhibited by the use of p:try/p:catch to capture validation errors (because p:try/p:catch mandates buffering). So, pipelines written to take advantage of streaming processors will want to avoid p:try/p:catch. That should be noted. What are other strategies that will work in a streaming context? Does eval do the job? -- MM
Allow p:for-each to generate the result of each step in an unordered way (with a simple attribute ordered="true|false", the default being true).
Does removing the "in order" from 4.2 p:for-each "For each declared output, the processor collects all the documents that are produced for that output from all the iterations, in order, into a sequence." solve the problem?
Relates to Use Case: 5.29 Large-Document Subtree Iteration
Relates to Use Case: 5.30 Adding Navigation to an Arbitrarily Large Document
Editorial note: Candidate Use Case | 20120405 |
Required Primary Port |
(source: Alex Milowski)
I find myself always frustrated when I have to use steps that have no primary output port defined. I usually have to do some sort of "fixup" in the pipeline just to make what I believe should be the minimum. I'm often using p:store or ml:insert-document (marklogic) and, while there is an output, it just isn't defined as primary. While you can say that is just a bad step definition, I think it is more than that.
I think it would have been better to say that if your step produces any output, one of the ports must be defined as primary. This would also avoid pipeline re-arrangements after edits due to unconnected output ports.
For example, consider these two snippets, which are not interchangeable in that the first has a single non-primary output and the second has a single primary output.
<p:store .../>

<p:viewport match="/doc/section">
  <p:store href="..."/>
</p:viewport>
My contention is that by requiring that, when a step has output, one port is designated as primary, a pipeline can be manipulated with less additional surgery. In my case recently, I had the following step structure:
<p:store .../>
<p:xslt>
  <p:input port="source">
    <p:pipe step="somewhere" port="result"/>
  </p:input>
</p:xslt>
I then wrapped it with a viewport:
<p:viewport>
  <p:store .../>
</p:viewport>
<p:xslt>
  <p:input port="source">
    <p:pipe step="somewhere" port="result"/>
  </p:input>
</p:xslt>
and got errors as the primary output port isn't connected. I had to do this to fix it:
<p:viewport>
  <p:store .../>
</p:viewport>
<p:sink/>
<p:xslt>
  <p:input port="source">
    <p:pipe step="somewhere" port="result"/>
  </p:input>
</p:xslt>
With my proposal, I would have originally been required to write:
<p:store .../>
<p:sink/>
<p:xslt>
  <p:input port="source">
    <p:pipe step="somewhere" port="result"/>
  </p:input>
</p:xslt>
The following is a list of proposed steps which require explanation, justification and use cases.
p:sax-filter
p:sort
These steps are in the “proposed OS extension namespace”, http://exproc.org/proposed/steps/os, identified by the prefix “pos”.
This function returns the “current working directory” of the processor. This function takes no arguments and does not depend on the context. This function should only be implemented by processors for which the concept of a “current working directory” is coherent.
<p:declare-step type="pos:cwd">
  <p:output port="result" sequence="true"/>
</p:declare-step>
The pos:cwd step returns a single c:result containing the current working directory. On systems which have no concept of a working directory, this step returns the empty sequence. (This step duplicates the cwd attribute on the c:result from pos:info; it's just for convenience.)
There are no standard XProc steps that change the working directory, so this function is likely to return the same value every time it is called. However, there is nothing which prevents an extension step from being defined which changes the current working directory, so it is not necessarily the case that the same value will always be returned.
Returns information about the environment. On systems which have no concept of an environment and environment variables, this step returns an empty c:result.
<p:declare-step type="pos:env">
  <p:output port="result"/>
</p:declare-step>
The pos:env step returns information about the operating system environment. It returns a c:result containing zero or more c:env elements. Each c:env has name and value attributes containing the "name" and "value" of an environment variable.
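As an illustration of the result shape just described, the sketch below calls pos:env and keeps only the c:env element for one variable. It assumes the processor makes pos:env available (for example through an imported step library); the variable name HOME and the value shown in the comment are illustrative only.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:pos="http://exproc.org/proposed/steps/os"
                version="1.0">
  <p:output port="result" sequence="true"/>

  <!-- Assumed to be provided by the processor or an imported library. -->
  <pos:env/>

  <!-- Keep only the entry for HOME. Per the description above, the step's
       result is shaped like:
         <c:result>
           <c:env name="HOME" value="/home/ndw"/>
           ...
         </c:result>
       so the filter yields zero or one c:env documents. -->
  <p:filter select="/c:result/c:env[@name = 'HOME']"/>
</p:declare-step>
```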
Returns information about the operating system.
<p:declare-step type="pos:info">
  <p:output port="result"/>
</p:declare-step>
The pos:info step returns information about the operating system on which the processor is running. It returns a c:result element with attributes describing properties of the system. The exact set of properties returned is implementation-dependent. It should include the following properties:
The file separator; usually “/” on Unix, “\” on Windows.
The path separator; usually “:” on Unix, “;” on Windows.
The operating system architecture, for example “i386”.
The name of the operating system, for example “Mac OS X”.
The version of the operating system, for example “10.5.6”.
The current working directory.
The login name of the effective user, for example “ndw”.
The home directory of the effective user, for example “/home/ndw”.
The following list is informed by Calabash and eXProc Proposed Steps
These steps are in the “proposed file utilities extension namespace”, http://exproc.org/proposed/steps/file, identified by the prefix “pxf”.
Relates to F.2 Resource Management
Copies a file.
<p:declare-step type="pxf:copy">
  <p:output port="result" primary="false"/>
  <p:option name="href" required="true"/>           <!-- anyURI -->
  <p:option name="target" required="true"/>         <!-- anyURI -->
  <p:option name="fail-on-error" select="'true'"/>  <!-- boolean -->
</p:declare-step>
The pxf:copy step copies the file named in href to the new name specified in target. If the target is a directory, the step attempts to copy the file into that directory, preserving its base name. If the copy is successful, the step returns a c:result element containing the absolute URI of the target. If an error occurs, the step fails if fail-on-error is true; otherwise, the step returns a c:error element which may contain additional, implementation-defined information about the nature of the error.
Occurs if the file named in href does not exist or cannot be copied to the specified target.
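The fail-on-error behaviour can be sketched as follows. The example assumes the processor makes pxf:copy available (for example through an imported step library); the file names and the copy-failed element are hypothetical. Because the result port is declared non-primary, it has to be read through an explicit p:pipe.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:pxf="http://exproc.org/proposed/steps/file"
                version="1.0">
  <p:output port="result"/>

  <!-- With fail-on-error="false" the step reports failure as a c:error
       document instead of failing the pipeline. -->
  <pxf:copy name="copy" href="in.xml" target="backup/" fail-on-error="false"/>

  <p:choose>
    <p:xpath-context>
      <p:pipe step="copy" port="result"/>
    </p:xpath-context>
    <p:when test="/c:error">
      <!-- Copy failed: emit a hypothetical marker document. -->
      <p:identity>
        <p:input port="source">
          <p:inline><copy-failed/></p:inline>
        </p:input>
      </p:identity>
    </p:when>
    <p:otherwise>
      <!-- Copy succeeded: pass on the c:result with the target URI. -->
      <p:identity>
        <p:input port="source">
          <p:pipe step="copy" port="result"/>
        </p:input>
      </p:identity>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```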
This function changes the “current working directory” of the processor. This function takes one argument and does not depend on the context. This function should only be implemented by processors for which the concept of a “current working directory” is coherent.
There are currently no standard XProc steps that change the working directory. However, there is nothing which prevents an extension step from being defined which changes the current working directory.
Deletes a file.
<p:declare-step type="pxf:delete">
  <p:output port="result" primary="false"/>
  <p:option name="href" required="true"/>           <!-- anyURI -->
  <p:option name="recursive" select="'false'"/>     <!-- boolean -->
  <p:option name="fail-on-error" select="'true'"/>  <!-- boolean -->
</p:declare-step>
The pxf:delete step attempts to delete the file or directory named in href. If the file or directory is successfully deleted, the step returns a c:result element containing the absolute URI of the deleted file. If href specifies a directory, it can only be deleted if the recursive option is true or if the directory is empty. If an error occurs, the step fails if fail-on-error is true; otherwise, the step returns a c:error element which may contain additional, implementation-defined information about the nature of the error.
Occurs if the file named in href does not exist or cannot be deleted.
Occurs if the step attempts to delete a directory that is not empty and the recursive option is not true.
Returns the first few lines of a text file.
<p:declare-step type="pxf:head">
  <p:output port="result"/>
  <p:option name="href" required="true"/>           <!-- anyURI -->
  <p:option name="count" required="true"/>          <!-- int -->
  <p:option name="fail-on-error" select="'true'"/>  <!-- boolean -->
</p:declare-step>
Returns the first count lines of the file named in href. If count is negative, the step returns all except those first lines. The step returns a c:result element containing one c:line for each line. Lines are identified as described in ???, 2.11 End-of-Line Handling. If an error occurs, the step fails if fail-on-error is true; otherwise, the step returns a c:error element which may contain additional, implementation-defined information about the nature of the error.
Occurs if the file named in href does not exist or cannot be read.
Occurs if the file named in href does not appear to be a text file. The exact conditions that constitute “does not appear to be” are implementation-defined.
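A minimal invocation might look like the sketch below. It assumes the processor makes pxf:head available (for example through an imported step library); the file name notes.txt and the line contents in the comment are hypothetical.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:pxf="http://exproc.org/proposed/steps/file"
                version="1.0">
  <p:output port="result"/>

  <!-- Return the first two lines of the file. For a file whose first two
       lines are "alpha" and "beta", the description above implies a result
       shaped like:
         <c:result>
           <c:line>alpha</c:line>
           <c:line>beta</c:line>
         </c:result>
       With count="-2" the step would instead return every line except
       those first two. -->
  <pxf:head href="notes.txt" count="2"/>
</p:declare-step>
```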
Returns information about a file or directory.
<p:declare-step type="pxf:info">
  <p:output port="result" sequence="true"/>
  <p:option name="href" required="true"/>           <!-- anyURI -->
  <p:option name="fail-on-error" select="'true'"/>  <!-- boolean -->
</p:declare-step>
The pxf:info step returns information about the file or directory named in href. The step returns a c:directory for directories, a c:file for ordinary files, or a c:other for other kinds of filesystem objects. Implementations may also return more specific types, for example c:device, so anything other than c:directory or c:file must be interpreted as “other”. If the document doesn't exist, an empty sequence is returned.
The document element of the result, if there is one, will have the following attributes:
Attribute | Type | Description |
---|---|---|
readable | xs:boolean | “true” if the object is readable. |
writable | xs:boolean | “true” if the object is writable. |
hidden | xs:boolean | “true” if the object is hidden. |
last-modified | xs:dateTime | The last modification time of the object expressed in UTC. |
size | xs:integer | The size of the object in bytes. |
If the value of a particular attribute is unknown or inapplicable for the particular kind of object, or in the case of boolean attributes, if it's false, then the attribute is not present. Additional implementation-defined attributes may be present, but they must be in a namespace. If the href attribute specified is not a file: URI, then the result is implementation-defined.
If an error occurs, the step fails if fail-on-error is true; otherwise, the step returns a c:error element which may contain additional, implementation-defined information about the nature of the error.
Occurs if the file named in href does not exist or cannot be read.
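The attribute rules above can be illustrated with a sketch. It assumes the processor makes pxf:info available (for example through an imported step library); the file name report.xml and all attribute values in the comment are invented for illustration.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:pxf="http://exproc.org/proposed/steps/file"
                version="1.0">
  <p:output port="result" sequence="true"/>

  <!-- For an ordinary, readable, writable, non-hidden file the result would
       be shaped like:
         <c:file readable="true" writable="true"
                 last-modified="2012-04-05T10:00:00Z" size="1024"/>
       The hidden attribute is absent because boolean attributes whose value
       is false are omitted. If report.xml does not exist, the result port
       carries an empty sequence, hence sequence="true" on the output. -->
  <pxf:info href="report.xml"/>
</p:declare-step>
```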
Creates a directory.
<p:declare-step type="pxf:mkdir">
  <p:output port="result" primary="false"/>
  <p:option name="href" required="true"/>           <!-- anyURI -->
  <p:option name="fail-on-error" select="'true'"/>  <!-- boolean -->
</p:declare-step>
The pxf:mkdir step creates a directory with the name in href. If the name includes more than one directory component, all of the intermediate components are created. The path separator is implementation-defined. The step returns a c:result element containing the absolute URI of the directory created.
If an error occurs, the step fails if fail-on-error is true; otherwise, the step returns a c:error element which may contain additional, implementation-defined information about the nature of the error.
Occurs if the file named in href does not exist or cannot be created.
Moves (renames) a file or directory.
<p:declare-step type="pxf:move">
  <p:output port="result" primary="false"/>
  <p:option name="href" required="true"/>           <!-- anyURI -->
  <p:option name="target" required="true"/>         <!-- anyURI -->
  <p:option name="fail-on-error" select="'true'"/>  <!-- boolean -->
</p:declare-step>
The pxf:move step attempts to move (rename) the file specified in the href option to the new name specified in the target option. If the target is a directory, the step attempts to move the file into that directory, preserving its base name. If the move is successful, the step returns a c:result element containing the absolute URI of the new name of the file. The original file is effectively removed.
If the fail-on-error option is "true", then the step will fail if a file with the name specified in the target option already exists, or if the file specified in href does not exist or cannot be moved. If the fail-on-error option is "false", the step returns a c:error element which may contain additional, implementation-defined information about the nature of the error.
If the href option specifies a directory, device, or other special kind of object, the results are implementation-defined.
Occurs if the file named in href does not exist or if the file named in target cannot be created.
Returns the last few lines of a text file.
<p:declare-step type="pxf:tail">
  <p:output port="result"/>
  <p:option name="href" required="true"/>           <!-- anyURI -->
  <p:option name="count" required="true"/>          <!-- int -->
  <p:option name="fail-on-error" select="'true'"/>  <!-- boolean -->
</p:declare-step>
Returns the last count lines of the file named in href. If count is negative, the step returns all except those last lines. The step returns a c:result element containing one c:line for each line. Lines are identified as described in ???, 2.11 End-of-Line Handling.
If an error occurs, the step fails if fail-on-error is true; otherwise, the step returns a c:error element which may contain additional, implementation-defined information about the nature of the error.
Occurs if the file named in href does not exist or cannot be read.
Occurs if the file named in href does not appear to be a text file. The exact conditions that constitute “does not appear to be” are implementation-defined.
Creates a temporary file.
<p:declare-step type="pxf:tempfile">
  <p:output port="result" primary="false"/>
  <p:option name="href" required="true"/>           <!-- anyURI -->
  <p:option name="prefix"/>                         <!-- string -->
  <p:option name="suffix"/>                         <!-- string -->
  <p:option name="delete-on-exit"/>                 <!-- boolean -->
  <p:option name="fail-on-error" select="'true'"/>  <!-- boolean -->
</p:declare-step>
The pxf:tempfile step creates a temporary file. The temporary file is guaranteed not to already exist when pxf:tempfile is called. The file is created in the directory specified by the href option. If prefix is specified, the file's name will begin with that prefix. If suffix is specified, the file's name will end with that suffix.
The step returns a c:result element containing the absolute URI of the temporary file. If the delete-on-exit option is true, then the temporary file will automatically be deleted when the processor terminates.
If an error occurs, the step fails if fail-on-error is true; otherwise, the step returns a c:error element which may contain additional, implementation-defined information about the nature of the error.
Occurs if the file named in href does not exist or cannot be read.
Occurs if it is not possible to create a file in the href directory.
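A minimal sketch of a pxf:tempfile invocation follows. The directory URI, prefix, and suffix are hypothetical, and the pxf namespace URI is assumed to be the proposed file utilities namespace. Because the result port is declared with primary="false", the sketch connects it explicitly.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxf="http://exproc.org/proposed/steps/file"
                version="1.0">
  <!-- The step's result port is not primary, so bind it by name. -->
  <p:output port="result">
    <p:pipe step="tmp" port="result"/>
  </p:output>

  <!-- Assumes the pxf:tempfile declaration is in scope.
       Creates a file such as xproc-NNNN.xml in the named directory and
       asks the processor to delete it on termination. -->
  <pxf:tempfile name="tmp"
                href="file:///tmp/"
                prefix="xproc-"
                suffix=".xml"
                delete-on-exit="true"/>
</p:declare-step>
```

The c:result document on the result port would contain the absolute URI of the new file; the generated part of the file name is implementation-dependent.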
Updates the modification time of a file.
<p:declare-step type="pxf:touch"> <p:output port="result" primary="false"/> <p:option name="href" required="true"/> <!-- anyURI --> <p:option name="timestamp"/> <!-- xs:dateTime --> <p:option name="fail-on-error" select="'true'"/> <!-- boolean --> </p:declare-step>
The pxf:touch step “touches” the file named in href. The file will be created if it does not exist. If timestamp is specified, the modification time of the file will be updated to the specified time. If unspecified, the current date and time will be used. The step returns a c:result element containing the absolute URI of the touched file.
If an error occurs, the step fails if fail-on-error is true; otherwise, the step returns a c:error element which may contain additional, implementation-defined information about the nature of the error.
Occurs if the file named in href does not exist or cannot be changed.
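For illustration only, pxf:touch might be used as follows. The file URI and timestamp are hypothetical, and the pxf namespace URI is assumed to be the proposed file utilities namespace.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxf="http://exproc.org/proposed/steps/file"
                version="1.0">
  <!-- The step's result port is not primary, so bind it by name. -->
  <p:output port="result">
    <p:pipe step="stamp" port="result"/>
  </p:output>

  <!-- Assumes the pxf:touch declaration is in scope.
       Sets the modification time to the given xs:dateTime; omitting
       timestamp would use the current date and time instead. -->
  <pxf:touch name="stamp"
             href="file:///tmp/build.marker"
             timestamp="2012-01-01T00:00:00Z"/>
</p:declare-step>
```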
These steps are in the “proposed extension namespace”, http://exproc.org/proposed/steps, identified by the prefix “pxp”.
unzip
A step for extracting information out of ZIP archives.
From http://exproc.org/proposed/steps/other.html
<p:declare-step type="pxp:unzip"> <p:output port="result"/> <p:option name="href" required="true"/> <!-- anyURI --> <p:option name="file"/> <!-- string --> <p:option name="content-type"/> <!-- string --> </p:declare-step>
The value of the href option must be an IRI. It is a dynamic error if the document so identified does not exist or cannot be read. The value of the file option, if specified, must be the fully qualified path-name of a document in the archive. It is a dynamic error if the value specified does not identify a file in the archive. The output from the pxp:unzip step must conform to the ziptoc.rnc schema. If the file option is specified, the selected file in the archive is extracted and returned:
If the content-type is not specified, or if an XML content type is specified, the file is parsed as XML and returned. It is a dynamic error if the file is not well-formed XML.
If the content-type specified is not an XML content type, the file is base64 encoded and returned in a single c:data element.
If the file option is not specified, a table of contents for the archive is returned. For example, the contents of the XML Calabash 0.8.5 distribution archive might be reported like this:
<c:zipfile xmlns:c="http://www.w3.org/ns/xproc-step" href="http://xmlcalabash.com/download/calabash-0.8.5.zip"> <c:directory name="calabash-0.8.5/" date="2008-11-04T19:29:20.000-05:00"/> <c:directory name="calabash-0.8.5/docs/" date="2008-11-04T19:29:20.000-05:00"/> <c:file compressed-size="11942" size="36677" name="calabash-0.8.5/docs/CDDL+GPL.txt" date="2008-11-04T19:29:20.000-05:00"/> <c:file compressed-size="928" size="2110" name="calabash-0.8.5/docs/ChangeLog" date="2008-11-04T19:29:20.000-05:00"/> <c:file compressed-size="6817" size="17987" name="calabash-0.8.5/docs/GPL.txt" date="2008-11-04T19:29:20.000-05:00"/> <c:file compressed-size="494" size="830" name="calabash-0.8.5/docs/NOTICES" date="2008-11-04T19:29:20.000-05:00"/> <c:directory name="calabash-0.8.5/lib/" date="2008-11-04T19:29:20.000-05:00"/> <c:file compressed-size="389650" size="407421" name="calabash-0.8.5/lib/calabash.jar" date="2008-11-04T19:29:20.000-05:00"/> <c:file compressed-size="1237" size="2493" name="calabash-0.8.5/README" date="2008-11-04T19:29:20.000-05:00"/> <c:directory name="calabash-0.8.5/xpl/" date="2008-11-04T19:29:20.000-05:00"/> <c:file compressed-size="175" size="255" name="calabash-0.8.5/xpl/pipe.xpl" date="2008-11-04T19:29:20.000-05:00"/> </c:zipfile>
- I think for non-XML data, the step should behave as p:data or p:http-request. Right now, the pxp:unzip spec says that: "If the content-type specified is not an XML content type, the file is base64 encoded and returned in a single c:data element." This obviously does not match the behavior of p:data with respect to text media types. The pxp:unzip step also does not insert the "content-type" and "encoding" attributes on the c:data wrapper.
- What happens if the file specified through the "file" option is not found in the archive (I assume a dynamic error)?
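The two extraction cases described above can be sketched as follows, reusing the archive and entry names from the table of contents example. The output port name readme is hypothetical, and the pxp:unzip declaration is assumed to be in scope.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <p:output port="result" primary="true">
    <p:pipe step="xml-entry" port="result"/>
  </p:output>
  <p:output port="readme">
    <p:pipe step="text-entry" port="result"/>
  </p:output>

  <!-- No content-type: the entry is parsed as XML and returned as is. -->
  <pxp:unzip name="xml-entry"
             href="http://xmlcalabash.com/download/calabash-0.8.5.zip"
             file="calabash-0.8.5/xpl/pipe.xpl"/>

  <!-- Non-XML content type: the entry is returned base64 encoded
       inside a single c:data element. -->
  <pxp:unzip name="text-entry"
             href="http://xmlcalabash.com/download/calabash-0.8.5.zip"
             file="calabash-0.8.5/README"
             content-type="text/plain"/>
</p:declare-step>
```

Omitting the file option from either call would return the c:zipfile table of contents instead.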
zip
A step for creating ZIP archives.
From http://exproc.org/proposed/steps/other.html
<p:declare-step type="pxp:zip"> <p:input port="source" sequence="true" primary="true"/> <p:input port="manifest"/> <p:output port="result"/> <p:option name="href" required="true"/> <!-- anyURI --> <p:option name="compression-method"/> <!-- "stored" | "deflated" --> <p:option name="compression-level"/> <!-- "smallest" | "fastest" | "default" | "huffman" | "none" --> <p:option name="command" select="'update'"/> <!-- "update" | "freshen" | "create" | "delete" --> </p:declare-step>
The ZIP archive is identified by the href. The manifest (described below) provides the list of files to be processed in the archive. The command indicates the nature of the processing: “update”, “freshen”, “create”, or “delete”. If files are added to the archive, compression-method indicates how they should be added: “stored” or “deflated”. For deflated files, the compression-level identifies the kind of compression: “smallest”, “fastest”, “default”, “huffman”, or “none”. The entries identified by the manifest are processed. The manifest must conform to the following schema:
default namespace c="http://www.w3.org/ns/xproc-step" start = zip-manifest zip-manifest = element c:zip-manifest { entry* } entry = element c:entry { attribute name { text } & attribute href { text } & attribute comment { text }? & attribute method { "deflated" | "stored" } & attribute level { "smallest" | "fastest" | "huffman" | "default" | "none" } empty }
For example:
<zip-manifest xmlns="http://www.w3.org/ns/xproc-step"> <entry name="file1.xml" href="http://example.org/file1.xml" comment="An example file"/> <entry name="path/to/file2.xml" href="http://example.org/file2.xml" method="stored"/> </zip-manifest>
If the command is “delete”, then file1.xml and path/to/file2.xml will be deleted from the archive. Otherwise, the file that appears on the source port with the base URI http://example.org/file1.xml will be stored in the archive as file1.xml (using the default method and level), and the file that appears on the source port with the base URI http://example.org/file2.xml will be stored in the archive as path/to/file2.xml without being compressed.
A c:zipfile description of the archive content is produced on the result port.
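A sketch of a complete invocation, built from the manifest shown above, might look like this. The archive URI is hypothetical, and the pxp:zip declaration is assumed to be in scope.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <!-- Receives the c:zipfile description of the archive. -->
  <p:output port="result"/>

  <pxp:zip href="file:///tmp/example.zip" command="create">
    <!-- Documents are matched to manifest entries by base URI. -->
    <p:input port="source">
      <p:document href="http://example.org/file1.xml"/>
      <p:document href="http://example.org/file2.xml"/>
    </p:input>
    <p:input port="manifest">
      <p:inline>
        <c:zip-manifest>
          <c:entry name="file1.xml"
                   href="http://example.org/file1.xml"
                   comment="An example file"/>
          <c:entry name="path/to/file2.xml"
                   href="http://example.org/file2.xml"
                   method="stored"/>
        </c:zip-manifest>
      </p:inline>
    </p:input>
  </pxp:zip>
</p:declare-step>
```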
- What about source files that are not included in the pxp:zip manifest? Is that an error or do they end up in the ZIP archive under their original base URI?
- Serialization. At the moment, pxp:zip does not allow specifying how XML documents are serialized in the ZIP archive. I ended up adding serialization options to pxp:zip which are applied to each XML file and are therefore archive-global. It might be useful, though, to be able to specify different serialization options per file, but that would probably require putting the serialization options into the pxp:zip manifest somehow.
- Not sure about the compression level names "smallest" | "fastest" | "default" | "huffman" | "none". They are a direct lift from the Java java.util.zip.Deflater API. Plus, the "huffman" constant is not a compression level, but a compression strategy. I think it should not be in the list.
- The pxp:zip step returns a c:zipfile representation of the ZIP archive on the "result" port. While I understand that this might be useful, it is not consistent with existing standard steps that write output to an external location (p:store, p:xsl-formatter) and that return a URI reference to the external data.
- It would be nice if it were possible to compress non-XML data as well (in a similar way that p:http-request allows sending non-XML request bodies). Otherwise things such as creating an EPUB with images would still be impossible with standard XProc.
Run a static, known step, whose type is computed dynamically.
An example would help.
Compile a pipeline and run it.
<p:declare-step type="cx:eval"> <p:input port="pipeline"/> <p:input port="source" sequence="true"/> <p:input port="options"/> <p:output port="result"/> <p:option name="step"/> <!-- QName --> <p:option name="detailed"/> <!-- boolean --> </p:declare-step>
In the simplest case, where the specified pipeline has a single input and a single output, the document(s) on the source port are passed to the pipeline, processed, and the results are passed back on the result port.
If the pipeline specified has multiple inputs or outputs, then the inputs and outputs have to be “multiplexed” on the single port. If this is the case, you must specify that the detailed option is “true”, and encode the input using cx:document. Each input must be wrapped in cx:document with a port attribute that identifies the port to which that document is to be sent. Each output will be wrapped in a cx:document element identifying the port from which it came.
If the pipeline has options, they are passed to the options port. Each options document must have cx:options as its document element and consist entirely of cx:option elements with name and value attributes that specify options and their values.
If the pipeline is a p:library, then the step to evaluate may be specified using the step option. If the pipeline is a library and no step option is specified, the first step in the library will be selected.
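The multiplexed form might be sketched as follows. The evaluated pipeline two-inputs.xpl, its input ports source and config, and its option mode are all hypothetical; the cx namespace URI is assumed to be the XML Calabash extension namespace, and the cx:eval declaration is assumed to be in scope.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="1.0">
  <!-- In detailed mode, each output arrives wrapped in cx:document. -->
  <p:output port="result" sequence="true"/>

  <cx:eval detailed="true">
    <p:input port="pipeline">
      <p:document href="two-inputs.xpl"/>
    </p:input>
    <!-- Each input names the port of the evaluated pipeline
         to which it should be delivered. -->
    <p:input port="source">
      <p:inline>
        <cx:document port="source"><doc/></cx:document>
      </p:inline>
      <p:inline>
        <cx:document port="config"><config/></cx:document>
      </p:inline>
    </p:input>
    <p:input port="options">
      <p:inline>
        <cx:options>
          <cx:option name="mode" value="draft"/>
        </cx:options>
      </p:inline>
    </p:input>
  </cx:eval>
</p:declare-step>
```

In the simple case of a pipeline with a single input and a single output, the detailed option and the cx:document wrappers would be omitted and the documents passed on the source port directly.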
Relates to Use Case: 5.28 Document Schema Definition Languages (DSDL) - Part 10: Validation Management
These steps are in the “proposed extension namespace”, http://exproc.org/proposed/steps, identified by the prefix “pxp”.
A step for performing NVDL (Namespace-based Validation Dispatching Language) validation over mixed-namespace documents.
<p:declare-step type="pxp:nvdl"> <p:input port="source" primary="true"/> <p:input port="nvdl"/> <p:input port="schemas" sequence="true"/> <p:output port="result"/> <p:option name="assert-valid" select="'true'"/> <!-- boolean --> </p:declare-step>
The source document is validated using the namespace dispatching rules contained in the nvdl document. The dispatching rules may contain URI references that point to the actual schemas to be used. As long as these schemas are accessible, it is not necessary to pass anything on the schemas port. However, if one or more schemas are provided on the schemas port, then these schemas should be used in validation. This requirement is expressed only as a “should” and not a “must” because XProc version 1.0 does not mandate that implementations support caching of documents so that requests for a URI by one step can automatically access the result of some other step if that result had a base URI identical to the requested document.
However, it's not clear that the schemas port has any value if the implementation does not support this behavior. The value of the assert-valid option must be a boolean. It is a dynamic error if the assert-valid option is true and the input document is not valid. The output from this step is a copy of the input, possibly augmented by the application of schema processing. The output of this step may include PSVI annotations.
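As an illustration, a pxp:nvdl invocation that relies on the URI references inside the NVDL script, and therefore passes nothing on the schemas port, might look like this. The document names are hypothetical, and the pxp:nvdl declaration is assumed to be in scope.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:pxp="http://exproc.org/proposed/steps"
                version="1.0">
  <!-- A copy of the validated input, possibly with PSVI annotations. -->
  <p:output port="result"/>

  <!-- With assert-valid="true", an invalid document is a dynamic error. -->
  <pxp:nvdl assert-valid="true">
    <p:input port="source">
      <p:document href="mixed-namespaces.xml"/>
    </p:input>
    <p:input port="nvdl">
      <p:document href="dispatch.nvdl"/>
    </p:input>
    <!-- The schemas are reachable through the script's own URI
         references, so the sequence is left empty. -->
    <p:input port="schemas">
      <p:empty/>
    </p:input>
  </pxp:nvdl>
</p:declare-step>
```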
A step to handle SMTP and sending e-mail messages.
<p:declare-step type="cx:send-mail"> <p:input port="source" sequence="true"/> <p:output port="result"/> </p:declare-step>
The first document on the source port is expected to conform to An XML format for mail and other messages. Any additional documents are treated as attachments. The em:content may contain either text or HTML. To send some other type as the first message body, you must leave the em:content element out of the first document and supply the body as a second document.
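A sketch of a message with one attachment follows. The element names and the em and rfc822 namespace URIs are assumptions based on the message format referenced above and should be checked against it; the addresses and the attachment report.xml are hypothetical, as is the cx namespace URI binding.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                version="1.0">
  <p:output port="result"/>

  <!-- Assumes the cx:send-mail declaration is in scope. -->
  <cx:send-mail>
    <p:input port="source">
      <!-- First document: the message itself (assumed vocabulary). -->
      <p:inline>
        <em:Message xmlns:em="URN:ietf:params:email-xml:"
                    xmlns:rfc822="URN:ietf:params:rfc822:">
          <rfc822:from>
            <em:Address>
              <em:name>Pipeline Robot</em:name>
              <em:adrs>mailto:robot@example.org</em:adrs>
            </em:Address>
          </rfc822:from>
          <rfc822:to>
            <em:Address>
              <em:adrs>mailto:editor@example.org</em:adrs>
            </em:Address>
          </rfc822:to>
          <rfc822:subject>Build report</rfc822:subject>
          <em:content>The nightly build finished.</em:content>
        </em:Message>
      </p:inline>
      <!-- Any further documents are treated as attachments. -->
      <p:document href="report.xml"/>
    </p:input>
  </cx:send-mail>
</p:declare-step>
```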
The xyz namespace is speculative.
Based upon a review of the existing use cases, a new p:sign step is required to satisfy 5.10 XInclude and Sign.
Relates to F.4.12 Simplify Use of File Sets
Consider an xyz:documents element that roughly emulates Apache Ant filesets.
Repeat a [step | group] until some XPath expression is satisfied, feeding its output back as its input after the first go-around. Special built-in support for iterating to a fixed point?
A way to merge the context defined by the elements p:xpath-context, p:viewport-source, and p:iteration-source?
The xyz namespace is speculative.
These are highly speculative steps hypothesized by an editor. -- MM
Relates to F.3.4 Debugging
The dbxml namespace is speculative.
We note steps and functions which provide access to a variety of information that is useful in debugging:
Processor XPath Context
Step XPath Context
p:log
p:documentation
p:pipeinfo
p:step-available
p:value-available
p:iteration-position
p:iteration-size
p:base-uri
p:version-available
p:xpath-version-available
Set a breakpoint, optionally based upon a condition, that will cause pipeline operation to pause at the breakpoint, possibly requiring user intervention to continue and/or issuing a message.
Set debug scope and declare its mode. Provides advice to a processor to facilitate targeted debugging. Allows programmers to leave an audit trail for quality assurance purposes. The mode could be an NMTOKEN list representing, for example, levels of verbosity. Implies the existence of a processor debug state stack.
Like xsl:message. Issue a debugging message, typically including a dynamic representation of one or more execution variables or functions to assist in the debugging process. A message may be a simple text message or a complex XML document. Presumes the ability to resolve references to named variables in the pipeline and processor environments. Presumably these messages would be issued conditionally, using a mechanism that already exists. The output port(s) would likely depend on the application.
Members of the Working Group contributed to this specification as noted throughout.
Erik Bruchez provided use cases.
Alex Milowski produced the original XProc Requirements and Use Cases Working Draft and provided many use cases.
Henry Thompson provided use cases.
Norm Walsh contributed details of steps that are implemented in Calabash and provided use cases.
Mohamed Zergaoui contributed through email and working group discussion.