- From: Norman Walsh <ndw@nwalsh.com>
- Date: Thu, 26 Sep 2013 12:48:37 +0100
- To: public-xml-processing-model-wg@w3.org
- Message-ID: <m2vc1niz1m.fsf@nwalsh.com>
* XProc V.Next requirements
** MUST Simplify parameters
Experience with parameters in XProc 1.0 reveals that they are too
complicated. They often cause user confusion and introduce syntactic
complexity not justified by their function. XProc 2.0 must
dramatically simplify parameters, perhaps simply removing parameter
ports altogether without replacing them with a new mechanism of
equivalent power (and complexity).
*** Consider dropping parameter ports altogether (not replacing them
with a new mechanism is in the frame)
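As an illustration of the ceremony involved: in XProc 1.0, a
p:declare-step that declares no parameter input port must still bind
the parameters port of p:xslt explicitly, even when no parameters are
passed. A minimal sketch (the file name style.xsl is an assumption
made for illustration):

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:xslt>
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <!-- No parameters are wanted, but the port must still be bound -->
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>
</p:declare-step>
```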
** MUST Integrate non-XML documents into the pipeline flow
Experience has shown that real-world pipelines often involve non-XML
documents. Several workarounds have been invented for special cases.
The limitation that XProc 1.0 can only pass XML between steps makes some
pipelines difficult, if not impossible, to write.
Providing the ability to allow non-XML documents to flow between
steps opens up the possibility of writing simple pipelines to work
with images, JSON, Turtle, EPUB, etc.
*** Consider what required steps do with non-XML documents
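For context, one of the 1.0 workarounds is p:data, which wraps a
non-XML resource in an XML element (base64-encoded for binary
content) so that it can flow through the pipeline. A minimal sketch,
assuming a hypothetical file logo.png:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:output port="result"/>
  <p:identity>
    <p:input port="source">
      <!-- The image arrives as a c:data wrapper element, not as an image -->
      <p:data href="logo.png" content-type="image/png"/>
    </p:input>
  </p:identity>
</p:declare-step>
```

Downstream steps see only the wrapper element, which is why working
with the image itself remains awkward.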
** MUST Align with XQuery/XSLT 3.0 specifications
Alignment with XQuery/XSLT 3.0 will keep features of XProc
consistent with modern XML technologies: error handling,
serialization options, XDM features, etc. In addition, support for
XPath 1.0 no longer seems relevant; it adds complexity to the
specification and is unlikely to be implemented today. XPath 1.0
support will be removed from XProc.
*** XDM and Serialization
*** Remove all support for XPath 1.0
*** Is our p:error step consistent with other languages?
** MUST Add explicit flow handling
There are many pipelines for which the flow analysis does not
provide a convenient or predictable ordering of steps. Because some
steps have side effects not manifest in the pipeline, it may be
necessary to ensure a particular order. This facility is not
supported by XProc 1.0, but is available in implementation-defined
extensions. XProc 2.0 will standardize this facility.
*** A "depends-on" attribute?
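A sketch of the problem and one possible shape for the solution.
Nothing connects p:store to p:load below, so flow analysis alone does
not say which runs first. The depends-on attribute and version="2.0"
are hypothetical spellings taken from the outline above, not settled
syntax; the file name is an assumption.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="2.0">
  <p:input port="source"/>
  <p:output port="result"/>
  <!-- Writes the document to disk; p:store has no primary output -->
  <p:store name="save" href="out/report.xml"/>
  <!-- Hypothetical attribute: run only after the step named "save" -->
  <p:load name="reload" href="out/report.xml" depends-on="save"/>
</p:declare-step>
```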
** MUST Allow arbitrary XDM values in variables, options, and parameters
XProc 1.0 restricts the values of variables, options, and parameters
to be only strings. This has proven to be an inconvenient limitation.
XProc 2.0 will allow variables, options, and parameters to have any
XDM value insofar as possible. XProc 2.0 will also allow the required
types of variables, options, and parameters to be specified.
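A rough sketch of what this might look like. The "as" attribute is
borrowed from XSLT and is an assumption here, as are the names
max-items and pages; in 1.0 both values would be coerced to strings.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0">
  <p:input port="source"/>
  <p:output port="result"/>
  <!-- Hypothetical "as" attribute declaring the required type -->
  <p:option name="max-items" as="xs:integer" select="10"/>
  <!-- The variable holds a sequence of integers rather than a string -->
  <p:variable name="pages" as="xs:integer*" select="1 to $max-items"/>
  <p:identity/>
</p:declare-step>
```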
** MUST Allow AVTs
The syntactic sugar that allows step options to be expressed
concisely as attribute values on a step is foiled whenever the value
of the option must be computed by the pipeline. Allowing those
options to contain XSLT-style attribute value templates (AVTs) would
simplify many pipelines. Additionally, allowing AVTs in other places,
such as the href attribute on p:document, will be considered.
XSLT 3.0 introduces a feature which allows expressions in curly
braces to be evaluated in element content. This feature is similar
to the facility provided by the p:template step. Extending XProc to
support curly braces in a manner consistent with XSLT 3.0 will be
considered.
*** Where?
**** In the syntactic shortcut form of option values
**** In a 'value' attribute on p:with-option, etc.?
**** In the 'href' attribute of p:document?
**** Support the XSLT 3.0 curly braces in element content?
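To make the first case concrete: in 1.0, a computed option value
forces the long p:with-option form. Here $n is assumed to be a
variable or option already in scope.

```xml
<p:add-attribute xmlns:p="http://www.w3.org/ns/xproc"
                 match="/doc" attribute-name="id">
  <p:with-option name="attribute-value" select="concat('doc-', $n)"/>
</p:add-attribute>
```

With AVTs in the shortcut form, the same step might be written as
below. This is a sketch of the proposal, not 1.0 syntax; a 1.0
processor would treat the braces as literal characters.

```xml
<p:add-attribute xmlns:p="http://www.w3.org/ns/xproc"
                 match="/doc" attribute-name="id"
                 attribute-value="doc-{$n}"/>
```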
** MUST Document backwards-incompatibilities in V.next pipelines
Backwards incompatibility is painful for users and will be avoided
wherever possible. However, XProc 2.0 will introduce language features
that are not backwards compatible with 1.0. The specification must
document these incompatibilities.
*** We may decide to make non-backwards compatible changes
*** To what extent will 1.0 pipelines be 2.0 pipelines?
*** What will cause a 2.0 processor to run a 1.0 pipeline with
different semantics?
*** How will V.next play with the 1.0 "forwards compatibility" rules?
** SHOULD Make editorial improvements
Implementation experience has demonstrated that there are areas of
the specification that didn't get the balance right between precision
for implementors and clarity for users, for example "non-step
wrappers". The XProc 2.0 specification should attempt to resolve
these problems without introducing inordinate complexity.
The 1.0 specification also defines the p:pipeline element as a
syntactic shortcut for a particular form of p:declare-step. While
convenient in some circumstances, it has proven to be a source of
some confusion, especially among new users. XProc 2.0 may remove
the p:pipeline element.
*** Remove the concept of "non-step wrapper"
*** Remove the p:pipeline element
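For reference, the shortcut in question, as I read the 1.0
specification. The short form:

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:identity/>
</p:pipeline>
```

stands for a p:declare-step with three implicit port declarations:

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source" primary="true"/>
  <p:input port="parameters" kind="parameter" primary="true"/>
  <p:output port="result" primary="true"/>
  <p:identity/>
</p:declare-step>
```

The ports that appear without being written down are a plausible
source of the confusion mentioned above.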
** SHOULD Provide a way to associate arbitrary metadata with documents
Adding metadata to documents is a natural thing for pipelines to do,
either for subsequent use by the pipeline or for eventual output.
For example, the serialization options provided in an XSLT
stylesheet could be carried forward to the eventual serialization of
the result document by the pipeline. In XProc 1.0, there's no way to
maintain that association. XProc 2.0 should support the ability to
associate processor and user-defined metadata with documents.
*** Carrying serialization options forward
*** MIME types associated with documents
** SHOULD Support steps with a dynamic number of inputs and outputs
While most steps have a predetermined and static number of inputs
and outputs, this is not universally the case. In XProc 1.0, a
putative p:eval step which could run a dynamically constructed
pipeline, for example, suffers from the limitation that the
signature of the p:eval step usually differs from the signature of
the evaluated pipeline.
XProc 2.0 should provide a facility for supporting steps with a
variable number of inputs and outputs.
*** Split, Join, NVDL, Eval, etc.
** SHOULD Provide improved status information during pipeline execution
XProc 1.0 provides scant support for reporting the status of a
pipeline and providing aid to users attempting to debug pipelines.
Implementation-defined extensions have demonstrated that some
additional facilities, such as a p:message step, would be an aid
to users.
XProc 2.0 will add some mechanism for reporting status messages and
will consider adding additional steps and/or language features to
aid in analysing the behavior of a running pipeline.
*** Support users attempting to debug pipeline errors
*** p:message?
*** p:message attribute on any step, using AVTs.
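A sketch of both ideas side by side. Everything marked hypothetical
below is an assumption: the p:message step, its message option, the
idea that it passes its input through unchanged, and the exact
spelling of the attribute. The file name style.xsl is also assumed.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="2.0">
  <p:input port="source"/>
  <p:output port="result"/>
  <!-- Hypothetical step: report progress, pass the document through -->
  <p:message message="Starting transformation"/>
  <!-- Hypothetical attribute: the AVT is evaluated when the step runs -->
  <p:xslt p:message="Transforming {base-uri(/*)}">
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>
</p:declare-step>
```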
** SHOULD Provide a mechanism for importing user-defined functions
Experience with user-defined functions in XQuery and XSLT reveals
that they can be a powerful addition to the language. Providing some
feature that allowed users to extend the vocabulary of functions
available in, for example, the test expressions on p:when elements
would greatly simplify some pipelines.
Such a mechanism might take the form of the ability to load
extension functions defined in, for example, XQuery, or it might
include adding the ability to define functions in XProc.
*** Defined in XQuery, XSLT, ... Python, Ruby, Scala, JavaScript, Perl?
*** p:function step that defines functions?
** SHOULD Enhance try/catch to catch specific error codes
Support for catching errors in XProc 1.0 is limited to a simple
p:try/p:catch pair, which catches and handles all errors uniformly.
To align XProc with modern languages, the try/catch mechanism will
be extended to support catching specific errors, possibly with the
addition of a "finally" construct.
*** Multiple catch statements for specific errors
*** p:finally?
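In 1.0 a p:try holds one p:group and one p:catch that handles every
error. The sketch below shows how the extended form might read. The
code attribute on p:catch and the p:finally element are hypothetical
spellings of the ideas above, and order.xsd is an assumed file name.
err:XC0053 is, as far as I recall, the 1.0 code raised when
assert-valid is true and schema validation fails.

```xml
<p:try xmlns:p="http://www.w3.org/ns/xproc"
       xmlns:err="http://www.w3.org/ns/xproc-error">
  <p:group>
    <p:validate-with-xml-schema assert-valid="true">
      <p:input port="schema">
        <p:document href="order.xsd"/>
      </p:input>
    </p:validate-with-xml-schema>
  </p:group>
  <!-- Hypothetical: handles only the named error code -->
  <p:catch code="err:XC0053">
    <p:identity>
      <p:input port="source">
        <p:inline><invalid/></p:inline>
      </p:input>
    </p:identity>
  </p:catch>
  <!-- Hypothetical: no code attribute, so handles any other error -->
  <p:catch>
    <p:identity>
      <p:input port="source">
        <p:inline><failed/></p:inline>
      </p:input>
    </p:identity>
  </p:catch>
  <!-- Hypothetical: runs whether or not an error occurred -->
  <p:finally>
    <!-- cleanup steps would go here -->
  </p:finally>
</p:try>
```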
** SHOULD Support a variety of syntactic simplifications
XProc 1.0 offers relatively few default behaviors, requiring instead
that pipelines specify every construct fully. User experience has
demonstrated that this leads to very verbose pipelines and has been
a constant source of complaint. XProc 2.0 will introduce a variety
of syntactic simplifications as an aid to readability and usability,
including but not limited to:
*** <p:pipe step="name"/> should bind to the primary output port
of the step named 'name'. It is an error if there is no such
primary output port.
*** <p:pipe port="secondary"/> should bind to the 'secondary' port
of the step on which the default readable port occurs. It is an
error if there is no such step.
*** <p:input port="portname" href="..."/> should be a shortcut for
a document binding to the URI specified in href.
*** <p:input port="portname"/> should be a shortcut for an empty binding.
*** Allow p:inline to be optional
*** Allow curly brace expansion in p:inline (with an attribute to control
whether or not that behavior is enabled)
*** Provide a select attribute to p:for-each/p:viewport
**** What are the semantics of select on p:for-each?
*** Change all steps with a single non-primary output to have
a single primary output
*** Consider harmonizing p:viewport-source and p:iteration-source
*** Add an AVT 'value' attribute to options, parameters, variables
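To illustrate three of these shortcuts, here is a step written out in
full 1.0 syntax. The step name "load" and the file name style.xsl are
assumptions made for the example.

```xml
<p:xslt xmlns:p="http://www.w3.org/ns/xproc" name="style">
  <p:input port="source">
    <p:pipe step="load" port="result"/>
  </p:input>
  <p:input port="stylesheet">
    <p:document href="style.xsl"/>
  </p:input>
  <p:input port="parameters">
    <p:empty/>
  </p:input>
</p:xslt>
```

Under the proposals above, the same step might shrink to something
like the following. This is a sketch of the intent, not agreed syntax.

```xml
<p:xslt xmlns:p="http://www.w3.org/ns/xproc" name="style">
  <!-- port omitted: binds to the primary output of "load" -->
  <p:input port="source">
    <p:pipe step="load"/>
  </p:input>
  <!-- href shortcut for a document binding -->
  <p:input port="stylesheet" href="style.xsl"/>
  <!-- no content: shortcut for an empty binding -->
  <p:input port="parameters"/>
</p:xslt>
```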
** SHOULD Write a primer
A new user introduction to XProc would aid adoption.
** SHOULD Consider using XDM everywhere
In addition to supporting XDM values in variables, options, and
parameters, XProc 2.0 might allow XDM values in more places,
such as allowing p:for-each to iterate over a sequence of strings
or integers.
*** For example, selecting a sequence of strings with p:for-each
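A sketch of what iterating over strings might look like. In 1.0 the
select on p:iteration-source must yield documents or elements; the
assumption here is that it may yield atomic values and that the
context item inside the loop is the current string. The file naming
scheme is invented for the example.

```xml
<p:for-each xmlns:p="http://www.w3.org/ns/xproc">
  <!-- Hypothetical: the selection yields three strings, not documents -->
  <p:iteration-source select="('en', 'de', 'fr')"/>
  <!-- Hypothetical: "." is the current string -->
  <p:load>
    <p:with-option name="href" select="concat('index-', ., '.xml')"/>
  </p:load>
</p:for-each>
```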
** SHOULD Consider dividing the spec into two parts
XProc 1.0 is a specification that consists of both the language
definition and the inventory of required and optional steps. Release
management might be simplified by separating the language core from
the vocabulary of steps and providing some sort of versioning
strategy that allowed the vocabulary of steps to be revised more
frequently. XProc 2.0 may be defined in more than one Rec-track
specification document.
*** Implies new versioning strategy?
** SHOULD Consider additional steps and enhancements
The vocabulary of steps available in XProc is extensible. Users and
implementors have developed additional steps, for example to
support pipelines that produce EPUB documents or manipulate files on
disk. It is worth considering which, if any, new steps should be
elevated to the XProc namespace. The candidates include, but are not
limited to:
*** p:zip
*** p:unzip
*** p:template (and XSLT 3.0 curly braces in element content)
*** p:in-scope-names
*** p:eval
*** Semantic web steps (p:sparql, p:rdfa, ...)?
*** Operating system steps (p:env, ...)?
*** File system steps (p:mkdir, p:copy, ...)?
Be seeing you,
norm
--
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676
www.marklogic.com
Received on Thursday, 26 September 2013 11:49:17 UTC