Requirements notes from the morning of 26 Sep

* XProc V.Next requirements
** MUST Simplify parameters

  Experience with parameters in XProc 1.0 reveals that they are too
  complicated. They often cause user confusion and introduce syntactic
  complexity not justified by their function. XProc 2.0 must
  dramatically simplify parameters, perhaps simply removing parameter
  ports altogether without replacing them with a new mechanism of
  equivalent power (and complexity).

*** Dropping parameter ports altogether, without replacing them with
    a new mechanism, is in the frame

** MUST Integrate non-XML documents into the pipeline flow

  Experience has shown that real-world pipelines often involve non-XML
  documents. Several workarounds have been invented for special cases.
  The limitation that XProc 1.0 can only pass XML between steps makes some
  pipelines difficult, if not impossible, to write.

  Providing the ability to allow non-XML documents to flow between
  steps opens up the possibility of writing simple pipelines to work
  with images, JSON, Turtle, EPUB, etc.

*** Consider what required steps do with non-XML documents

** MUST Align with XQuery/XSLT 3.0 specifications

  Alignment with XQuery/XSLT 3.0 will keep features of XProc
  consistent with modern XML technologies: error handling,
  serialization options, XDM features, etc. In addition, support for
  XPath 1.0 no longer seems relevant; it adds complexity to the
  specification and is unlikely to be implemented today. XPath 1.0
  support will be removed from XProc.

*** XDM and Serialization
*** Remove all support for XPath 1.0
*** Is our p:error step consistent with other languages?
** MUST Add explicit flow handling

  There are many pipelines for which the flow analysis does not
  provide a convenient or predictable ordering of steps. Because some
  steps have side effects not manifest in the pipeline, it may be
  necessary to ensure a particular order. This facility is not
  supported by XProc 1.0, but is available in implementation-defined
  extensions. XProc 2.0 will standardize this facility.

*** A "depends-on" attribute?
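
  A minimal sketch, in Python, of how a "depends-on" declaration could
  be folded into flow analysis: the explicit dependencies are added as
  extra edges before the steps are ordered. The step names and the
  dictionary layout are illustrative assumptions, not XProc syntax.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(connections, depends_on=None):
    """Return one valid step order.

    Both arguments map a step name to the set of steps that must run
    before it: `connections` from port bindings, `depends_on` from
    explicit declarations.
    """
    graph = {step: set(preds) for step, preds in connections.items()}
    for step, preds in (depends_on or {}).items():
        graph.setdefault(step, set()).update(preds)
    # static_order() raises graphlib.CycleError if the edges form a cycle.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical pipeline: "store" writes a file that "load" reads back, but
# no port connects them, so flow analysis alone leaves their order open.
connections = {"transform": set(), "store": {"transform"}, "load": set()}
order = execution_order(connections, depends_on={"load": {"store"}})
```

  With the extra edge in place, "load" can never be scheduled before
  "store"; without it, either order would satisfy the port connections.
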
** MUST Allow arbitrary XDM values in variables, options, and parameters

  XProc 1.0 restricts the values of variables, options, and parameters
  to be only strings. This has proven to be an inconvenient limitation.
  XProc 2.0 will allow variables, options, and parameters to have any
  XDM value insofar as possible. XProc 2.0 will also allow the required
  types of variables, options, and parameters to be specified.

** MUST Allow AVTs

  The syntactic sugar that allows step options to be expressed
  concisely as attribute values on a step is foiled whenever the value
  of the option must be computed by the pipeline. Allowing those
  options to contain XSLT-style attribute value templates (AVTs) would
  simplify many pipelines. Additionally, allowing AVTs in other places,
  such as the href attribute on p:document, will be considered.

  XSLT 3.0 introduces a feature which allows expressions in curly
  braces to be evaluated in element content. This feature is similar
  to the facility provided by the p:template step. Extending XProc to
  support curly braces in a manner consistent with XSLT 3.0 will be
  considered.

*** Where?
**** In the syntactic shortcut form of option values
**** In a 'value' attribute on p:with-option, etc.?
**** In the 'href' attribute of p:document?
**** Support the XSLT 3.0 curly braces in element content?
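
  To make the AVT idea concrete, here is a rough Python sketch of the
  expansion rule: text in single curly braces is evaluated, doubled
  braces stand for literal braces. The evaluator is a stand-in that
  only resolves "$name" references, and expressions that themselves
  contain braces are not handled; both are simplifying assumptions.

```python
def expand_avt(template, evaluate):
    """Expand an XSLT-style attribute value template.

    `evaluate` is a caller-supplied function that turns the text of an
    expression into a string.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if template.startswith("{{", i) or template.startswith("}}", i):
            out.append(ch)  # a doubled brace is a literal brace
            i += 2
        elif ch == "{":
            end = template.find("}", i + 1)
            if end == -1:
                raise ValueError("unterminated expression in AVT")
            out.append(evaluate(template[i + 1:end]))
            i = end + 1
        elif ch == "}":
            raise ValueError("unescaped '}' in AVT")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Stand-in evaluator: resolves only "$name" variable references.
variables = {"base": "http://example.com/docs", "n": "3"}


def lookup(expr):
    return variables[expr.strip().lstrip("$")]


href = expand_avt("{$base}/chapter{$n}.xml", lookup)
```

  A real processor would hand the expression text to its XPath engine
  instead of a dictionary lookup.
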
** MUST Document backwards-incompatibilities in V.next pipelines

  Backwards incompatibility is painful for users and will be avoided
  wherever possible. However, XProc 2.0 will introduce language features
  that are not backwards compatible with 1.0. The specification must
  document these incompatibilities.

*** We may decide to make non-backwards compatible changes
*** To what extent will 1.0 pipelines be 2.0 pipelines?
*** What will cause a 2.0 processor to run a 1.0 pipeline with
    different semantics?
*** How will V.next play with the 1.0 "forwards compatibility" rules?
** SHOULD Make editorial improvements

  Implementation experience has demonstrated that there are areas of
  the specification that didn't get the balance right between precision
  for implementors and clarity for users, for example "non-step
  wrappers". The XProc 2.0 specification should attempt to resolve
  these problems without introducing inordinate complexity.

  The 1.0 specification also defines the p:pipeline element as a
  syntactic shortcut for a particular form of p:declare-step. While
  convenient in some circumstances, it has proven to be a source of
  some confusion, especially among new users. XProc 2.0 may remove
  the p:pipeline element.

*** Remove the concept of "non-step wrapper"
*** Remove the p:pipeline element
** SHOULD Provide a way to associate arbitrary metadata with documents

  Adding metadata to documents is a natural thing for pipelines to do,
  either for subsequent use by the pipeline or for eventual output.
  For example, the serialization options provided in an XSLT
  stylesheet could be carried forward to the eventual serialization of
  the result document by the pipeline. In XProc 1.0, there's no way to
  maintain that association. XProc 2.0 should support the ability to
  associate processor and user-defined metadata with documents.

*** Carrying serialization options forward
*** MIME types associated with documents
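
  One way to picture the association, sketched in Python: each document
  travels with a property map that steps may extend and that later
  steps pass along. The Document class, the property names, and the
  three stand-in steps are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    content: object
    properties: dict = field(default_factory=dict)

    def with_properties(self, **updates):
        # Return a new document; the original stays unchanged.
        return Document(self.content, {**self.properties, **updates})


def xslt_step(doc):
    # Stand-in for an XSLT step that records the stylesheet's output settings.
    return doc.with_properties(serialization={"method": "html", "indent": True})


def rename_step(doc):
    # A later step changes the content but carries the metadata forward.
    return Document(doc.content.replace("draft", "final"), doc.properties)


def serialize(doc):
    # The final serialization consults the metadata attached upstream.
    opts = doc.properties.get("serialization", {"method": "xml"})
    return f"[{opts['method']}] {doc.content}"


source = Document("<p>draft</p>", {"content-type": "application/xml"})
result = serialize(rename_step(xslt_step(source)))
```

  The serialization settings recorded by the first step survive the
  intermediate step and are still available at output time.
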
** SHOULD Support steps with a dynamic number of inputs and outputs

  While most steps have a predetermined and static number of inputs
  and outputs, this is not universally the case. In XProc 1.0, a
  putative p:eval step which could run a dynamically constructed
  pipeline, for example, suffers from the limitation that the
  signature of the p:eval step usually differs from the signature of
  the evaluated pipeline.

  XProc 2.0 should provide a facility for supporting steps with a
  variable number of inputs and outputs.

*** Split, Join, NVDL, Eval, etc.
** SHOULD Provide improved status information during pipeline execution

  XProc 1.0 provides scant support for reporting the status of a
  pipeline and providing aid to users attempting to debug pipelines.
  Implementation-defined extensions have demonstrated that some
  additional facilities, such as a p:message step, would be an aid
  to users.

  XProc 2.0 will add some mechanism for reporting status messages and
  will consider adding additional steps and/or language features to
  aid in analysing the behavior of a running pipeline.

*** Support users attempting to debug pipeline errors
*** p:message?
*** p:message attribute on any step, using AVTs.
** SHOULD Provide a mechanism for importing user-defined functions

  Experience with user-defined functions in XQuery and XSLT reveals
  that they can be a powerful addition to the language. Providing some
  feature that allowed users to extend the vocabulary of functions
  available in, for example, the test expressions on p:when elements
  would greatly simplify some pipelines.

  Such a mechanism might take the form of the ability to load
  extension functions defined in, for example, XQuery, or it might
  include adding the ability to define functions in XProc.

*** Defined in XQuery, XSLT, ... Python, Ruby, Scala, JavaScript, Perl?
*** p:function step that defines functions?
** SHOULD Enhance try/catch to catch specific error codes

  Support for catching errors in XProc 1.0 is limited to a simple
  p:try/p:catch pair, which catches and handles all errors uniformly.
  To align XProc with modern languages, the try/catch mechanism will
  be extended to catch specific errors, possibly with the addition of
  a "finally" construct.

*** Multiple catch statements for specific errors
*** p:finally?
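
  A small Python sketch of the intended behaviour: handlers are chosen
  by error code, unmatched errors propagate, and a "finally" action
  runs either way. The StepError class, the run_try helper, and the
  error code "ex:not-found" are hypothetical names for illustration.

```python
class StepError(Exception):
    """An error raised by a step, identified by an error code."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}")
        self.code = code


def run_try(body, catches, default=None, finally_=None):
    """Run `body`; on StepError, dispatch to the handler for its code.

    `catches` maps error codes to handlers, `default` handles any other
    code, and `finally_` always runs last.
    """
    try:
        return body()
    except StepError as err:
        handler = catches.get(err.code, default)
        if handler is None:
            raise  # no matching catch: the error propagates
        return handler(err)
    finally:
        if finally_ is not None:
            finally_()


log = []


def failing_load():
    raise StepError("ex:not-found", "resource not found")


result = run_try(
    failing_load,
    catches={"ex:not-found": lambda err: "<fallback/>"},
    finally_=lambda: log.append("cleanup"),
)
```
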
** SHOULD Support a variety of syntactic simplifications

  XProc 1.0 offers relatively few default behaviors, requiring instead
  that pipelines specify every construct fully. User experience has
  demonstrated that this leads to very verbose pipelines and has been
  a constant source of complaint. XProc 2.0 will introduce a variety
  of syntactic simplifications as an aid to readability and usability,
  including but not limited to:

*** <p:pipe step="name"/> should bind to the primary output port
    of the step named 'name'. It is an error if there is no such
    primary output port.
*** <p:pipe port="secondary"/> should bind to the 'secondary' port
    of the step on which the default readable port occurs. It is an
    error if there is no such step.
*** <p:input port="portname" href="..."/> should be a shortcut for
    a document binding to the URI specified in href.
*** <p:input port="portname"/> should be a shortcut for an empty binding.
*** Allow p:inline to be optional
*** Allow curly brace expansion in p:inline (with an attribute to control
    whether or not that behavior is enabled)
*** Provide a select attribute to p:for-each/p:viewport
**** What are the semantics of select on p:for-each?
*** Change all steps with a single non-primary output to have
    a single primary output
*** Consider harmonizing p:viewport-source and p:iteration-source
*** Add an AVT 'value' attribute to options, parameters, variables
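
  The two p:pipe defaulting rules above can be sketched as a small
  resolution function in Python. The table of steps and the step and
  port names are made up for the example; only the two rules and their
  error conditions come from the list.

```python
def resolve_pipe(steps, default_step=None, step=None, port=None):
    """Fill in the omitted half of a p:pipe binding.

    `steps` maps a step name to (primary_output_or_None, output_ports).
    `default_step` names the step that owns the default readable port,
    or None if there is no default readable port.
    """
    if step is None:
        if default_step is None:
            raise LookupError("no default readable port to take the step from")
        step = default_step
    primary, outputs = steps[step]
    if port is None:
        if primary is None:
            raise LookupError(f"step '{step}' has no primary output port")
        port = primary
    if port not in outputs:
        raise LookupError(f"step '{step}' has no output port '{port}'")
    return step, port


# Hypothetical steps: "check" has a primary and a secondary output,
# "save" has a single non-primary output.
steps = {
    "check": ("result", {"result", "report"}),
    "save": (None, {"result"}),
}

by_step = resolve_pipe(steps, step="check")
by_port = resolve_pipe(steps, default_step="check", port="report")
```

  Here by_step names the primary output of "check", and by_port names
  the "report" port on the step that owns the default readable port.
  Asking for the primary output of "save" raises an error, as the
  first rule requires.
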
** SHOULD Write a primer

  A new user introduction to XProc would aid adoption.

** SHOULD Consider using XDM everywhere

  In addition to supporting XDM values in variables, options, and
  parameters, XProc 2.0 might allow XDM values in more places,
  such as allowing p:for-each to iterate over a sequence of strings
  or integers.

**** For example, selecting a sequence of strings with p:for-each
** SHOULD Consider dividing the spec into two parts

  XProc 1.0 is a specification that consists of both the language
  definition and the inventory of required and optional steps. Release
  management might be simplified by separating the language core from
  the vocabulary of steps and providing some sort of versioning
  strategy that allowed the vocabulary of steps to be revised more
  frequently. XProc 2.0 may be defined in more than one Rec-track
  specification document.

**** Implies new versioning strategy?
** SHOULD Consider additional steps and enhancements

  The vocabulary of steps available in XProc is extensible. Users and
  implementors have developed additional steps, for example to
  support pipelines that produce EPUB documents or manipulate files on
  disk. It is worth considering which, if any, new steps should be
  elevated to the XProc namespace. The candidates include, but are not
  limited to:

*** p:zip
*** p:unzip
*** p:template (and XSLT 3.0 curly braces in element content)
*** p:in-scope-names
*** p:eval
*** Semantic web steps (p:sparql, p:rdfa, ...)?
*** Operating system steps (p:env, ...)?
*** File system steps (p:mkdir, p:copy, ...)?

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
Phone: +1 512 761 6676
www.marklogic.com

Received on Thursday, 26 September 2013 11:49:17 UTC