subpipelines, Vnext and extension elements redux from Henry S. Thompson on 2008-03-18 (public-xml-processing-model-wg@w3.org from March 2008)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Tue, 18 Mar 2008 17:19:00 +0000
To: public-xml-processing-model-wg@w3.org
Message-ID: <f5b4pb4rlvf.fsf@hildegard.inf.ed.ac.uk>

What makes up subpipelines?  To what extent do we allow for language
evolution, both official (Vnext) and unofficial (extensions)?

Or, to put it another way, the current spec. says (3.8) "It is a
_static error_ if any element in the XProc namespace or any step has
element children other than those specified for it by this
specification".  What does that mean wrt e.g. the children of p:group?
There is a brief mention in section 4.7 of "other compound steps".
How could such a thing ever be allowed, given the preceding quote?

Let's start from the top and try to take this one step at a time (this
is an attempt at an analysis, not (yet) a proposal for revising the
spec.):

 subpipeline -> (atomic-step|compound-step|non-step)*

 non-step -> p:variable|p:documentation|p:pipeinfo

So far so non-controversial, I hope.

What about atomic steps?  I think we have something like this:

 atomic-step -> (xproc-step|impl-step|user-step)

 xproc-step -> (supported-xproc-step|unsupported-xproc-step)

 supported-xproc-step -> p:stepname, where p:stepname is declared in the
                                     canonical library for a version
                                     of XProc supported by this
                                     implementation and actually
                                     implemented by it

 unsupported-xproc-step -> p:stepname, where p:stepname is declared in some
                                       canonical library, but not
                                       implemented by the
                                       implementation, either
                                       legitimately (because it's
                                       optional or from a later
                                       version) or illegitimately

 impl-step -> (supported-impl-step|unsupported-impl-step)

 supported-impl-step -> pfx:stepname, where pfx:stepname has an
                                      in-scope declaration and is
                                      implemented directly by the
                                      implementation

 unsupported-impl-step -> pfx:stepname, where pfx:stepname has an
                                        in-scope declaration with no
                                        subpipeline but is not
                                        implemented directly by the
                                        implementation

 user-step -> pfx:stepname, where pfx:stepname has an in-scope
                            declaration with a subpipeline

So I think that's all fine -- it's all driven by namespaces and
declarations, where the decls come from and whether they have
contents.
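
A minimal sketch of the atomic-step analysis above, in Python rather
than XProc.  The data model is an assumption made purely for
illustration and is not anything in the spec: declarations is a dict
from in-scope step type to whether its declaration has a subpipeline,
canonical is the set of p: names declared in some canonical library,
and implemented is the set of names the implementation provides
directly.  The version check on canonical libraries is left out.

```python
from enum import Enum


class AtomicKind(Enum):
    SUPPORTED_XPROC = "supported-xproc-step"
    UNSUPPORTED_XPROC = "unsupported-xproc-step"
    SUPPORTED_IMPL = "supported-impl-step"
    UNSUPPORTED_IMPL = "unsupported-impl-step"
    USER = "user-step"


def classify_atomic_step(name, declarations, canonical, implemented):
    """Classify a step name along the lines of the atomic-step grammar.

    Returns None when the name has no declaration at all, i.e. when it
    is not an atomic step under this analysis.
    """
    if name.startswith("p:"):
        # XProc namespace: the canonical library is the only source of decls.
        if name not in canonical:
            return None
        if name in implemented:
            return AtomicKind.SUPPORTED_XPROC
        return AtomicKind.UNSUPPORTED_XPROC
    if name not in declarations:
        return None
    if declarations[name]:
        # A declaration with a subpipeline is a user-defined step.
        return AtomicKind.USER
    if name in implemented:
        return AtomicKind.SUPPORTED_IMPL
    return AtomicKind.UNSUPPORTED_IMPL
```

Note that every branch depends only on the prefix, the in-scope
declarations and whether a declaration has contents, which is the
point being made above.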

So, the hard part:

 compound-step -> (xproc-compound|impl-compound)

 xproc-compound -> (supported-xproc-compound|unsupported-xproc-compound)

 supported-xproc-compound -> (p:for-each|p:viewport|p:choose|p:group|
                              p:try|p:compound),
                              where p:compound is from a post-1.0
                              version of XProc which is supported by
                              the implementation

 unsupported-xproc-compound -> [can't do this yet! -- no way to
                                distinguish between a typo,
                                e.g. p:foreach, and a vNext compound
                                which this implementation doesn't know
                                about]

 impl-compound -> (supported-impl-compound|unsupported-impl-compound)

 supported-impl-compound -> pfx:compound, where pfx:compound is
                            supported by the implementation

 unsupported-impl-compound -> [can't do this yet! -- no way to
                               distinguish between a typo and some
                               private compound step type this
                               implementation doesn't happen to know
                               about]

So, the problem is that either we have a tight syntax, in which case
as things stand there's _no way_ to have extension compound steps in a
graceful way, or we have a loose syntax (get rid of the static error
quoted above from section 3.8) and let all kinds of garbage in.

What do I mean by "in a graceful way"?  Consider what you can do for
atomic steps with p:choose and p:step-available:

 <p:declare-step type="my:ackerman">
  <p:option name="m"/>
  <p:option name="n"/>
 </p:declare-step>

 . . .

 <p:choose>
  <p:when test="p:step-available('my:ackerman')">
   <my:ackerman m="3" n="2"/>
  </p:when>
  <p:otherwise>
   <p:xslt>
    <p:input port="stylesheet">
     <p:document href="slow-ackerman.xsl"/>
    </p:input>
    <p:with-param name="m" value="3"/>
    <p:with-param name="n" value="2"/>
   </p:xslt>
  </p:otherwise>
 </p:choose>

All is fine.  my:ackerman is declared, so whether or not an
implementation is available, the pipeline is syntactically OK per the
current spec.

But consider the parallel case for a private compound step:

 <p:choose>
  <p:when test="p:step-available('my:map-reduce')">
   <my:map-reduce>
    [a subpipeline]
   </my:map-reduce>
  </p:when>
  <p:otherwise>
   [some tedious workaround]
  </p:otherwise>
 </p:choose>

This just loses as I understand the current spec.  my:map-reduce is an
unknown element, hence not-allowed-as-child-of-p:when.
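
To make the asymmetry between the two examples concrete, here is a
minimal Python sketch of the static check as I read the quoted rule
from 3.8.  The function name and its arguments are hypothetical; a
real implementation would work on the parsed pipeline, not on a list
of element names.

```python
# The five compound steps defined by v1.
V1_COMPOUND = {"p:for-each", "p:viewport", "p:choose", "p:group", "p:try"}

# Elements allowed in a subpipeline which are not steps.
NON_STEPS = {"p:variable", "p:documentation", "p:pipeinfo"}


def static_errors(children, declared_atomic, known_compound=V1_COMPOUND):
    """Return the child element names that would be static errors.

    children: element names appearing in a subpipeline, in order
    declared_atomic: step types with an in-scope declaration
    known_compound: compound steps this implementation knows about
    """
    allowed = NON_STEPS | set(declared_atomic) | set(known_compound)
    return [name for name in children if name not in allowed]


declared = {"p:xslt", "my:ackerman"}

# Declared atomic step: accepted whether or not it is implemented.
print(static_errors(["my:ackerman"], declared))

# Undeclared compound step: rejected, and there is nothing to declare it with.
print(static_errors(["my:map-reduce"], declared))
```

The first call reports no errors and the second reports
my:map-reduce, even though both uses sit behind the same kind of
p:step-available guard.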

One possible way forward would be to introduce a minimal way to
declare compound steps, e.g. <p:declare-compound-step type="[QName]"/>

We would include declarations for the five v1 compound steps in the
canonical library for v1, implementations could declare their private
compound steps, and that would allow the analysis above for compound
steps to be brought into line with that for atomic steps.  But Richard
Tobin points out that this would introduce other problems.  Consider:

 <p:when test="p:step-available('my:compound')">
  <p:identity/>

  <my:compound>
   ....
  </my:compound>
 </p:when>

How does an implementation which _doesn't_ implement my:compound know
whether the "primary outputs *must* be consumed" rule is violated or
not?  Clearly it can't, since my:compound may have an arbitrarily
complex syntax before you get to, say, a subpipeline which might have
a step which is bound to the relevant output.

So, we appear to have at least four choices:

 1) Add a p:declare-compound-step, and try to specify what constraints
    _don't_ hold of subpipelines with unimplemented compound steps in
    them;

 2) Loosen the syntax so that unknown elements are assumed to be
    unimplemented compound steps and are ignored;

 3) Go back to the idea of extension namespaces, and treat unknown
    elements _in an extension namespace_ as unimplemented compound
    steps;

 4) Accept that there is no backward-compatible way to introduce
    new/extension compound steps, and therefore that they will cause
    static errors in implementations which don't know about them.
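
The four choices differ only in what an implementation does with an
element it does not implement.  A rough Python sketch of that
difference follows; the function, the two result strings and the use
of prefixes as a stand-in for namespaces are all assumptions for
illustration.  It says nothing about which constraints would have to
be relaxed under choice (1).

```python
def treat_unknown_element(name, choice,
                          declared_compound=frozenset(),
                          extension_prefixes=frozenset()):
    """Outcome for an element the implementation does not implement.

    Returns "unimplemented-compound" when the element is tolerated,
    or "static-error" when the pipeline is rejected.
    """
    prefix = name.split(":", 1)[0]
    if choice == 1:
        # Tolerated only if declared via a p:declare-compound-step.
        tolerated = name in declared_compound
    elif choice == 2:
        # Anything unknown is tolerated, typos included.
        tolerated = True
    elif choice == 3:
        # Tolerated only inside a declared extension namespace.
        tolerated = prefix in extension_prefixes
    elif choice == 4:
        # Never tolerated.
        tolerated = False
    else:
        raise ValueError("choice must be 1, 2, 3 or 4")
    return "unimplemented-compound" if tolerated else "static-error"
```

For example, the typo p:foreach is tolerated under choice (2) and
rejected under the other three if it is neither declared nor in an
extension namespace, while my:map-reduce is tolerated under choice
(3) only when "my" is among the extension prefixes.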

I guess after all this I prefer (4), on the grounds that (1) is just
too messy, (2) gives up too much, (3) doesn't allow for new compound
steps _in the pipeline language_ in a backward-compatible way, and
anyway, the chances of a workaround being available which would enable
one to write backward-compatible pipelines using new/extension
compound steps are so small that there's no point in buggering with
the language to make that possible.

Phew!

_If_ we accept this analysis and its conclusion, I think I know what
2.1 and 4.7 should look like . . .

ht
-- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Tuesday, 18 March 2008 17:19:37 UTC