A proposal to restructure our top-level syntax from Henry S. Thompson on 2007-11-30 (public-xml-processing-model-wg@w3.org from November 2007)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Fri, 30 Nov 2007 14:59:28 +0000
To: public-xml-processing-model-wg@w3.org
Message-ID: <f5b7ijzhj8v.fsf@hildegard.inf.ed.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Richard pointed out on yesterday's call that the type/name clash on
p:pipeline arose, as have other issues, because of the schizophrenic
nature of p:pipeline.  Bear with me, but that triggered a thought
which I want to try to work through: we should use p:declare-step to
declare all steps.  That is, not just spec-defined steps, and
implementation-defined steps, but user-defined steps also.
User-defined steps are the steps formerly known as named pipelines.

Here's how this would look:

1) Add optional subpipeline content to p:declare-step -- if it's
   present, it's the definition:

<p:declare-step
  type = QName>
    (p:input |
     p:output |
     p:option)*,
     _subpipeline_?
</p:declare-step>

2) Simplify p:pipeline-library:

<p:pipeline-library
  ignore-prefixes? = prefix list
  xpath-version? = string>
    (p:import |
     p:declare-step)*
</p:pipeline-library>

3) Simplify p:pipeline itself -- it's now _only_ used for running
   stuff, and is always nameless:

<p:pipeline
  ignore-prefixes? = prefix list
  xpath-version? = string>
    (p:input |
     p:output |
     p:option |
     p:import |
     p:declare-step |
     p:log |
     p:serialization)*,
    _subpipeline_
</p:pipeline>

(Not sure why p:log is there -- can/should be removed?)

4) Remove the requirement that p:pipeline has to declare its inputs
   and outputs -- applying the compound step output defaulting rule is
   not a problem anymore, since all _named_ pipelines are defined with
   declare-step, where i/o has never been defaulted.

5) Change the definition of p:pipe so that 'step' is optional, and if
   omitted means the lexically enclosing p:pipeline.
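
To make (5) concrete, here is a small sketch.  The name "main" is a
made-up placeholder, and the first form is only meant to approximate
the current draft's syntax.  Where a binding to one of the pipeline's
own inputs currently has to name the pipeline:

<p:pipeline name="main" xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="schemas" sequence="true"/>
  <!-- other declarations omitted -->

  <p:validate-with-xml-schema>
    <p:input port="schema">
      <!-- 'step' has to refer to the enclosing pipeline by name -->
      <p:pipe step="main" port="schemas"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:pipeline>

under the proposal it would presumably reduce to:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="schemas" sequence="true"/>

  <p:validate-with-xml-schema>
    <p:input port="schema">
      <!-- no 'step': the lexically enclosing p:pipeline is meant -->
      <p:pipe port="schemas"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:pipeline>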

I think this is actually a much cleaner design.  It puts all the load
of defining typed steps on declare-step.  It will actually make moving
from user-defined to implementation-defined a very smooth transition
-- we could even say that the content of a p:declare-step is actually
a fallback -- implementations can supply builtin-definitions if they
have them.

I don't think Norm's other objection to my original 'require I/O decls
only on libraries' proposal is as strong now -- to publish a pipeline,
I just rename it to p:declare-step and wrap it in a
p:pipeline-library, probably adding I/O declarations as well.  No
internal changes are required.
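
As a hedged illustration of that renaming, suppose the pipeline being
published is a nameless one that runs XInclude and then schema
validation.  The type name my:validate-expanded and the explicit
source/result declarations are assumptions made for this sketch, as
is the reading that a p:pipe with no 'step' inside a p:declare-step
refers to that p:declare-step:

<p:pipeline-library xmlns:p="http://www.w3.org/ns/xproc"
                    xmlns:my="http://example.com/ns/pipelines">

<p:declare-step type="my:validate-expanded">
  <!-- I/O declarations added for publication -->
  <p:input port="source"/>
  <p:input port="schemas" sequence="true"/>
  <p:output port="result"/>

  <!-- subpipeline carried over unchanged from the p:pipeline -->
  <p:xinclude/>

  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:pipe port="schemas"/>
    </p:input>
  </p:validate-with-xml-schema>
</p:declare-step>

</p:pipeline-library>

Only the wrapper element, the type and the declarations differ from
the runnable form; the steps themselves are untouched.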

This also simplifies things in that p:import now _only_ imports
libraries.  It will of course still be possible for an implementation
to 'run' a library, defaulting to the first or only p:declare-step in
the absence of a type argument.

We get the defaulting of p:pipeline I/O back, because every step has
mandatory I/O declarations, so there's no problem answering the "does
the last step have a declaration?" question.

Another way of looking at this is that p:pipeline is still special,
it's just not a step anymore -- it's just a way of packaging a
subpipeline so you can run it.

I realise this will be a bit of a concept earthquake for those of us
who have lived with the existing design for some time, but I think for
new users it will appear perfectly natural.

Here's what Ex 2 and the example pipeline library would now look like:

<p:pipeline xmlns:p="http://www.w3.org/ns/xproc">
  <p:input port="schemas" sequence="true"/>

  <p:xinclude/>

  <p:validate-with-xml-schema>
    <p:input port="schema">
      <p:pipe port="schemas"/>
    </p:input>
  </p:validate-with-xml-schema>

</p:pipeline>

<p:pipeline-library xmlns:p="http://www.w3.org/ns/xproc"
                    xmlns:my="http://example.com/ns/pipelines">

<p:import href="ancillary-library.xml"/>
<p:import href="other-pipeline.xml"/>

<p:declare-step type="my:validate">
  <!-- validate declarations and subpipeline -->
</p:declare-step>

<p:declare-step type="myv:format"
            xmlns:myv="http://example.com/vanity/mine">
  <!-- format declarations and subpipeline -->
</p:declare-step>

</p:pipeline-library>

One final note -- (1) above arguably oversimplifies, and so makes my
assertion about how you publish a pipeline false, because as written
it doesn't allow some things that p:pipeline does in its content.  I
think on balance I'd prefer to be catholic about that, and so we should
rather have

<p:declare-step
  type = QName>
    (p:input |
     p:output |
     p:import |
     p:declare-step |
     p:serialization |
     p:option)*,
     _subpipeline_?
</p:declare-step>

with the stipulation that a) nested declarations and imports are _not_
visible higher up; b) serialization is only relevant if the step is
'run'.  Which points out another collateral benefit -- p:serialization
really belongs on p:declare-step, because it's perfectly coherent to
ask an implementation to 'run' a step from a pipeline library w/o
knowing or caring whether it's declared built-in or user-defined,
that is, without or with an explicit subpipeline.
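
A rough sketch of what (b) might look like in practice.  The type
name my:expand is hypothetical, and the 'port' and 'indent'
attributes on p:serialization are assumed here rather than taken
from this proposal:

<p:declare-step type="my:expand"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:my="http://example.com/ns/pipelines">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- consulted only when this step is itself 'run';
       irrelevant when another pipeline invokes my:expand -->
  <p:serialization port="result" indent="true"/>

  <!-- explicit subpipeline, so this one is user-defined -->
  <p:xinclude/>
</p:declare-step>

If an implementation had a built-in my:expand, the same declaration
without the p:xinclude child could be 'run' in exactly the same way.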

Thanks for listening :-)

ht
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFHUCVQkjnJixAXWBoRAvmCAJwKcYZRZSAeSqaWZkit0mqXx5gHmQCfXr7e
9zLCQES2uqnRWd02mkFe9p4=
=Fwe7
-----END PGP SIGNATURE-----
Received on Friday, 30 November 2007 14:59:41 UTC