Re: "Feature complete" XProc draft from Henry S. Thompson on 2006-08-23 (public-xml-processing-model-wg@w3.org from August 2006)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Wed, 23 Aug 2006 15:57:40 +0100
To: Norman Walsh <Norman.Walsh@Sun.COM>
Cc: public-xml-processing-model-wg@w3.org
Message-ID: <f5bodubwlcr.fsf@erasmus.inf.ed.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Great work!

Comments follow, of varying degrees of seriousness. . .

Figure 2. A transform and serialize pipeline

  I _think_ this raises too many questions to come as only the second
  example.  I at least was quite baffled/worried at first, that a
  'Serialize' step was going to be necessary to get XML documents out
  of pipelines.  Then I realised you had included it because of the
  'all choose branches have same output port configuration'
  constraint, but we _really_ don't want to go in to that at this
  point in the doc't, do we?

  Why not use the test="/my:root/@version < 1.2" schema validate
  example at this point?

2.2 Inputs and Outputs

  I think we need to say here why it's _not_ a static error if an
  output is connected to an input with a different declared
  cardinality, i.e. to explicitly explain that we decided it was ok to
  connect a sequence-out to a singleton-in, and only complain if the
  sequence-out failed to produce exactly one document.
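
  A minimal sketch of that rule, in Python rather than pipeline
  syntax.  The names DynamicError and bind_singleton_input are made up
  for illustration and do not come from the draft:

```python
from typing import Sequence


class DynamicError(Exception):
    """Hypothetical error raised at run time, not during static analysis."""


def bind_singleton_input(documents: Sequence[str]) -> str:
    """Deliver a sequence-valued output to an input that takes one document.

    Making the connection is always allowed; only the number of
    documents that actually arrive decides whether it is an error.
    """
    if len(documents) != 1:
        raise DynamicError(
            f"expected exactly one document, got {len(documents)}"
        )
    return documents[0]
```

  The point of the sketch is that nothing is checked when the
  connection is made, only when the documents are delivered.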

2.3 Parameters

  How can a value other than a string "[be] given"?  Did we decide
  that parameters are specified with XPaths?  If so, surely that
  should be said here.

2.4 Component graph

  "The inputs and outputs . . . _are_ the arcs of that graph"
  [emphasis added]?  Surely, as in the immediately following
  definition, "... are connected by the arcs" is what is wanted?

3 Language Constructs

 I'd prefer to have "for-each construct", "viewport construct", etc.,
 rather than "for-each component".

3.1 Pipeline

 I find the first sentence pretty baffling. . .

3.2 For-Each

  Needs some brief motivation, I think, along the lines of

   "In cases where a component or sub-pipeline requires a single
   document input, but a pipeline needs to process a sequence of
   documents with that component, the for-each construct can be used."

  The term 'aggregation' is nowhere defined.  I think nothing is lost,
  and indeed we're better off, if the definition reads:

   The result of the for-each is a sequence of the documents produced
   by processing each individual document in the input sequence.  If
   the for-each subpipeline declares multiple outputs, each output is
   a sequence of the documents produced on that output by each
   iteration.
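
  To make the proposed wording concrete, here is a rough Python model
  of it.  This is only a sketch: for_each, the subpipeline callable and
  the output_ports argument are hypothetical names, and documents are
  modelled as plain strings:

```python
from typing import Callable, Dict, List, Sequence

Document = str
Outputs = Dict[str, List[Document]]


def for_each(
    inputs: Sequence[Document],
    subpipeline: Callable[[Document], Outputs],
    output_ports: Sequence[str],
) -> Outputs:
    """Run the subpipeline once per input document.

    Each declared output becomes the sequence of documents produced on
    that output by each iteration, in iteration order.
    """
    # Start every declared output as an empty sequence, so an empty
    # input sequence still yields each port (with no documents).
    results: Outputs = {port: [] for port in output_ports}
    for document in inputs:
        produced = subpipeline(document)
        for port in output_ports:
            results[port].extend(produced.get(port, []))
    return results
```

  Under this reading no separate notion of 'aggregation' is needed:
  each output is simply the concatenation, in order, of what the
  iterations produced on it.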

3.4 Choose

  Paras 1 and 5 seem to contradict each other wrt the presence of a
  default.

3.5 Try/Catch

  That word 'aggregation' again :-)

4 Syntax

  I'm OK with using 'instantiate' to describe the relationship between
  components and steps (although I'm still not sure about using
  'component' for both type and token throughout the first three
  sections), but I would much prefer to talk about 'representing' or
  'encoding' a pipeline. . .  The same applies in 4.2 Pipeline Vocabulary.

4.1.1 Specified by URI

  Have we decided whether the schema type of the *href* attribute is
  xs:anyURI or (list of xs:anyURI)?  I _think_ I see no reason not to
  support the latter.  Makes the validate component much simpler -- I
  just write

   <p:input port="schema"
            href="http://www.example.com/myvocab
                  http://www.w3.org/2001/06/soap-envelope.xsd"/>
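
  If the type were (list of xs:anyURI), a processor would only need to
  split the attribute value on whitespace, as XML Schema list types
  are whitespace-separated.  A sketch in Python, where parse_href_list
  is a hypothetical helper and not part of any draft:

```python
from typing import List


def parse_href_list(href: str) -> List[str]:
    """Split a whitespace-separated href value into individual URIs.

    str.split() with no argument splits on any run of whitespace,
    including the newline and indentation inside a wrapped attribute.
    """
    return href.split()


uris = parse_href_list(
    """http://www.example.com/myvocab
                  http://www.w3.org/2001/06/soap-envelope.xsd"""
)
```

  A value holding a single URI comes back as a one-item list, so the
  existing single-URI usage would keep working unchanged.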

4.1.1 Specified by source

  The word 'ancestor' is not defined, or immediately obvious -- how
  about

    ". . . must either be declared on some ancestor (e.g. an enclosing
     _choose_ or _for-each_) or it must be. . ."

4.1.1 Specified by here document

  Are multiple (non-document) children allowed, yielding a sequence?

4.1.2 Editorial Note

  Well, we did have this: a step is an instantiation of a component,
  which is in turn described by a component declaration.

  We could have component for both type _and_ token, which is what you
  seemed to be going for in section 3, with p:component-declaration
  describing the type and p:component corresponding to an instance.

  But p:step is so nice and short . . .

4.1.3 Syntactic shortcuts

  Arghh!  Now we're calling choose a _user-defined_ component.  Surely
  not.  Stick with 'construct', please!

  [note here and elsewhere you haven't made up your mind wrt p:param
  vs. p:parameter -- I vote for p:(declare-)parameter, because we're
  going in the opposite direction from xslt, i.e. if we used p:param,
  we'd have the following confusing paradigm:

     p:declare-param is to p:param as xsl:param is to xsl:with-param]

4.2.1 p:pipeline Element

  [I'm only going to say this once :-]

  I'd much prefer 

    "A p:pipeline represents a _pipeline_.  Its children represent
    declarations of the inputs, outputs and parameters that the
    pipeline exposes and the _subpipeline_ that constitutes
    its definition."

4.2.8 p:for-each Element

  The term 'aggregate' is nowhere defined, and I find it a bit opaque
  at best and misleading at worst.  How about replacing the last _two_
  sentences before the example with

     For each declared output, the processor will collect all the
     documents that are produced for that output from all the
     iterations, in order, into a sequence.

[not done, but I'm sending this now and will add more later]

ht
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (GNU/Linux)

iD8DBQFE7GzlkjnJixAXWBoRAvj3AJ9DdrPiqzuKLbjnqiLC9WgejidnZwCeKXGM
KMFPeaXMHFAkP5eVMeRCnxg=
=xsyo
-----END PGP SIGNATURE-----
Received on Wednesday, 23 August 2006 14:58:52 UTC