Re: Composability from Henry S. Thompson on 2007-06-08 (public-xml-processing-model-wg@w3.org from June 2007)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Fri, 08 Jun 2007 12:25:37 +0100
To: Jeni Tennison <jeni@jenitennison.com>
Cc: public-xml-processing-model-wg@w3.org
Message-ID: <f5b1wgm8z7y.fsf@hildegard.inf.ed.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jeni Tennison writes:

> Henry S. Thompson wrote:
>> Jeni Tennison writes:
>>
>>> [Namespaces]
>>>
>>> The situation that will trip you up, I think, is where the option's
>>> value is set based on the value of some attribute or element in a
>>> separate XML document.
>>>
>>> [An increasingly tricky set of examples]
>>> I just don't see a way that we can specify how a processor should do
>>> this: you need the intelligence of the person writing the pipeline to
>>> be able to tell which namespaces need to be used, which means we need:
>> Agreed.
>>
>>>   (a) a way of passing in namespace bindings to either a step (my
>>> preference) or individual options (which I think is too complicated).
>> Agreed -- that's what the Markup pipeline language does.  If you
>> supply an option whose value is/includes prefixed names, you must
>> accompany it with, as it were, <p:bind prefix="..." namespace="..."/>
>>
>>>   (b) a way of picking up the namespaces that were in scope when a
>>> pipeline was invoked
>> I'm not sure I understand, but maybe this means that if you can
>> specify option name/value pairs at invocation (e.g. on the command
>> line), you need to be able to specify prefix bindings as well, in
>> which case I agree.
>
> Yes, you need to be able to set namespace bindings on the command line
> (this is what (a) is about: passing in namespace bindings when a step
> is invoked).

Well, I think the cases are different -- for (a), as I said above,
when a pipeline _author_ writes an option value that uses a
prefixed-QName, s/he must also write an NS binding (in the pipeline
language) for that prefix.

It's a separate (or separable) case when a _user_ specifies a parameter
value externally -- they also need a mechanism to specify NS bindings
for any prefixed-QNames they've used.

I _think_ we're in violent agreement about these two points.

> But if you're within the invoked pipeline, you need a way of knowing
> what those namespace bindings were and of passing them on through to
> any steps within the pipeline. Here's an example.
>
> On my command line, I do:
>
> > myproc mypipe.xpl -in source="*.xml"
>                     -opt test="/e:foo/e:bar = 'baz'"
>                     -ns e="http://www.example.com/ns/example"
>
> My pipeline looks like:
>
> <p:pipeline xmlns:p="...">
>   <p:input port="source" sequence="yes" />
>   <p:output port="result" sequence="yes" />
>   <p:option name="test" required="yes" />
>   <p:split-sequence>
>     <p:option name="test" select="$test" />
>   </p:split-sequence>
> </p:pipeline>
>
> The namespace binding for the 'e' prefix gets passed into my
> pipeline. Fine. But what namespace bindings get passed into the
> <p:split-sequence> step? I think that the answer is that only the
> namespace bindings that are in-scope on the <p:split-sequence> element
> get passed to that step. The only namespace binding that's in scope
> within the pipeline is the XProc namespace. So there's no binding for
> the 'e' prefix, and the step fails.

That's where what Norm and I discussed before will do the job -- if
you have an option to a step (in this case the pipeline itself) which
is accompanied by explicit NS bindings, it carries those bindings with
it forever.

> Worse, if I have:
>
> <p:pipeline xmlns:p="..."
>             xmlns:e="http://www.example.com/ns/some/other/namespace">
>   <p:input port="source" sequence="yes" />
>   <p:output port="result" sequence="yes" />
>   <p:option name="test" required="yes" />
>   <p:split-sequence>
>     <p:option name="test" select="$test" />
>   </p:split-sequence>
>   ...
> </p:pipeline>
>
> then the 'e' prefix is bound OK, but to the wrong namespace! I won't
> even get an error, just fail to get the result I should.

See above -- on my account (and I think Norm's) you don't _ever_ get
the in-scope NS bindings from the pipeline document's infoset, you only get
the explicit ones bound _using_ the pipeline language.

Note this has all been about XPaths evaluated _by a component_, and my
initial assumption is that for XPaths evaluated _by the engine_, the
in-scope bindings from the infoset _will_ be used.  I agree this will
cause some confusion; maybe the right answer is to always include the
in-scope implicit bindings along with the in-scope explicit bindings,
but give the latter precedence over the former. . .

> What I want is to take the in-scope namespaces from the invocation of
> the pipeline and pass *them* through to the steps in the pipeline.
>
> For example, analogously to parameter sets:
>
> <p:pipeline name="pipe"
>             xmlns:p="..."
>             xmlns:e="http://www.example.com/ns/some/other/namespace">
>   <p:input port="source" sequence="yes" />
>   <p:output port="result" sequence="yes" />
>   <p:option name="test" required="yes" />
>   <p:split-sequence>
>     <p:option name="test" select="$test" />
>     <p:namespace-bindings>
>       <p:pipe step="pipeline-namespaces" port="result" />
>     </p:namespace-bindings>
>   </p:split-sequence>
>   <p:namespaces name="pipeline-namespaces" />
>   ...
> </p:pipeline>

Much too much mechanism for a corner case of a corner case, in my view.

>>>   (c) a way of getting the in-scope namespace declarations within a
>>> pipeline, in order to use those in the set that you then pass into
>>> another step (this can be done with a <p:namespaces> step analogous
>>> to the <p:parameters> one).
>>
>> I don't see the need for this. . .
>
> You're right, as long as there's a way of binding individual
> namespaces explicitly (like your <p:bind> element). After all, if
> you're in the pipeline then you know already what namespaces are in
> scope and you can just copy the ones you need. For example:
>
>   <p:split-sequence>
>     <p:option name="test" select="concat('/f:wrap', $test)" />
>     <p:namespace-bindings>
>       <p:pipe step="pipeline-namespaces" port="result" />
>       <p:namespace prefix="f" uri="http://www.example.com/ns/f" />
>     </p:namespace-bindings>
>   </p:split-sequence>

I'd much prefer to just have

  <p:split-sequence>
    <p:option name="test" select="concat('/f:wrap', $test)" />
    <p:bind prefix="f" uri="http://www.example.com/ns/f" />
  </p:split-sequence>

with the rule for associating namespaces with option values being
roughly:

 The NS bindings associated with an option value are, in priority
 order:

  1) The explicit (p:bind) bindings among its siblings;
  2) The bindings associated with any option values used in its
     construction;
  3) The in-scope namespace bindings from the p:option EII itself.
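
A minimal sketch of that priority rule, in Python rather than pipeline
markup, may make the intended precedence easier to check.  The names
OptionValue and resolve_bindings are hypothetical, as is the choice to
let later option values override earlier ones within source (2); the
rule above does not say how ties inside one source are broken.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OptionValue:
    """An option value that carries its namespace bindings with it (hypothetical type)."""
    text: str
    bindings: dict = field(default_factory=dict)


def resolve_bindings(explicit, used_values, in_scope):
    """Merge prefix -> namespace maps from lowest to highest priority.

    explicit    -- bindings from sibling p:bind elements (priority 1)
    used_values -- OptionValue objects used to construct the value (priority 2)
    in_scope    -- in-scope bindings from the p:option element itself (priority 3)
    """
    merged = dict(in_scope)              # 3) lowest priority goes in first
    for value in used_values:            # 2) overrides in-scope bindings
        merged.update(value.bindings)    # assumption: later values win ties
    merged.update(explicit)              # 1) explicit bindings override everything
    return merged


# The command-line option from the earlier example, with its -ns binding attached.
test = OptionValue(
    "/e:foo/e:bar = 'baz'",
    {"e": "http://www.example.com/ns/example"},
)

resolved = resolve_bindings(
    explicit={"f": "http://www.example.com/ns/f"},
    used_values=[test],
    in_scope={"e": "http://www.example.com/ns/some/other/namespace"},
)
```

Here 'e' resolves to the binding supplied with the option value, not to
the one that happens to be in scope in the pipeline document, and 'f'
comes from the explicit binding.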

Note that as far as I can see both your _and_ my approaches will do
'the wrong thing' in the following case:

   > myproc mypipe.xpl -in source="*.xml"
                    -opt test="/e:foo/e:bar = 'baz'"
                    -ns e="http://www.example.com/ns/e1"

      <p:split-sequence>
        <p:option name="test" select="concat('/e:wrap', $test)" />
        <p:bind prefix="e" uri="http://www.example.com/ns/e2" />
      </p:split-sequence>

_Caveat scriptor_.
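
To see why this case goes wrong silently, here is a small sketch using
Python's xml.etree.ElementTree.  It assumes the explicit binding in the
pipeline rebinds the same prefix 'e' that the user's expression uses,
and that explicit bindings take precedence.  ElementTree supports only
a subset of XPath, so a plain path selection stands in for the boolean
test expression; the sample document and the select_bar helper are made
up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up input document in the namespace the user had in mind.
DOC = (
    '<e:foo xmlns:e="http://www.example.com/ns/e1">'
    "<e:bar>baz</e:bar>"
    "</e:foo>"
)


def select_bar(doc_text, bindings):
    """Return the text of every e:foo/e:bar element under the given bindings."""
    root = ET.fromstring(doc_text)
    # Wrap the root so the path can name e:foo explicitly, as /e:foo/e:bar does.
    wrapper = ET.Element("wrapper")
    wrapper.append(root)
    return [el.text for el in wrapper.findall("e:foo/e:bar", bindings)]


user_bindings = {"e": "http://www.example.com/ns/e1"}      # from -ns on the command line
explicit_bindings = {"e": "http://www.example.com/ns/e2"}  # assumed binding in the pipeline

# Explicit bindings take precedence, so they are applied last.
merged = {**user_bindings, **explicit_bindings}

with_user_binding = select_bar(DOC, user_bindings)  # ['baz']
with_merged_binding = select_bar(DOC, merged)       # []
```

With the user's own binding the path selects the element; with the
merged bindings it selects nothing and raises no error, which is the
quiet failure described above.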

ht
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFGaTyxkjnJixAXWBoRAtUUAJwLrUoi8f721QkzqOVRDJKBo+2zqQCfZK9Y
w6H/FuiADONq61iDDqSfQJg=
=FyKJ
-----END PGP SIGNATURE-----
Received on Friday, 8 June 2007 11:25:55 UTC