Re: Composability from Jeni Tennison on 2007-06-08 (public-xml-processing-model-wg@w3.org from June 2007)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Fri, 08 Jun 2007 22:09:20 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <4669C580.7080105@jenitennison.com>
Henry S. Thompson wrote:
> Jeni Tennison writes:
>> Yes, you need to be able to set namespace bindings on the command line
>> (this is what (a) is about: passing in namespace bindings when a step
>> is invoked).
> 
> Well, I think the cases are different -- for (a), as I said above,
> when a pipeline _author_ writes an option value that uses a
> prefixed-QName, s/he must also write an NS binding (in the pipeline
> language) for that prefix.

Yes, the pipeline author needs to do that. We don't have a mechanism to 
do it in XProc explicitly: the in-scope namespaces always get used.

> It's a separate (or separable) case when a _user_ specifies a parameter
> value externally -- they also need a mechanism to specify NS bindings
> for any prefixed-QNames they've used.

I think of a user invoking a pipeline from the command line and a 
pipeline author invoking another pipeline from within their pipeline as 
being the same thing. The command line is like a one-off pipeline with 
no inputs or outputs. To me, defining options on the command line is 
just the same as using <p:option> in a step. You probably *don't* think 
of it in that way, which might be a source of some confusion on both sides.

> I _think_ we're in violent agreement about these two points.

Yes, we seem to agree that we need a way of supplying namespace bindings 
alongside an option value. (In fact I'd be happy simply to supply 
namespace bindings for entire steps rather than individual options, but 
let's not be picky.)

>> But if you're within the invoked pipeline, you need a way of knowing
>> what those namespace bindings were and of passing them on through to
>> any steps within the pipeline. Here's an example.
>>
>> On my command line, I do:
>>
>>> myproc mypipe.xpl -in source="*.xml"
>>                     -opt test="/e:foo/e:bar = 'baz'"
>>                     -ns e="http://www.example.com/ns/example"
>>
>> My pipeline looks like:
>>
>> <p:pipeline xmlns:p="...">
>>   <p:input port="source" sequence="yes" />
>>   <p:output port="result" sequence="yes" />
>>   <p:option name="test" required="yes" />
>>   <p:split-sequence>
>>     <p:option name="test" select="$test" />
>>   </p:split-sequence>
>> </p:pipeline>
>>
>> The namespace binding for the 'e' prefix gets passed into my
>> pipeline. Fine. But what namespace bindings get passed into the
>> <p:split-sequence> step? I think that the answer is that only the
>> namespace bindings that are in-scope on the <p:split-sequence> element
>> get passed to that step. The only namespace binding that's in scope
>> within the pipeline is the XProc namespace. So there's no binding for
>> the 'e' prefix, and the step fails.
> 
> That's where what Norm and I discussed before will do the job -- if
> you have an option to a step (in this case the pipeline itself) which
> is accompanied by explicit NS bindings, it carries those bindings with
> it forever.

Okaay, but I don't have a clear idea about how that would work. Do you 
mean wording like "if an option is set by a single variable reference 
then the namespace bindings from the referenced option are used for this 
option"?
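
One way to read that wording, as a minimal Python sketch rather than anything in the XProc draft: an option value carries its bindings with it, and a select that is exactly one variable reference inherits them. The names `OptionValue` and `bindings_for` are hypothetical, and the variable-reference test is a deliberately simplified regular expression, not a real XPath parser.

```python
import re
from dataclasses import dataclass, field

# Hypothetical model: an option value travels together with the namespace
# bindings that were supplied when it was set.
@dataclass(frozen=True)
class OptionValue:
    value: str
    bindings: dict = field(default_factory=dict)

# Simplified check for "select is exactly one variable reference"
# (unprefixed names only; real XPath parsing is out of scope here).
SINGLE_VAR_REF = re.compile(r"^\s*\$([A-Za-z_][\w.-]*)\s*$")

def bindings_for(select, options, step_in_scope):
    """Return the bindings to associate with an option set via `select`."""
    match = SINGLE_VAR_REF.match(select)
    if match:
        # Single variable reference: reuse the referenced option's bindings.
        return dict(options[match.group(1)].bindings)
    # Anything else: fall back to the bindings in scope on the step.
    return dict(step_in_scope)

# The command-line invocation from the example above, modelled by hand.
test = OptionValue("/e:foo/e:bar = 'baz'",
                   {"e": "http://www.example.com/ns/example"})
step_in_scope = {"e": "http://www.example.com/ns/some/other/namespace"}
inherited = bindings_for("$test", {"test": test}, step_in_scope)
```

Under this reading, `inherited` binds 'e' to the namespace given on the command line, not to the one declared in the pipeline document.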

>> Worse, if I have:
>>
>> <p:pipeline xmlns:p="..."
>>             xmlns:e="http://www.example.com/ns/some/other/namespace">
>>   <p:input port="source" sequence="yes" />
>>   <p:output port="result" sequence="yes" />
>>   <p:option name="test" required="yes" />
>>   <p:split-sequence>
>>     <p:option name="test" select="$test" />
>>   </p:split-sequence>
>>   ...
>> </p:pipeline>
>>
>> then the 'e' prefix is bound OK, but to the wrong namespace! I won't
>> even get an error, just fail to get the result I should.
> 
> See above -- on my account (and I think Norm's) you don't _ever_ get
> the in-scope NS bindings from the pipeline doc't infoset, you only get
> the explicit ones bound _using_ the pipeline language.
>
> Note this has all been about XPaths evaluated _by a component_, and my
> initial assumption is that for XPaths evaluated _by the engine_, the
> in-scope bindings from the infoset _will_ be used.  I agree this will
> cause some confusion, maybe the right answer is to always include the
> in-scope implicit bindings along with the in-scope explicit bindings,
> but give the latter precedence over the former. . .

Right. For usability, if you do:

   <p:option name="test"
     value="not(/xhtml:html/xhtml:head/xhtml:title)" />

the 'xhtml' binding should come from the in-scope namespaces on the 
pipeline. It's too much work to have to re-type namespace bindings 
(especially with full URIs) all over the place. I think this works for 
cases where the value attribute is used, or the select attribute is used 
with a single variable reference.
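
To make "the in-scope namespaces on the pipeline" concrete, here is a rough Python sketch that collects the bindings in scope on an element using only the standard library. The function name `in_scope_namespaces` is hypothetical, and the XProc namespace URI below is a placeholder, since the examples in this thread elide it.

```python
import io
import xml.etree.ElementTree as ET

def in_scope_namespaces(xml_text, local_name):
    """Return {prefix: uri} in scope on the first element with this local name."""
    stack = []  # (prefix, uri) pairs for every declaration currently open
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            stack.append(payload)   # reported before the declaring element starts
        elif event == "end-ns":
            stack.pop()             # the declaring element has ended
        elif payload.tag.rsplit("}", 1)[-1] == local_name:
            return dict(stack)      # inner declarations override outer ones
    return None

# Placeholder URI for the XProc namespace (an assumption for this sketch).
pipeline = (
    '<p:pipeline xmlns:p="urn:x-placeholder:xproc"'
    ' xmlns:xhtml="http://www.w3.org/1999/xhtml">'
    '<p:option name="test"'
    ' value="not(/xhtml:html/xhtml:head/xhtml:title)"/>'
    '</p:pipeline>'
)
option_bindings = in_scope_namespaces(pipeline, "option")
```

Here `option_bindings` includes the 'xhtml' prefix, so the pipeline author never has to repeat the full URI next to the option.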

>> What I want is to take the in-scope namespaces from the invocation of
>> the pipeline and pass *them* through to the steps in the pipeline.
>>
>> For example, analogously to parameter sets:
>>
>> <p:pipeline name="pipe"
>>             xmlns:p="..."
>>             xmlns:e="http://www.example.com/ns/some/other/namespace">
>>   <p:input port="source" sequence="yes" />
>>   <p:output port="result" sequence="yes" />
>>   <p:option name="test" required="yes" />
>>   <p:split-sequence>
>>     <p:option name="test" select="$test" />
>>     <p:namespace-bindings>
>>       <p:pipe step="pipeline-namespaces" port="result" />
>>     </p:namespace-bindings>
>>   </p:split-sequence>
>>   <p:namespaces name="pipeline-namespaces" />
>>   ...
>> </p:pipeline>
> 
> Much too much mechanism for a corner case of a corner case, in my view.

I should have written:

<p:pipeline>
   <p:input port="source" sequence="yes" />
   <p:output port="result" sequence="yes" />
   <p:option name="test" required="yes" />
   <p:split-sequence>
     <p:option name="test" select="$test" />
     <p:use-namespace-bindings name="#pipeline-namespaces" />
   </p:split-sequence>
   ...
</p:pipeline>

I was attempting an analogy with our handling of parameter sets.

>>>>   (c) a way of getting the in-scope namespace declarations within a
>>>> pipeline, in order to use those in the set that you then pass into
>>>> another step (this can be done with a <p:namespaces> step analogous
>>>> to the <p:parameters> one).
>>> I don't see the need for this. . .
>> You're right, as long as there's a way of binding individual
>> namespaces explicitly (like your <p:bind> element). After all, if
>> you're in the pipeline then you know already what namespaces are in
>> scope and you can just copy the ones you need. For example:
>>
>>   <p:split-sequence>
>>     <p:option name="test" select="concat('/f:wrap', $test)" />
>>     <p:namespace-bindings>
>>       <p:pipe step="pipeline-namespaces" port="result" />
>>       <p:namespace prefix="f" uri="http://www.example.com/ns/f" />
>>     </p:namespace-bindings>
>>   </p:split-sequence>
> 
> I'd much prefer to just have
> 
>   <p:split-sequence>
>     <p:option name="test" select="concat('/f:wrap', $test)" />
>     <p:bind prefix="f" uri="http://www.example.com/ns/f" />
>   </p:split-sequence>

I'm more-or-less OK with that but would prefer we call the 
namespace-binding element <p:namespace> or something similar: <p:bind> 
is too generic given that we also "bind" outputs to inputs and so on.

I also note that this seems to be associating the namespaces with the 
*step* rather than the *option*. If you want option-specific namespaces 
then it should be:

   <p:split-sequence>
     <p:option name="test" select="concat('/f:wrap', $test)">
       <p:bind prefix="f" uri="http://www.example.com/ns/f" />
     </p:option>
   </p:split-sequence>

right?

> with the rule for associating namespaces with option values being
> roughly:
> 
>  The NS bindings associated with an option value are, in priority
>  order:
> 
>   1) The explicit (p:bind) bindings among its siblings;
>   2) The bindings associated with any option values used in its
>      construction;
>   3) The in-scope namespace bindings from the p:option EII itself.

I'm a bit wary of that level of automated fishing for namespace 
bindings, but I assume a dynamic error if the bindings associated with 
the various option values clash? (Just curious: what makes you OK with 
looking inside XPath expressions for variable references here, but not 
to see if last() has been used?)
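
For what it's worth, the priority rule plus the dynamic error I'm assuming could be sketched like this in Python. It is only an illustration of the three-level lookup: `resolve_bindings` and `NamespaceClash` are hypothetical names, and raising on a clash is my assumption, not something stated in the rule.

```python
class NamespaceClash(Exception):
    """Two option values used in a construction bind one prefix differently."""

def resolve_bindings(explicit, from_option_values, in_scope):
    """Merge bindings in the quoted priority order.

    explicit           -- dict from explicit (p:bind) siblings, highest priority
    from_option_values -- list of dicts, one per option value used
    in_scope           -- dict of in-scope bindings on the p:option element
    """
    inherited = {}
    for bindings in from_option_values:
        for prefix, uri in bindings.items():
            # Assumed behaviour: disagreement between option values is an error.
            if inherited.setdefault(prefix, uri) != uri:
                raise NamespaceClash(prefix)
    merged = dict(in_scope)      # 3) lowest priority
    merged.update(inherited)     # 2) bindings carried by option values
    merged.update(explicit)      # 1) explicit bindings win
    return merged

resolved = resolve_bindings(
    explicit={"f": "http://www.example.com/ns/f"},
    from_option_values=[{"e": "http://www.example.com/ns/example"}],
    in_scope={"e": "http://www.example.com/ns/some/other/namespace"},
)
```

With these inputs the 'e' prefix resolves to the namespace carried by the option value, and 'f' to the explicit binding.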

I still don't see a way to get the right namespace bindings if the 
option is set from an external configuration document, and this is a 
really important use case for me. I don't think you can reasonably 
analyse the XPath to identify which nodes have been referenced in the 
general case, but you could possibly see whether the select expression 
selects a node, and use the in-scope namespaces for that node in that 
case, so that:

   <p:option name="test" select="/my:config/my:filter/@test" />

works?

In the general case, a <p:use-namespaces> or something could do it:

   <p:option name="test"
     select="concat('d:row[', /my:config/my:filter/@test, ']')">
     <p:pipe step="pipe" source="config" />
     <p:use-namespaces select="/my:config/my:filter" />
     <p:namespace prefix="d" uri="http://www.example.com/ns/data" />
   </p:option>

if you didn't want to go the full reification route.

> Note that as far as I can see both your _and_ my approaches will do
> 'the wrong thing' in the following case:
> 
>    > myproc mypipe.xpl -in source="*.xml"
>                     -opt test="/e:foo/e:bar = 'baz'"
>                     -ns e="http://www.example.com/ns/e1"
> 
>       <p:split-sequence>
>         <p:option name="test" select="concat('/e:wrap', $test)" />
>         <p:bind prefix="f" uri="http://www.example.com/ns/e2" />
>       </p:split-sequence>
> 
> _Caveat scriptor_.

Yes.

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Friday, 8 June 2007 21:09:21 UTC