Re: parameters and pipelines (revised)

Hi,

That looks pretty good to me. Just a few comments:

First, I don't think we should be making any distinction between how 
pipelines are invoked "from outside" or "by name". In both cases, the 
pipeline needs to be provided with bindings for its inputs (parameter 
and non-parameter) and bindings for its options. In the case of 
invocation "from outside", it has to be implementation-defined exactly 
how that's done. It's useful to consider how implementations *might* do 
it, but we're not going to put that in the spec. (I'd hope 
implementations would try to do it in a similar way to how steps are 
invoked by XProc itself, rather than having different rules about what 
gets done with anonymous parameter ports, but we can't legislate that 
either way.)

Second, I think that giving <p:parameter> an optional port attribute 
would help. Specifically, it would mean that <p:parameter> could be used 
even if a pipeline accepted more than one parameter port. For example:

   <p:pipeline type="my:pipe">
     <p:input port="params1" kind="parameter" />
     <p:input port="params2" kind="parameter" />
     ...
   </p:pipeline>

   <my:pipe>
     <p:parameter port="params1" name="x" value="$x" />
     <p:parameter port="params2" name="y" value="$y" />
   </my:pipe>

would work. This would make life a lot easier for people who do need to 
use more than one parameter port, without making it any harder for 
people who don't: the port attribute can default to the name of the 
(only) parameter port (which might be anonymous) when there's only one.
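To make the defaulting rule concrete, here is a rough Python sketch of how a runtime might decide which port a <p:parameter> binds to. The function name and the use of None to stand for an anonymous parameter port are assumptions for illustration only, not anything in the spec.

```python
from typing import List, Optional

# Hypothetical model: an anonymous parameter port is represented by None.
def resolve_parameter_port(explicit_port: Optional[str],
                           parameter_ports: List[Optional[str]]) -> Optional[str]:
    """Return the parameter port that a <p:parameter> binds to."""
    if explicit_port is not None:
        # An explicit port attribute must name a declared parameter port.
        if explicit_port not in parameter_ports:
            raise ValueError(f"no parameter port named {explicit_port!r}")
        return explicit_port
    if len(parameter_ports) == 1:
        # Default: the only parameter port, which may be anonymous (None).
        return parameter_ports[0]
    raise ValueError(
        f"cannot default the port attribute: {len(parameter_ports)} parameter ports declared")
```

With my:pipe above, resolve_parameter_port("params1", ["params1", "params2"]) gives "params1", while omitting the port attribute on that pipeline would be an error.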

Third, I wonder if we can place some sensible restrictions on when an 
anonymous parameter port is implicitly declared, along the same lines as 
for inputs and outputs. So an implicit declaration of an anonymous 
parameter port would only be added to a pipeline if (a) there are no other 
parameter port declarations and (b) one of its contained steps has a 
parameter port. This would have the advantage of better error reporting 
when a pipeline user mistakenly passes in parameters rather than options.
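The two conditions could be checked along these lines. This is only a sketch; the Step and Pipeline classes are hypothetical stand-ins for whatever model an implementation uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Step:
    # Hypothetical model: names of the step's parameter ports (None = anonymous).
    parameter_ports: List[Optional[str]] = field(default_factory=list)

@dataclass
class Pipeline:
    declared_parameter_ports: List[str]
    contained_steps: List[Step]

def needs_implicit_anonymous_parameter_port(p: Pipeline) -> bool:
    # (a) the pipeline declares no parameter ports of its own, and
    # (b) at least one contained step has a parameter port.
    return (not p.declared_parameter_ports
            and any(s.parameter_ports for s in p.contained_steps))
```

A pipeline whose steps take no parameters gets no anonymous parameter port, so parameters passed to it by mistake can be reported as an error.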

> Open questions:
> 
>  A) Should <p:input kind='parameter' .../> as a child of p:pipeline be
>     purely a declaration, i.e. be necessarily empty, or should we
>     allow it to have content, in which case how do we treat that
>     content -- merge it with external input, ignore it if there's any
>     external input, . . .?

I can see arguments for each of three approaches:

i) Pro empty: As the pipeline author, you're not supposed to know the 
names of parameters (this being what distinguishes them from options), 
so you (should) never be able to provide a sensible set of defaults. 
Therefore it should always be empty.

ii) Pro mass override: Parameter inputs should work in just the same way 
as ordinary inputs. Users should be able to provide a default that is 
entirely overridden if a binding is specified on invocation.

iii) Pro individual override: The most common situation is where you 
have a bunch of default values for parameters which should be overridden 
individually. It's excessively hard for invocations to specify all 
parameter values: they should be able to just supply values for 
parameters that don't take the default value.

I think I just about favour (i), with (iii) a close second, and I'm not 
so keen on (ii) but could live with it.
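The difference between the three options is easiest to see as three merge functions. In this sketch (names are mine, not proposed syntax) parameter sets are plain name/value dicts, and supplied is None when no binding is given on invocation.

```python
from typing import Dict, Optional

Params = Dict[str, str]

def empty_only(declared_default: Params, supplied: Optional[Params]) -> Params:
    # (i) The declaration must be empty; only the supplied bindings count.
    if declared_default:
        raise ValueError("parameter input declarations must be empty")
    return dict(supplied or {})

def mass_override(declared_default: Params, supplied: Optional[Params]) -> Params:
    # (ii) Like an ordinary input: any supplied binding replaces the default wholesale.
    return dict(supplied) if supplied is not None else dict(declared_default)

def individual_override(declared_default: Params, supplied: Optional[Params]) -> Params:
    # (iii) Defaults are overridden name by name.
    merged = dict(declared_default)
    merged.update(supplied or {})
    return merged
```

With defaults {"a": "1", "b": "2"} and an invocation supplying {"b": "3"}, option (ii) yields {"b": "3"} (a is lost), whereas option (iii) yields {"a": "1", "b": "3"}.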

>  B) There's a covert assumption in the current spec., unchanged by any
>     of the above, that the API from the runtime to step
>     implementations will have a way of accessing parameters.  Since
>     parameters are declared, this access could take port name as an
>     argument, or it could just be undifferentiated as to port name,
>     that is, it's just "give me all the parameter bindings you have
>     for this instance of this step".  I don't suppose we _have_ to say
>     anything about this, but we could choose to say e.g. that
>     implementations _should_ provide access by port name, or at least
>     indicate what port particular parameter settings arrived via. . .
> 
>     As long as we allow more than one parameter port per step (and I
>     think we should), I have some inclination to encourage the
>     provision of access to them by port name.

Yes. The step needs to be provided with a set of name/value parameters 
for each parameter port, so the runtime must specify which port it is 
supplying each set of name/value parameters to. I don't see any other 
way it could work.
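As a rough illustration of access by port name, a runtime-to-step API might look something like the following. StepContext and its methods are hypothetical; nothing like this is defined by the spec.

```python
from typing import Dict, List, Mapping

class StepContext:
    """Hypothetical runtime-to-step API: parameters are grouped by port."""

    def __init__(self, parameters_by_port: Mapping[str, Mapping[str, str]]):
        self._by_port = {port: dict(ps) for port, ps in parameters_by_port.items()}

    def parameters(self, port: str) -> Dict[str, str]:
        # Name/value bindings that arrived on one parameter port
        # (empty if nothing arrived on that port).
        return dict(self._by_port.get(port, {}))

    def parameter_ports(self) -> List[str]:
        # Names of the ports that received parameter bindings.
        return sorted(self._by_port)
```

A step with two parameter ports would then call ctx.parameters("params1") and ctx.parameters("params2") separately rather than receiving one merged set.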

>  C) Is the shadowing specified in (2c) above the right way around?  I
>     think it is, noting that if you _really_ want to override the
>     values coming 'from outside', you can do so on any step which
>     accesses the anonymous set.

(2c) says that <p:parameter> provides default values for parameters, 
which can be overridden by the anonymous set that's passed in to the 
pipeline. I think this is equivalent to saying that if there's no 
<p:input> for the parameter port, then the anonymous set is treated as 
if it had been added at the end of the step invocation (after, and 
therefore overriding, any <p:parameter> elements).
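Treating the bindings as processed in document order, with later ones winning, gives a small sketch of this reading of (2c). The tuple encoding of a step's children is an assumption made for illustration: an empty <p:input> is modelled as a marker standing for the anonymous set.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a step invocation's parameter-port children, in document order:
#   ("parameter", name, value)  for a <p:parameter>
#   ("anonymous",)              for an empty <p:input> bound to the anonymous set
Binding = Tuple[str, ...]

def effective_parameters(children: List[Binding], anonymous_set: Dict[str, str]) -> Dict[str, str]:
    if not any(c[0] == "anonymous" for c in children):
        # No <p:input>: behave as if the anonymous set came last, so it overrides.
        children = children + [("anonymous",)]
    result: Dict[str, str] = {}
    for child in children:
        if child[0] == "anonymous":
            result.update(anonymous_set)
        else:
            _, name, value = child
            result[name] = value  # later bindings override earlier ones
    return result
```

Without a <p:input>, a <p:parameter> setting x is overridden by an x arriving from outside; putting an empty <p:input> first reverses that, which is the small change in behaviour discussed next.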

I did think it should be the other way around, but I've argued myself 
round to thinking it doesn't make much difference, given that in most 
cases changing the behaviour takes little effort (adding an empty 
<p:input>).

I note, however, that changing this behaviour in the case where the 
invoked step actually accepts an anonymous parameter set *is* hard, just 
as is passing inputs to an anonymous input port. Pipelines that expect 
parameters, and expect to be invoked from within another pipeline, 
should declare all their ports explicitly. But that's just a matter of 
best practice.

> Finally question (3) above becomes, in the terms of the proposal in
> (1) and (2) above, "Under what circumstances should the runtime
> deliver the anonymous parameter set when a step implementation asks
> for its parameters?"
> 
>     Possible answers:
>       a) Always;
>       b) Only if a parameter port has been explicitly bound to it (as
>          in the final example under (1) above) (and that port is asked
>          for);
>       c) If a parameter port has been explicitly bound to it (and is
>          asked for), or if some parameter ports have not been bound at
>          all (and all parameters are asked for? and any unbound
>          parameter port is asked for? and there is only one parameter
>          port declared and unbound?).

Turning the question round, I think it is "when should 
the anonymous parameter set passed to a pipeline get passed on to a 
contained step in that pipeline?" Certainly it should when it is 
explicitly bound to a parameter port with an empty <p:input>.

Can we use the notion of a primary parameter port, just as we have a 
primary standard port? If there's only one parameter port, that's the 
primary one; otherwise one of the parameter ports can be marked with 
primary="yes". Then we can say that the anonymous parameter set gets 
passed in to the primary parameter port. That would, at least, mirror 
the way normal inputs work.
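The proposed rule could be sketched as follows. Here a step's parameter ports are a hypothetical dict from port name to whether the port carries primary="yes"; the function names are illustrative only.

```python
from typing import Dict, Optional

def primary_parameter_port(ports: Dict[str, bool]) -> Optional[str]:
    """ports maps parameter-port name -> whether it is marked primary="yes"."""
    if len(ports) == 1:
        # A single parameter port is primary whether or not it is marked.
        return next(iter(ports))
    marked = [name for name, primary in ports.items() if primary]
    if len(marked) > 1:
        raise ValueError("at most one parameter port may be marked primary")
    return marked[0] if marked else None

def route_anonymous_set(ports: Dict[str, bool],
                        anonymous_set: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    # The anonymous set goes only to the primary parameter port, if there is one.
    primary = primary_parameter_port(ports)
    return {} if primary is None else {primary: dict(anonymous_set)}
```

If a step has several parameter ports and none is marked primary, the anonymous set is simply not passed on, mirroring a step with several unmarked input ports.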

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com

Received on Monday, 9 July 2007 08:27:30 UTC