Re: Variables and parameters from Jeni Tennison on 2006-05-24 (public-xml-processing-model-wg@w3.org from May 2006)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Wed, 24 May 2006 21:38:54 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <4474C45E.2090104@jenitennison.com>

Hi Norm,

Norm Walsh wrote:
> / Jeni Tennison <jeni@jenitennison.com> was heard to say:
> | Norm Walsh wrote:
> |> / Jeni Tennison <jeni@jenitennison.com> was heard to say:
> |> | A more flexible alternative would be to say that labelled documents are
> |> | referencable as variables within the XPath expressions used to set parameters
> |> | or variables.
> |>
> |> Yes, but it puts variables/parameters and input/output labels all into
> |> the same "symbol space" which worries me a bit.
> |
> | It doesn't worry me. I think we want parameters and I/O labels to be in
> | the same symbol space anyway so that we can support a directed syntax
> | should we want to in the future.
> 
> I don't see the connection there...

If parameter/variable names are in a different symbol space from 
inputs/outputs then we are allowing a situation where a parameter can 
have the same name as an input. For example:

   <p:step name="my:process">
     <p:input name="document" href="doc.xml" />
     <p:param name="document" value="yes" />
     <p:output name="result" label="out" />
   </p:step>

If you translated this to a directed syntax, you'd run into problems 
because an attribute or element called 'document' could refer to either 
the input or the parameter. You obviously can't have:

   <p:process document="doc.xml" document="yes" result="out" />

so you'd have to have something like:

   <p:process input.document="doc.xml" param.document="yes"
              output.result="out" />

I think that, to keep our options open for using a directed syntax in 
the future, and to let users create their own directed syntaxes that 
translate easily into our generic syntax, the names of parameters and 
inputs/outputs should share a symbol space.
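
To make the clash concrete, here is a minimal Python sketch of the 
generic-to-directed translation described above. The function name 
to_directed, the namespace URI and the "mode" parameter are all 
hypothetical, and the mapping itself (each p:input, p:param and p:output 
becomes an attribute named after its name attribute) is only an 
assumption drawn from the examples in this message, not an agreed design.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI, assumed purely for illustration.
NS = 'xmlns:p="http://example.org/ns/pipeline"'

def to_directed(step_xml):
    """Translate a generic p:step into (element local name, attributes).

    Raises ValueError when two children share a name, which is the
    situation a shared symbol space would rule out up front.
    """
    step = ET.fromstring(step_xml)
    attrs = {}
    for child in step:
        kind = child.tag.rsplit("}", 1)[-1]      # input, param or output
        name = child.get("name")
        # Each kind carries its value in a different attribute.
        value = child.get("href") or child.get("value") or child.get("label")
        if name in attrs:
            raise ValueError(f"'{name}' is ambiguous: second use is a {kind}")
        attrs[name] = value
    local_name = step.get("name").split(":")[-1]  # my:process -> process
    return local_name, attrs

# Distinct names translate cleanly.
OK = f'''<p:step {NS} name="my:process">
  <p:input name="document" href="doc.xml" />
  <p:param name="mode" value="yes" />
  <p:output name="result" label="out" />
</p:step>'''

# The example from this message: an input and a parameter both
# called "document".
CLASH = OK.replace('name="mode"', 'name="document"')
```

Calling to_directed(OK) gives the attributes for a 
<p:process document="doc.xml" mode="yes" result="out" /> element, while 
to_directed(CLASH) raises ValueError, since nothing short of prefixes 
like input.document and param.document can tell the two apart.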

> |> | This is more flexible because it means that you can refer to more than one
> |> | document within the XPath expression.
> |>
> |> Indeed. Is that valuable enough to justify the added complexity?
> |
> | I think it's simpler. The explanation goes:
> |
> |   The select attribute of p:param, p:variable, p:step/p:input and
> |   p:pipeline/p:output holds an XPath expression that provides the value
> |   of the parameter, variable, input or output. The value of a parameter
> |   must be a string; it is set to the string value of the result of
> |   evaluating the XPath expression. The value of an input or output must
> |   be a node set containing only root (document) nodes [1]; it is an
> |   error if the XPath evaluates to anything else.
> 
> I'm having a hard time getting my head around using select on
> p:pipeline/p:output, but I think I get it.

It's funny, I found it hard to get my head around p:pipeline/p:output 
being referenced by the final p:step/p:output. As Richard pointed out 
some time ago, the links between the ports could be represented in 
either direction. I think it makes more sense for *all* the step outputs 
to be referenced (thus having the pipeline output doing the referencing) 
rather than having some step outputs being referenced and some doing the 
referencing. I think it makes it easier to add steps at the end of the 
pipeline, and to create pipelines where an output is both a final output 
and an intermediate output.

> |   When evaluating an XPath expression, the context node and the context
> |   position are undefined: it is an error if the expression references
> |   them [2]. The variable bindings for the expression are determined by
> |   variable binding elements that precede the expression. These are:
> |
> |   - p:pipeline/p:input binds the variable with the name specified in the
> |     name attribute to a node set containing the root (document) nodes
> |     passed as that input.
> |
> |   - p:pipeline/p:param binds the variable with the name specified in the
> |     name attribute to the (string) value passed as the value of the
> |     parameter, or to the string value of the result of evaluating the
> |     XPath in the select attribute if no value is passed for the
> |     parameter.
> 
> Does anyone have any misgivings about requiring that parameters be strings?
> Specifically, that they may not be documents?

To be honest, the distinction between inputs and parameters has always 
seemed a bit weird to me: they're both pieces of information passed to 
the component. At the moment, the only distinction between them seems to 
be the type of value they can take (inputs are sequences of documents, 
parameters are strings), and I can live with that, or with parameters 
being able to take other atomic values as well.

If parameters could be documents then I'd be left wondering what the 
difference was between an input document and a parameter document. Are 
there additional restrictions, such as parameter documents being static 
(not generated by the pipeline)? Or perhaps parameters can be left unset 
(and have a default) whereas inputs can't?

> | [2] I think we'll want to set the context node and context position
> | differently within a <p:for-each>.
> 
> Maybe, but that's not obvious to me. I had in mind that for-each would
> bind an input to the first document in its input sequence and run the
> steps it contains with that input. Then it would bind the input to the
> second document in its input sequence and run the steps again. I didn't
> expect to have XPath expressions referring directly to the current
> input document inside for-each any more than elsewhere in the
> pipeline.
> 
> I had in mind something like this:
> 
>   <p:step name="xslt">
>      ...
>      <p:output name="result" label="styled-docs"/>
>   </p:step>
> 
>   <p:for-each ref="styled-docs">
>     <p:input name="document" label="doc"/>
>     <p:output name="result" label="result"/>
> 
>     <p:step name="tidy">
>       <p:input name="document" ref="doc"/>
>       <p:output name="result" label="result"/>
>     </p:step>
>   </p:for-each>

I'd be happy with that too, although I do think that having to declare 
the inputs/outputs of <p:for-each> (and the other flow-control elements) 
is a bit tedious.

Presumably with the above syntax it's always the first input that gets 
bound to the individual documents in the selected document sequence? And 
presumably the outputs get set to the concatenation of the outputs they 
reference?
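
The reading behind those two questions can be sketched in Python as 
follows. This only models the presumed semantics (one run of the body per 
document, outputs concatenated in order); it is not confirmed behaviour. 
The names for_each and tidy are hypothetical stand-ins, and documents are 
represented as plain strings to keep the sketch short.

```python
def for_each(documents, body):
    """Run body once per document and concatenate each named output.

    body takes the single document bound to the first input and returns
    a dict mapping output names to lists of documents.
    """
    results = {}
    for doc in documents:
        outputs = body(doc)
        for name, docs in outputs.items():
            # Append this iteration's documents to the running sequence.
            results.setdefault(name, []).extend(docs)
    return results

def tidy(doc):
    # Stand-in for the "tidy" step: it just trims surrounding whitespace.
    return {"result": [doc.strip()]}

styled_docs = ["  <a/> ", "<b/>\n"]
tidied = for_each(styled_docs, tidy)
```

Under this reading, the "result" output of the <p:for-each> holds one 
tidied document per document in styled-docs, in the original order, and 
an empty input sequence produces no outputs at all.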

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Wednesday, 24 May 2006 20:39:12 UTC