- From: Jeni Tennison <jeni@jenitennison.com>
- Date: Wed, 24 May 2006 21:38:54 +0100
- To: public-xml-processing-model-wg@w3.org
Hi Norm,
Norm Walsh wrote:
> / Jeni Tennison <jeni@jenitennison.com> was heard to say:
> | Norm Walsh wrote:
> |> / Jeni Tennison <jeni@jenitennison.com> was heard to say:
> |> | A more flexible alternative would be to say that labelled documents are
> |> | referencable as variables within the XPath expressions used to set parameters
> |> | or variables.
> |>
> |> Yes, but it puts variables/parameters and input/output labels all into
> |> the same "symbol space" which worries me a bit.
> |
> | It doesn't worry me. I think we want parameters and I/O labels to be in
> | the same symbol space anyway so that we can support a directed syntax
> | should we want to in the future.
>
> I don't see the connection there...
If parameter/variable names are in a different symbol space from
inputs/outputs then we are allowing a situation where a parameter can
have the same name as an input. For example:
  <p:step name="my:process">
    <p:input name="document" href="doc.xml" />
    <p:param name="document" value="yes" />
    <p:output name="result" label="out" />
  </p:step>
If you translated this to a directed syntax, you'd run into problems
because an attribute or element called 'document' could refer to either
the input or the parameter. You obviously can't have:
  <p:process document="doc.xml" document="yes" result="out" />
so you'd have to have something like:
  <p:process input.document="doc.xml" param.document="yes"
             output.result="out" />
I think that, to keep our options open for a directed syntax in the
future, and to let users create their own directed syntax that
translates easily into our generic syntax, the names of parameters and
inputs/outputs should share a symbol space.
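
The clash can be made concrete with a rough sketch in Python. This is an illustration only, not part of any proposal: the function name, the (kind, name, value) tuple layout and the 'indent' parameter are all made up. It flattens a step's inputs, parameters and outputs into the attributes of a single directed element, and fails as soon as two of them need the same attribute name:

```python
def to_directed(ports):
    """Flatten (kind, name, value) triples into directed-syntax attributes.

    Raises ValueError when two ports would need the same attribute name.
    """
    seen = {}   # attribute name -> kind that claimed it first
    attrs = {}  # attribute name -> attribute value
    for kind, name, value in ports:
        if name in seen:
            raise ValueError(
                f"attribute '{name}' is needed by both {seen[name]} and {kind}"
            )
        seen[name] = kind
        attrs[name] = value
    return attrs


# Names drawn from one shared symbol space: the translation is direct.
shared = [
    ("input", "document", "doc.xml"),
    ("param", "indent", "yes"),  # 'indent' is a hypothetical parameter
    ("output", "result", "out"),
]
print(to_directed(shared))

# Separate symbol spaces permit this step, and the translation breaks.
clash = [
    ("input", "document", "doc.xml"),
    ("param", "document", "yes"),
    ("output", "result", "out"),
]
try:
    to_directed(clash)
except ValueError as err:
    print(err)
```

With a shared symbol space the second case could never be written, so no input./param./output. prefixes would be needed in the directed form.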
> |> | This is more flexible because it means that you can refer to more than one
> |> | document within the XPath expression.
> |>
> |> Indeed. Is that valuable enough to justify the added complexity?
> |
> | I think it's simpler. The explanation goes:
> |
> | The select attribute of p:param, p:variable, p:step/p:input and
> | p:pipeline/p:output holds an XPath expression that provides the value
> | of the parameter, variable, input or output. The value of a parameter
> | must be a string; it is set to the string value of the result of
> | evaluating the XPath expression. The value of an input or output must
> | be a node set containing only root (document) nodes [1]; it is an
> | error if the XPath evaluates to anything else.
>
> I'm having a hard time getting my head around using select on
> p:pipeline/p:output, but I think I get it.
It's funny, I found it hard to get my head around p:pipeline/p:output
being referenced by the final p:step/p:output. As Richard pointed out
some time ago, the links between the ports could be represented in
either direction. I think it makes more sense for *all* the step outputs
to be referenced (with the pipeline output doing the referencing) rather
than having some step outputs referenced and others doing the
referencing. That makes it easier to add steps at the end of the
pipeline, and to create pipelines where an output is both a final output
and an intermediate output.
> | When evaluating an XPath expression, the context node and the context
> | position are undefined: it is an error if the expression references
> | them [2]. The variable bindings for the expression are determined by
> | variable binding elements that precede the expression. These are:
> |
> | - p:pipeline/p:input binds the variable with the name specified in the
> | name attribute to a node set containing the root (document) nodes
> | passed as that input.
> |
> | - p:pipeline/p:param binds the variable with the name specified in the
> | name attribute to the (string) value passed as the value of the
> | parameter, or to the string value of the result of evaluating the
> | XPath in the select attribute if no value is passed for the
> | parameter.
>
> Does anyone have any misgivings about requiring that parameters be strings?
> Specifically, that they may not be documents?
To be honest, the distinction between inputs and parameters has always
seemed a bit weird to me: they're both pieces of information passed to
the component. At the moment, the only distinction between them seems to
be the type of value they can take (inputs are sequences of documents,
parameters are strings), and I can live with that, or with parameters
being able to take other atomic values as well.

If parameters could be documents, I'd be left wondering what the
difference was between an input document and a parameter document. Are
there additional restrictions, such as parameter documents being static
(not generated by the pipeline)? Or perhaps parameters can be left unset
(and have a default) whereas inputs can't?
> | [2] I think we'll want to set the context node and context position
> | differently within a <p:for-each>.
>
> Maybe, but that's not obvious to me. I had in mind that for-each would
> bind an input to the first document in its input sequence and run the
> steps it contains with that input. Then it would bind the input to the
> second document in its input sequence and run the steps again. I didn't
> expect to have XPath expressions referring directly to the current
> input document inside for-each any more than elsewhere in the
> pipeline.
>
> I had in mind something like this:
>
>   <p:step name="xslt">
>     ...
>     <p:output name="result" label="styled-docs"/>
>   </p:step>
>
>   <p:for-each ref="styled-docs">
>     <p:input name="document" label="doc"/>
>     <p:output name="result" label="result"/>
>
>     <p:step name="tidy">
>       <p:input name="document" ref="doc"/>
>       <p:output name="result" label="result"/>
>     </p:step>
>   </p:for-each>
I'd be happy with that too, although I do think that having to declare
the inputs/outputs of <p:for-each> (and the other flow-control elements)
is a bit tedious.
Presumably with the above syntax it's always the first input that gets
bound to the individual documents in the selected document sequence? And
presumably the outputs get set to the concatenation of the outputs they
reference?
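
A minimal sketch of that reading, in Python rather than pipeline syntax and resting on assumptions throughout: documents are stood in for by plain strings, a step is a function from a list of documents to a list of documents, and the names for_each and tidy are illustrative only. Each document in the selected sequence is bound to the loop's input in turn, and the loop's output is the concatenation of what each iteration produces:

```python
def for_each(documents, steps):
    """Run `steps` once per document and concatenate what each run produces."""
    results = []
    for doc in documents:
        current = [doc]          # the loop's input is bound to one document
        for step in steps:
            current = step(current)
        results.extend(current)  # outputs are concatenated in input order
    return results


def tidy(docs):
    # Stand-in for a real tidy step: it only trims surrounding whitespace.
    return [d.strip() for d in docs]


styled_docs = [" <a/> ", " <b/> "]
print(for_each(styled_docs, [tidy]))
```

Whether an empty input sequence should yield an empty output sequence, as it does here, is one of the details this sketch assumes rather than settles.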
Cheers,
Jeni
--
Jeni Tennison
http://www.jenitennison.com
Received on Wednesday, 24 May 2006 20:39:12 UTC