Re: New static error: options in the XProc namespace from Jeni Tennison on 2007-05-14 (public-xml-processing-model-wg@w3.org from May 2007)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Mon, 14 May 2007 19:30:52 +0200
To: public-xml-processing-model-wg@w3.org
Message-ID: <46489CCC.2040401@jenitennison.com>

Norman Walsh wrote:
> / Jeni Tennison <jeni@jenitennison.com> was heard to say:
> | (1) I don't understand the difference between the context position
> | (exposed through position()), $p:position and $p:stepname_index. When
> | does $p:stepname_index differ from $p:position?
> 
> Does this example help?
> 
>   http://lists.w3.org/Archives/Public/public-xml-processing-model-wg/2007May/0156.html

I had looked at that, and I didn't understand it, even before I failed 
to understand the wording in the spec. I also note that your explanation 
seems to be different from Mohamed's, which makes me think that there's 
a certain amount of confusion over these variables.

> In brief, what's exposed to the steps inside a for-each or viewport
> isn't a sequence. Instead, the for-each feeds each document in a
> sequence that *it* receives to the inner steps, one document at a time.
> 
> So (directly) inside a for-each, the $p:position is always 1. There's
> never a "second" document because it's not a sequence.

If $p:position is always 1, why have $p:position?

> Compare that with $p:stepname_index which counts the iterations of
> the for-each or viewport loop.

If you reference the index of a for-each nested within a for-each, does 
$p:stepname_index give the total number of iterations, or does it reset 
to 1 when the outer for-each moves on to its next document?
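
To make the two readings concrete, here is a small sketch in Python (not XProc). It assumes nothing about what the spec intends; the helper name `inner_index_readings` is invented for this illustration. For every iteration of the inner for-each it reports both the index that resets per outer document and the index that keeps counting.

```python
from typing import List, Tuple


def inner_index_readings(inner_counts: List[int]) -> List[Tuple[int, int]]:
    """For each inner iteration, return (resetting index, cumulative index).

    inner_counts[i] is the number of documents the inner for-each receives
    during the i-th iteration of the outer for-each.
    """
    readings = []
    cumulative = 0
    for count in inner_counts:
        # The resetting reading starts again at 1 for every outer document.
        for resetting in range(1, count + 1):
            cumulative += 1  # the cumulative reading never resets
            readings.append((resetting, cumulative))
    return readings


# Outer for-each sees two documents; the inner one sees 2, then 3 documents.
readings = inner_index_readings([2, 3])
for resetting, cumulative in readings:
    print(resetting, cumulative)
```

The two columns agree only during the first outer iteration, which is why the spec needs to say which one $p:stepname_index means.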

> | When couldn't the
> | value held in $p:position be more readily exposed through the
> | position() function?
> 
> Because XPath 1.0 doesn't have the concept of a sequence. In XPath,
> position() is a position in a nodeset and we don't have a nodeset.
> We have a sequence of documents.
> 
> I suppose we *could* use position() but I think it would be wrong.

I strongly disagree. XPath has the concept of an environment which 
consists of (a) a context node (which can be of any type, including a 
document node), (b) a context position (which can be any integer greater 
than or equal to 1) and (c) a context size (which can be any integer 
greater than or equal to 1). This set of context information defines, by 
extension, a context sequence, which is the (ordered) sequence that is 
processed by an xsl:for-each or xsl:apply-templates. (Note that although 
XPath 1.0 has no concept of a sequence, xsl:for-each and 
xsl:apply-templates nevertheless process a sequence, because they 
process a node-set in an order defined either as document order or the 
order specified by a sort.)

I think it would be entirely appropriate to say that, within a for-each, 
the context node is the document node to be processed on this iteration, 
the context size is the number of document nodes to be processed by the 
for-each, and the context position is the position of the document node 
to be processed on this iteration amongst the sequence of document nodes 
to be processed by the for-each. A similar set of context information 
could be available within viewport.
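
A minimal sketch of that context information, written in Python rather than XProc and purely illustrative: the `Context` tuple and the `for_each_contexts` helper are names made up here, and strings stand in for document nodes.

```python
from typing import Iterator, NamedTuple, Sequence


class Context(NamedTuple):
    node: str      # stands in for the document node of this iteration
    position: int  # 1-based; what position() would return
    size: int      # what last() would return


def for_each_contexts(documents: Sequence[str]) -> Iterator[Context]:
    """Yield the XPath-style context for each iteration of a for-each."""
    size = len(documents)
    for position, node in enumerate(documents, start=1):
        yield Context(node, position, size)


contexts = list(for_each_contexts(["a.xml", "b.xml", "c.xml"]))
for ctx in contexts:
    print(ctx.node, ctx.position, ctx.size)
```

Under this reading, position() and last() carry everything that $p:position was meant to expose, with no extra variable needed.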

> | (2) I do think that environment information should be exposed through
> | functions rather than variables, because that's how it's done in XPath
> | and XSLT -- position(), last(), system-property(), current(),
> | current-group(), current-grouping-key(), and regex-group() are all
> | examples -- which will be familiar to our users. I also prefer using
> | functions because it seems better to use function arguments than
> | naming schemes (e.g. index(stepname) rather than $p:stepname_index).
> 
> I agree. You're the second person who would have voted for functions
> instead of variables if you'd been on the call. I wonder if that's enough
> to turn the tide? (It would have been on the call.)
> 
> | Finally, I observe that, should we want to, we can define functions
> | that degrade elegantly when used out of context, whereas variable
> | references that refer to variables that don't exist will always raise
> | an error.
> 
> I worked around that editorially by making sure that these variables
> always have a value.

Doesn't that mean that implementations have to parse XPaths to know what 
variable bindings to supply to the XPath engines they're using?
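
As a rough indication of what that would involve, here is a Python sketch of a crude lexical scan for variable references. It is an assumption-laden illustration, not a real XPath parser: it only skips string literals and approximates QNames with a simple pattern, and the function name `variable_references` is invented for this example.

```python
import re
from typing import Set

# Match string literals first so that a "$" inside quotes is not mistaken
# for a variable reference; otherwise match "$" followed by a rough QName.
_TOKEN = re.compile(
    r"""
      "[^"]*"                     # double-quoted string literal
    | '[^']*'                     # single-quoted string literal
    | \$(?P<name>[A-Za-z_][\w.\-]*(?::[A-Za-z_][\w.\-]*)?)
    """,
    re.VERBOSE,
)


def variable_references(xpath: str) -> Set[str]:
    """Return the names of variables referenced in an XPath 1.0 expression."""
    return {
        match.group("name")
        for match in _TOKEN.finditer(xpath)
        if match.group("name")  # None when the match was a string literal
    }


refs = variable_references(
    "@class[. = '$notavar'] and $myval = $p:position"
)
print(sorted(refs))
```

Even this much is more than most implementations would want to do before handing an expression to an off-the-shelf XPath engine.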

> | (3) I think we need to take great care to distinguish between places
> | where XProc uses expressions or patterns (where I'd expect option
> | variables and environment functions or variables to be available) and
> | places where arbitrary steps use expressions or patterns (where I
> | wouldn't expect them to be available).
> |
> | For example, if I have:
> |
> | <p:group>
> |   <p:option name="myval" value="foo" />
> |   <p:xslt>
> |     <p:input port="stylesheet">
> |       <p:inline>
> |         <bar xsl:version="1.0">
> |           <xsl:value-of select="$myval" />
> |         </bar>
> |       </p:inline>
> |     </p:input>
> |   </p:xslt>
> | </p:group>
> |
> | would you expect the variable binding from the XProc environment to be
> | available in the XSLT environment?
> 
> No. From 3.3 in the spec:
> 
>   Inline documents are considered "quoted", they are not interpolated
>   or available to the pipeline processor in any way except as documents
>   flowing through the pipeline.

Yes, that explains why the $myval variable reference doesn't get 
replaced when the XSLT stylesheet is constructed, but the comparison I 
was making was with:

   <p:option name="replace" value="$myval" />

Again, option values are considered "quoted": if you want to set them 
dynamically you use the select attribute rather than the value attribute.

However, in both cases we have a decision to make about which variables 
are bound in the environment in which the stylesheet or XPath is evaluated.

> | I think we need some wording, in the definitions of those steps that
> | use expressions or patterns, that says what variables and functions
> | are available, as well as what the context node, position and length
> | are. If option variables and environment functions or variables aren't
> | automatically passed to these steps (and I don't think they should
> | be), we need to specify that these steps use parameters to provide
> | variable bindings for the step.
> |
> | In the latter case, I would write:
> |
> | <p:group>
> |   <p:option name="myval" value="foo" />
> |   <p:string-replace>
> |     <p:option name="match" value="@class[. = 'bar']" />
> |     <p:option name="replace" value="$myval" />
> |     <p:parameter name="myval" select="$myval" />
> |   </p:string-replace>
> | </p:group>
> 
> Ouch. That seems really ugly. I think I'd rather say that steps can ask
> the processor for the values of variables and functions. So the steps
> that accept patterns can be documented that way. The XSLT step has no
> knowledge of the pipeline processor so it naturally doesn't ask.

Let's pursue this in the other thread. I agree that it would be useful 
for expressions and patterns that are passed as option values to have 
access to the variable bindings from the XProc environment, but we need 
explicit wording in the definition of those steps where that is the case.

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Monday, 14 May 2007 17:31:04 UTC