Re: Making vanilla implementation of position() DTRT from Jeni Tennison on 2007-06-08 (public-xml-processing-model-wg@w3.org from June 2007)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Fri, 08 Jun 2007 20:13:12 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <4669AA48.20409@jenitennison.com>

Henry S. Thompson wrote:
> Jeni Tennison writes:
>> This is how I suggest we define it, so that it does work in the way
>> we'd expect it to:
>>
>> 1. We add "current document" and "current document sequence" to the
>> environment. These are set as follows:
>>
>>  (a) for <p:viewport> and <p:for-each>, the current document sequence
>> is the sequence of documents being processed by the viewport or
>> for-each, and the current document is the current document being
>> processed by the viewport or for-each (which is also bound to the
>> 'current' port).
> 
> Does this mean that viewport and for-each can't stream?  That seems a
> very high price to pay!  If they _can_ stream, how do they know what
> the document sequence is?

This is a notional/logical/functional description of how the processor 
works. It's certainly not intended to force a particular implementation 
strategy: you can stream, you can run in parallel, you can do whatever 
the hell you like, as long as the end result is the same.

The for-each/viewport knows what its document sequence is because that's 
what's been bound to its input, in just the same way as a step knows 
what its document sequence is because that's what's been bound to its 
input. There's no requirement in either case to pass the whole sequence 
all in one go.

The only thing that would stop a processor streaming is if (a) we decide 
that the context size should equal the size of the current document 
sequence and (b) a processor detects the use of last() in this context. 
(a) seems increasingly unlikely.
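
To make the streaming point concrete, here is a rough sketch in Python rather than XProc. The function names are invented for illustration and are not part of any proposal. It only shows that the position can be handed out one document at a time, whereas a meaningful size needs the whole sequence first.

```python
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def iterate_streaming(docs: Iterable[T]) -> Iterator[Tuple[int, T]]:
    """Yield (position, document) pairs without buffering the input."""
    for position, doc in enumerate(docs, start=1):
        yield position, doc


def iterate_with_size(docs: Iterable[T]) -> Iterator[Tuple[int, int, T]]:
    """Yield (position, size, document) triples.

    Knowing the size means reading the entire sequence before the first
    document can be handed on, which is what would defeat streaming.
    """
    buffered = list(docs)
    size = len(buffered)
    for position, doc in enumerate(buffered, start=1):
        yield position, size, doc
```

The first function never looks ahead, so a processor that only has to support position() is free to stream. Only the second has to wait for the end of its input.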

>>  (b) for <p:pipeline>, the current document and current document
>> sequence are undefined.
>>
>>  (c) for all other steps, the current document and current document
>> sequence are the same as those of its container step. (For example, a
>> group or choose doesn't change the current document or current
>> document sequence.)
>>
>> NOTE that neither the current document sequence nor the current
>> document is the same as the default readable port: within the scope of
>> a for-each, the default readable port changes between steps, but the
>> current document sequence does not.
>>
>> 2. XPath expressions in the context of a pipeline (i.e. those that
>> aren't passed as options to steps) are evaluated differently depending
>> on what source they use for their context (as set by
>> <p:xpath-context>, the <p:pipe> within an option or parameter and so
>> on):
>>
>>   if it's set to the current port of a for-each or viewport (either
>>   explicitly or implicitly [when the default readable port is the
>>   'current' port]), then:
>>
>>      * context node = the current document
>>      * context position = the position of the current document in the
>>        current document sequence
> 
> This is virtually identical to my proposal -- note that as I said,
> this means in practice it will only work for the first step in the
> subpipelines, and then only if it reads the DRP.

I don't believe that it is the same as your proposal, precisely because 
it *will* work for all the steps in the subpipeline, even within nested 
groups and chooses. The current document and current document sequence 
stay the same while the DRP changes. This makes it work the way I 
believe people will expect from exposure to XSLT, namely that you can do 
things like:

   <p:viewport match="xhtml:body/xhtml:div">
     <p:choose>
       <p:when test="position() &lt; 10">
         <p:output port="result" />
         <p:identity />
       </p:when>
       <p:otherwise>
         <p:output port="result" />
         <p:xslt>
           <p:input port="stylesheet">
             <p:document href="collapse.xsl" />
           </p:input>
         </p:xslt>
       </p:otherwise>
     </p:choose>
   </p:viewport>
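
In case the intent of that example isn't obvious, this is the behaviour I'd expect from it, sketched in Python. The name process_viewport and the collapse callable (standing in for the XSLT step that applies collapse.xsl) are made up for illustration.

```python
from typing import Callable, List, Sequence


def process_viewport(divs: Sequence[str],
                     collapse: Callable[[str], str]) -> List[str]:
    """Model the viewport above: the first nine matched divs pass through
    unchanged, later ones are transformed."""
    results = []
    for position, div in enumerate(divs, start=1):
        # position is fixed for the whole subpipeline run on this div,
        # however deeply the p:choose is nested.
        if position < 10:
            results.append(div)            # p:when branch: p:identity
        else:
            results.append(collapse(div))  # p:otherwise branch: p:xslt
    return results
```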

> Net-net -- your proposal only uses the
> context:'current-document-sequence' on the 'get it right' option wrt
> last(), _when_ last() is evaluated _by the engine_ on the first step
> when that evaluation is tied to the DRP.  It's not worth it.

I used the concept of a current document and current document sequence 
because those are analogous to the "current node" and "current node 
list" used to describe the setting of the context position (and context 
size) in XSLT. I find describing it in functional terms, like this, 
easier to understand (and more flexible wrt implementation strategies) 
than a procedural explanation along the lines of "the for-each sets the 
position to 1 and then increments it for each document encountered", but 
I might well be on my own.

> Once you give up on context:'current-document-sequence' your analysis
> is identical to mine, only the question of whether position() should
> have a special meaning in a very limited and hard to delimit set of
> cases.  Again, in my view it's just not worth it, and will require
> people to use a p:group and an option to bind to position() right at
> the beginning almost all the time.  I _much_ prefer to just say that
> for-each and viewport bind p:index to the iteration number, end of
> story.

OK, here's a possible compromise:

1. Add the concept of the 'iteration number' to the environment. This is 
set to 1 in a <p:pipeline>, to the position of the document being 
processed in a <p:for-each> or <p:viewport>, and to the iteration number 
in the context of the container step in other cases (such as <p:group> 
and <p:choose>). (It is *NOT* reset when the DRP changes, or when you 
move inside another container step, unless that step is another 
<p:for-each> or <p:viewport>.)

2. The iteration number is always exposed through the p:index() function 
in XPaths evaluated by the XProc engine, so if you do:

   <p:for-each>
     <p:option name="href"
       select="concat(/my:config/my:baseFilename, p:index())">
       <p:pipe step="pipe" port="config" />
     </p:option>
     ...
   </p:for-each>

then you'll get something reasonable.

3. The iteration number is *also* exposed as the context position when 
the context node for an XPath is the same as the current document in the 
iteration (exposed on the 'current' input); otherwise the context 
position is set to 1.

4. The context size is set to MAX_INT or iteration number plus one or 
iteration number; I don't particularly care. (If it's not going to be 
anything meaningful then it doesn't really matter what it is.)
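
Points 1 to 3 can be sketched as follows, again in Python and only as one reading of the rules. Env, enter_iteration, enter_container, p_index and context_position are hypothetical names, and "the same as the current document" is modelled as object identity. Point 4 is left out, since the context size isn't meant to be meaningful anyway.

```python
from dataclasses import dataclass, replace
from typing import Any


@dataclass(frozen=True)
class Env:
    iteration_number: int = 1     # point 1: 1 at the p:pipeline level
    current_document: Any = None  # document on the 'current' port, if any


def enter_iteration(env: Env, document: Any, position: int) -> Env:
    """Entering a p:for-each or p:viewport iteration sets both values."""
    return replace(env, iteration_number=position, current_document=document)


def enter_container(env: Env) -> Env:
    """Other containers (p:group, p:choose) inherit the environment
    unchanged, whatever happens to the DRP inside them."""
    return env


def p_index(env: Env) -> int:
    """Point 2: p:index() always exposes the iteration number."""
    return env.iteration_number


def context_position(env: Env, context_node: Any) -> int:
    """Point 3: position() is the iteration number only when the context
    node is the current document of the iteration; otherwise it is 1."""
    if env.current_document is not None and context_node is env.current_document:
        return env.iteration_number
    return 1
```

So inside the second iteration of a for-each, p_index gives 2 whatever the context node is, while context_position gives 2 for the document on the 'current' port and 1 for, say, a config document piped in from elsewhere.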

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Friday, 8 June 2007 19:13:17 UTC