
Making vanilla implementation of position() DTRT (was Re: The semantics of position() -- trying to be very explicit)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Thu, 07 Jun 2007 16:03:12 +0100
To: Norman Walsh <ndw@nwalsh.com>
Cc: public-xml-processing-model-wg@w3.org
Message-ID: <f5b4pljajtb.fsf_-_@hildegard.inf.ed.ac.uk>


Norman Walsh writes:

> One thing that struck me in reading your message was that you said:
>   I find it easiest to think of this in terms of position having a
>   sort-of special slot in the environment.
> In fact, the special slot is in the XPath context and we renamed the
> bag of state that we're carrying around from "context" to "environment"
> precisely so that it wouldn't introduce confusion when we had to talk
> about setting up the XPath context :-)

Well, I guess what I'm saying is that we need to consider the XPath
context, that is, {context node, context position, context size} as
intimately connected with the environment (since it's the
environment that provides variable bindings).  So what's really
required for our account of position to work is

  R1') Components which evaluate XPath expressions MUST

       a) For each XPath they evaluate with respect to a document,
          identify the input port with respect to which they evaluate

       b) Provide the following as the XPath context:
           Context node: the document
           Context position: the position of the document within the
           input sequence (always 1 for ports which don't accept
           sequences)
           Context size: 1 for ports which don't accept sequences, TBD
           for ports which do
           Variable bindings: (at least) the in-scope options from
           their environment

This is simple to explain, to implement and to use (although it won't
be used much, I predict).
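
As a rough illustration of R1'(b), the context can be modelled as a small
record built per document.  This is only a sketch under stated
assumptions: the names XPathContext and context_for_port are hypothetical,
documents are stood in for by plain strings, and None is used for the
context size that the text above leaves as TBD.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class XPathContext:
    """Hypothetical model of the XPath context described in R1'(b)."""
    node: str                    # the document (a string stand-in here)
    position: int                # 1-based position within the input sequence
    size: Optional[int]          # None models the "TBD" size for sequence ports
    variables: dict = field(default_factory=dict)  # in-scope options


def context_for_port(documents, index, accepts_sequence, options):
    """Build the context for documents[index] arriving on one input port."""
    if not accepts_sequence:
        # A non-sequence port carries exactly one document: position and size are 1.
        if len(documents) != 1:
            raise ValueError("non-sequence port must carry exactly one document")
        return XPathContext(documents[0], 1, 1, dict(options))
    # Sequence port: position is the 1-based index; size is left undecided.
    return XPathContext(documents[index], index + 1, None, dict(options))
```

For example, context_for_port(["a", "b", "c"], 1, True, {}) describes the
second document on a sequence port: position 2, size undecided.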

  R2') Compound components which iterate one or more subpipelines with
       respect to some document sequence MUST provide the following as
       the XPath context associated with the environment inherited by
       their subpipeline

       Context node: the current document from the iteration
       Context position: the position of the document within the
           iteration sequence
       Context size: TBD

On the other hand I'm getting less and less happy with this.  Let's
dig a little deeper.  Where does R2' come into play?  Is it whenever
the runtime evaluates a select= XPath expression, i.e. on p:option,
p:parameter, p:iteration-source, p:input and (maybe) p:for-each (call
this the *host*)?  How far does the context R2' establishes persist?
Surely not very far -- once we're past the first step in the contained
steps, the 'current document' is likely to be gone, or changed, or
split into pieces.  Seems to me (and Richard, who I've discussed this
with extensively) that in only the most constrained of circumstances
is the R2'-specified context the right one for the runtime to use.
After the first step, the current node will be wrong.  Even at the
first step, if the _host_ specifies any source other than an
un-filtered pipe from 'current', again the current node will be wrong.
So anyone who wants to actually _use_ position() to access the
iteration position will have to work _really_ fast to do so, and will
inevitably end up writing:

   <p:option name="index" select="position()"/>
   . . .

and using a variable reference to $index subsequently.  So rather than
confuse the heck out of people with a story about inherited XPath
contexts that get wiped away almost immediately, I'm back to thinking
that p:for-each and p:viewport should just be spec'ed to bind p:index
to the iteration index in the environment they give to their
subpipeline, updated on each iteration.
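
A minimal sketch of that alternative, assuming the environment is just a
dictionary of bindings and a subpipeline is a function of (document,
environment).  The names for_each and label_step are hypothetical, and
nothing here is draft syntax; it only shows the index being rebound in
the environment on each iteration, so that any step can read it no
matter what has happened to the current document.

```python
def for_each(documents, subpipeline, environment):
    """Run subpipeline once per document, binding p:index in its environment."""
    results = []
    for index, document in enumerate(documents, start=1):
        # Copy so the binding is local to this iteration's subpipeline
        # and the enclosing environment is left untouched.
        env = dict(environment)
        env["p:index"] = index
        results.append(subpipeline(document, env))
    return results


def label_step(document, env):
    """Hypothetical step: it reads the index from the environment, not from
    any XPath context, so it works at any depth in the subpipeline."""
    return f"{env['p:index']}:{document}"
```

Calling for_each(["a", "b"], label_step, {}) labels each document with
its iteration index without relying on position().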

Footnote:  If we go this route, what _does_ position() mean when
evaluated by the runtime?   That is, what context does the runtime
specify when evaluating an XPath?  This turns out to be trickier than
I would like, but I think to keep certain people happy (:-) we can't
do what I would prefer, because it's simple, and just say 'No context
node, position or size', i.e. appeal to context is _always_ an error.
The alternative is approximately this, by cases:

  If there is no input (happens if no input specified, and no
  default readable port in the environment), or if there is more than
  one document available on the input, then for p:option and
  p:parameter I think it's clear we have an empty context -- any
  appeal to it results in a dynamic error.  For p:input and friends,
  the situation is murkier.  We've recently clarified that it's the
  _output_ of p:input, as it were, that either is or isn't a sequence,
  as per the component signature.  So it would in a weird way make
  sense to 'insulate' a non-sequence-input component by writing:

    <p:input select="position()=1"/>
    . . .

  For this to work would require us to blur the _by the runtime_/_by
  the component_ distinction, and would eliminate the need for
  p:matching-documents. . .  What's going on is, just like components
  which accept sequences, the p:input is seeing each document in turn,
  and could count them. . .
  I don't know what to do here (or for p:for-each, if its 'select'
  attribute survives)

  If there is exactly one input (happens if one nested p:document or a
  p:pipe from a no-sequence port or a p:pipe from a sequence port that
  only produces a single document), it's easy: context node is the
  document, position and size are 1.
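
The two clear cases can be sketched as follows.  This is an
illustration only, covering the single-document case and the p:option /
p:parameter reading of the other case; the p:input case is left open
above and is not modelled.  The names EmptyContextError, runtime_context
and evaluate_position are hypothetical.

```python
class EmptyContextError(Exception):
    """Stands in for the dynamic error raised on any appeal to an empty context."""


def runtime_context(documents):
    """Return (node, position, size) for exactly one input document,
    or None for the empty context (no input, or more than one document)."""
    if len(documents) == 1:
        return (documents[0], 1, 1)
    return None


def evaluate_position(documents):
    """Model evaluating position() by the runtime for p:option / p:parameter."""
    context = runtime_context(documents)
    if context is None:
        raise EmptyContextError("position() evaluated with an empty context")
    return context[1]
```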


--
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]

Received on Thursday, 7 June 2007 15:09:43 GMT
