The semantics of position() -- trying to be very explicit

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Wed, 06 Jun 2007 20:06:44 +0100
To: public-xml-processing-model-wg <public-xml-processing-model-wg@w3.org>
Message-ID: <f5b4plkc37f.fsf@hildegard.inf.ed.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I want to try to work this through in _excruciating_ detail, because
I'm not (yet) convinced we have a coherent proposition.

First, some terminology:

  An XPath expression is evaluated either *by the runtime*, that is,
  the pipeline engine itself has to do the evaluation as part of its
  overall work, or *by a component*, that is, a component
  implementation itself knows that some string is an XPath expression
  and needs to get it evaluated.

    In either case, of course, the real evaluation work may well be
    done by a library, indeed the same library -- what really matters
    is who decides to do the evaluation, when they do it, and what
    they specify for the context.

  A component *binds the position for a port* by doing whatever is
  necessary to determine what the value of position() will be when an
  XPath expression is evaluated _by that component_ with respect to
  that port.

Second, a stipulation:

  We are going to use position() to signal where we are in a sequence
  of documents.  We would like this to be true equally when a
  component is processing a sequence of inputs on some port, and when
  a component is iterating over a sequence (today, that means
  p:for-each and p:viewport).

  Specifying exactly _where_ position() means _what_ is the goal of
  this message.

  More terminology: call the first use/meaning the *sequence* position
  and the second use/meaning the *iteration* position.

Third, an observation:

  Only a component which itself accepts sequences as input can
  possibly ever _bind the sequence position_, because only it can
  know how many of its input documents it has read.

  For atomic components, this means that only XPath expressions
  evaluated _by that component_ can access the _sequence_ position.

So far, so good, I think.  We can now state carefully one requirement
on components:

  R1) Components which evaluate XPath expressions MUST
      a) For each XPath they evaluate, identify the input port with
         respect to which they evaluate it;
      b) _Bind the position for all ports_ to 1 before evaluating any
         XPath expressions, and _increment the position of a port_
         after finishing the processing of each input document on that
         port.
      [Note that for ports which don't accept sequences, this means
       the position will always be _bound to_ 1.]
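
A minimal sketch of R1, under stated assumptions: the class and method
names are hypothetical, and a plain Python callable stands in for an
XPath expression, receiving whatever position() would return.

```python
class SequenceComponent:
    """Hypothetical atomic component tracking one position per input port."""

    def __init__(self, ports):
        # R1(b): bind the position for all ports to 1 before any evaluation.
        self.position = {port: 1 for port in ports}

    def evaluate(self, expr, port):
        # R1(a): every evaluation names the port it is evaluated against.
        # `expr` is a stand-in for an XPath expression (an assumption).
        return expr(self.position[port])

    def finish_document(self, port):
        # R1(b): increment after finishing each input document on `port`.
        self.position[port] += 1


def run(component, port, documents, expr):
    """Evaluate `expr` once per document arriving on `port`."""
    results = []
    for _doc in documents:
        results.append(component.evaluate(expr, port))
        component.finish_document(port)
    return results
```

A port that never receives a second document never has
finish_document() called on it before an evaluation, so its position
stays at 1, as the note above says.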

But what about iteration?  Unless you press me, I won't bore you with
the reasoning that gets me here, but this is the only way forward I've
found which I think works cleanly.

The crucial point is to observe (or be willing to stipulate) that the
_sequence_ position for an iterator is the _iteration_ position for
its contained components.  Accordingly we can get what we need as
follows:

  R2) Compound components which iterate one or more subpipelines with
      respect to some document sequence MUST arrange to
      a) _bind the position_ (for the runtime, see below) to 1 before
         the first iteration of any subpipeline;
      b) _increment the position_ (for the runtime) before each
         subsequent iteration of any subpipeline.

      By 'for the runtime' is meant that the position binding is the
      one which will be used when an XPath expression is evaluated _by
      the runtime_ during the execution of the relevant subpipeline(s).
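
R2 can be sketched the same way.  Everything named here (Runtime,
for_each, the callable standing in for an XPath expression) is
hypothetical, and restoring the outer binding when the iterator
finishes is an extra assumption, added only so that nesting behaves
sensibly.

```python
class Runtime:
    """Hypothetical engine state holding the runtime position binding."""

    def __init__(self):
        # The top-level pipeline binds the runtime position to 1.
        self.position = 1

    def evaluate(self, expr):
        # `expr` stands in for an XPath expression evaluated by the runtime.
        return expr(self.position)


def for_each(runtime, documents, subpipeline):
    """Run `subpipeline` once per document, binding the runtime position."""
    saved = runtime.position
    results = []
    for index, doc in enumerate(documents):
        if index == 0:
            runtime.position = 1       # R2(a): bind before first iteration
        else:
            runtime.position += 1      # R2(b): increment before each later one
        results.append(subpipeline(runtime, doc))
    # Assumption (not stated in R2): restore the enclosing binding.
    runtime.position = saved
    return results
```

Any XPath the runtime evaluates while the subpipeline runs sees the
iteration number, which is exactly the iterator's own sequence position.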

Finally, a necessary observation:

   Options given a value with 'select=' have the specified XPath
   evaluated *by the runtime* (that is, by the pipeline engine).

   Options known to a component to be XPaths (typically, but not
   necessarily, given a value with 'value=') have the specified XPath
   evaluated *by the component*.

Superficial consequence:

  a) position() in <p:option ... select='...position()...'/>
     gives _iteration_ position;
  b) position() in <p:option ... value='...position()...'/> gives
     _sequence_ position (if it's treated as an XPath at all).

  (in either case, when no sequence/iteration is relevant, position()
   gives 1.  For _iteration_ position, this requires the top-level
   pipeline to _bind the position_ (for the runtime) to 1.)

I find it easiest to think of this in terms of position having a
sort-of special slot in the environment.  p:for-each and p:viewport
initialise and increment that value for each run of their
subpipelines, so select= options in those subpipelines can read it
(== the iteration number) with position().  Sequence-consuming
components internally initialise and increment a binding for that
value local to themselves, so e.g. value= options which they know to
be XPaths and evaluate themselves can read it (== the sequence number)
with position().
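
To make the two slots concrete, here is a small sketch that puts them
side by side.  It is illustrative only: the function names are
hypothetical, a callable stands in for the XPath "position()", and each
iteration is simply handed a sequence for the contained step to consume.

```python
def position_expr(pos):
    # Stand-in for the XPath expression "position()".
    return pos


def sequence_step(documents, value_expr):
    """Contained component: evaluates value= itself, per input document."""
    position = 1                       # component-local binding
    seen = []
    for _doc in documents:
        seen.append(value_expr(position))   # sequence position
        position += 1
    return seen


def for_each(batches, select_expr, value_expr):
    """Iterator: binds the runtime slot, then runs the step once per batch."""
    trace = []
    for runtime_position, batch in enumerate(batches, start=1):
        selected = select_expr(runtime_position)   # evaluated by the runtime
        trace.append((selected, sequence_step(batch, value_expr)))
    return trace


trace = for_each([["a", "b"], ["c", "d", "e"]], position_expr, position_expr)
```

Here trace is [(1, [1, 2]), (2, [1, 2, 3])]: the select= expression sees
the iteration number, while the value= expression restarts at 1 inside
each run of the component.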

Phew!  This works, but will _anyone_ understand it?  Can someone
explain it in simpler terms, supposing you agree it's right in
principle?

Examples to follow (sorry, this has taken _far_ too long and I have to
go cook!).

ht
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFGZwXFkjnJixAXWBoRArGHAJkBhZv1L3FQ377SASmeEmnctiPzhQCdFkds
OSbUmE7FnWBglhyOu4xPrsQ=
=OtIG
-----END PGP SIGNATURE-----
Received on Wednesday, 6 June 2007 19:06:53 GMT