
Re: The semantics of position() -- trying to be very explicit

From: Innovimax SARL <innovimax@gmail.com>
Date: Thu, 7 Jun 2007 00:17:43 +0200
Message-ID: <546c6c1c0706061517v1d246c15j5dd9ddeea516a1b@mail.gmail.com>
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
Cc: public-xml-processing-model-wg <public-xml-processing-model-wg@w3.org>

Two questions:
1) What about empty sequences?
2) Are we clear that position() cannot be evaluated in a @match, nor
in the @select of a p:for-each (since that one should evaluate to a
node set), nor in the @select of a p:input? If so, position() could
only be used in
 * a p:when/@test
 * a p:option/@select, as a constant (position() is evaluated and then,
for example, concatenated: few useful use cases)
 * a p:option/@value, for a component expecting a boolean, a number,
or a string once the value has been evaluated as XPath
   + the test option when matching documents (p:subsequence)
   + a bunch of options evaluated to number or string (few useful use cases)

As a consequence, are we clear that using position() instead of
p:position() leaves us only two places where this construct can be
used in a potentially useful manner?

Mohamed

On 6/6/07, Henry S. Thompson <ht@inf.ed.ac.uk> wrote:
>
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I want to try to work this through in _excruciating_ detail, because
> I'm not (yet) convinced we have a coherent proposition.
>
> First, some terminology:
>
>   An XPath expression is evaluated either *by the runtime*, that is,
>   the pipeline engine itself has to do the evaluation as part of its
>   overall work, or *by a component*, that is, a component
>   implementation itself knows that some string is an XPath expression
>   and needs to get it evaluated.
>
>     In either case, of course, the real evaluation work may well be
>     done by a library, indeed the same library -- what really matters
>     is who decides to do the evaluation, when they do it, and what
>     they specify for the context.
>
>   A component *binds the position for a port* by doing whatever is
>   necessary to determine what the value of position() will be when an
>   XPath expression is evaluated _by that component_ with respect to
>   that port.
>
> Second, a stipulation:
>
>   We are going to use position() to signal where we are in a sequence
>   of documents.  We would like this to be true equally when a
>   component is processing a sequence of inputs on some port, and when
>   a component is iterating over a sequence (today, that means
>   p:for-each and p:viewport).
>
>   Specifying exactly _where_ position() means _what_ is the goal of
>   this message.
>
>   More terminology: call the first use/meaning the *sequence* position
>   and the second use/meaning the *iteration* position.
>
> Third, an observation:
>
>   Only a component which itself accepts sequences as input can
>   possibly ever _bind the sequence position_, because only it can
>   know how many of its input documents it has read.
>
>   For atomic components, this means that only XPath expressions
>   evaluated _by that component_ can access the _sequence_ position.
>
> So far, so good, I think.  We can now state carefully one requirement
> on components:
>
>   R1) Components which evaluate XPath expressions MUST
>       a) For each XPath they evaluate, identify the input port with
>          respect to which they evaluate it;
>       b) _Bind the position for all ports_ to 1 before evaluating any
>          XPath expressions, and _increment the position of a port_
>          after finishing the processing of each input document on that
>          port.
>       [Note that for ports which don't accept sequences, this means
>        the position will always be _bound to_ 1.]
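
R1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything from the draft: the class SequenceComponent, its methods, the port names, and the use of a plain callable in place of a real XPath expression are all hypothetical.

```python
class SequenceComponent:
    """Hypothetical atomic component that evaluates expressions itself (R1)."""

    def __init__(self, ports):
        # R1b: bind the position for every port to 1 before any evaluation.
        self.position = {port: 1 for port in ports}

    def evaluate(self, expr, port, document):
        # R1a: every evaluation names the port it is relative to.
        # `expr` is a plain callable standing in for an XPath expression.
        return expr(document, self.position[port])

    def finish_document(self, port):
        # R1b: increment once the document on that port is fully processed.
        self.position[port] += 1


def run(component, port, documents, expr):
    """Feed a document sequence through one port, evaluating expr per document."""
    results = []
    for doc in documents:
        results.append(component.evaluate(expr, port, doc))
        component.finish_document(port)
    return results


comp = SequenceComponent(["source", "stylesheet"])
seen = run(comp, "source", ["a", "b", "c"], lambda doc, position: (doc, position))
single = comp.evaluate(lambda doc, position: position, "stylesheet", "s")
```

Under these assumptions the "source" port sees positions 1, 2, 3, while the "stylesheet" port, which never receives a second document, stays bound to 1, as the note in R1 says.
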
>
> But what about iteration?  I won't bore you, unless you press me, with
> the reasoning that gets me here, but this is the only way forward I've
> found which I think works cleanly.
>
> The crucial point is to observe (or be willing to stipulate) that the
> _sequence_ position for an iterator is the _iteration_ position for
> its contained components.  Accordingly we can get what we need as
> follows:
>
>   R2) Compound components which iterate one or more subpipelines with
>       respect to some document sequence MUST arrange to
>       a) _bind the position_ (for the runtime, see below) to 1 before
>           the first iteration of any subpipeline;
>       b) _increment the position_ (for the runtime) before each
>          subsequent iteration of any subpipeline.
>
>       By 'for the runtime' is meant that the position binding is the
>       one which will be used when an XPath expression is evaluated _by
>       the runtime_ during the execution of the relevant subpipeline(s).
>
> Finally, a necessary observation:
>
>    Options given a value with 'select=' have the specified XPath
>    evaluated *by the engine*.
>
>    Options known to a component to be XPaths (typically, but not
>    necessarily, given a value with 'value=') have the specified XPath
>    evaluated *by the component*.
>
> Superficial consequence:
>
>   a) position() in <p:option ... select='...position()...'/>
>      gives _iteration_ position;
>   b) position() in <p:option ... value='...position()...'/> gives
>      _sequence_ position (if it's treated as an XPath at all).
>
>   (in either case, when no sequence/iteration is relevant, position()
>    gives 1.  For _iteration_ position, this requires the top-level
>    pipeline to _bind the position_ (for the runtime) to 1.)
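
The difference between consequences (a) and (b) may be easier to see in a toy model. The Python below is a sketch only: Runtime, eval_select, for_each and count_component are hypothetical names, callables stand in for XPath expressions, and restoring the outer position after the loop is my own assumption, which the message does not state.

```python
class Runtime:
    """Hypothetical engine holding the position binding 'for the runtime'."""

    def __init__(self):
        # The top-level pipeline binds the runtime position to 1.
        self.position = 1

    def eval_select(self, expr):
        # select= expressions are evaluated by the engine,
        # so they see the iteration position.
        return expr(self.position)

    def for_each(self, documents, subpipeline):
        saved = self.position  # assumption: the outer binding is restored
        results = []
        for i, doc in enumerate(documents, start=1):
            # R2: 1 before the first iteration, incremented before each later one.
            self.position = i
            results.append(subpipeline(self, doc))
        self.position = saved
        return results


def count_component(inputs, value_expr):
    # A value= option the component knows to be an XPath is evaluated by the
    # component, against its own local binding: the sequence position.
    return [value_expr(pos) for pos, _ in enumerate(inputs, start=1)]


def subpipeline(runtime, doc):
    selected = runtime.eval_select(lambda position: position)
    inner = count_component([doc + "-x", doc + "-y"], lambda position: position)
    return selected, inner


rt = Runtime()
out = rt.for_each(["d1", "d2"], subpipeline)
```

In this model the select= value follows the iteration (1, then 2), while the value= expression restarts at 1 inside each iteration because the component counts only its own inputs.
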
>
> I find it easiest to think of this in terms of position having a
> sort-of special slot in the environment.  p:for-each and p:viewport
> initialise and increment that value for each run of their
> subpipelines, so select= options in those subpipelines can access it
> == the iteration number with position().  Sequence-consuming
> components internally initialise and increment a binding for that
> value local to themselves, so e.g. value= options which they know to
> be XPaths and evaluate can access it == the sequence number with
> position().
>
> Phew!  This works, but will _anyone_ understand it?  Can someone
> explain it in simpler terms, supposing you agree it's right in
> principle?
>
> Examples to follow (sorry, this has taken _far_ too long and I have to
> go cook!).
>
> ht
> - --
>  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
>                      Half-time member of W3C Team
>     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
>             Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
>                    URL: http://www.ltg.ed.ac.uk/~ht/
> [mail really from me _always_ has this .sig -- mail without it is forged spam]
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.6 (GNU/Linux)
>
> iD8DBQFGZwXFkjnJixAXWBoRArGHAJkBhZv1L3FQ377SASmeEmnctiPzhQCdFkds
> OSbUmE7FnWBglhyOu4xPrsQ=
> =OtIG
> -----END PGP SIGNATURE-----
>
>


-- 
Innovimax SARL
Consulting, Training & XML Development
9, impasse des Orteaux
75020 Paris
Tel : +33 8 72 475787
Fax : +33 1 4356 1746
http://www.innovimax.fr
RCS Paris 488.018.631
SARL au capital de 10.000 €
Received on Wednesday, 6 June 2007 22:17:49 GMT
