- From: Henry S. Thompson <ht@inf.ed.ac.uk>
- Date: Wed, 06 Jun 2007 20:06:44 +0100
- To: public-xml-processing-model-wg <public-xml-processing-model-wg@w3.org>
I want to try to work this through in _excruciating_ detail, because
I'm not (yet) convinced we have a coherent proposition.
First, some terminology:
An XPath expression is evaluated either *by the runtime*, that is,
the pipeline engine itself has to do the evaluation as part of its
overall work, or *by a component*, that is, a component
implementation itself knows that some string is an XPath expression
and needs to get it evaluated.
In either case, of course, the real evaluation work may well be
done by a library, indeed the same library -- what really matters
is who decides to do the evaluation, when they do it, and what
they specify for the context.
A component *binds the position for a port* by doing whatever is
necessary to determine what the value of position() will be when an
XPath expression is evaluated _by that component_ with respect to
that port.
Second, a stipulation:
We are going to use position() to signal where we are in a sequence
of documents. We would like this to be true equally when a
component is processing a sequence of inputs on some port, and when
a component is iterating over a sequence (today, that means
p:for-each and p:viewport).
Specifying exactly _where_ position() means _what_ is the goal of
this message.
More terminology: call the first use/meaning the *sequence* position
and the second use/meaning the *iteration* position.
Third, an observation:
Only a component which itself accepts sequences as input can
possibly ever _bind the sequence position_, because only it can
know how many of its input documents it has read.
For atomic components, this means that only XPath expressions
evaluated _by that component_ can access the _sequence_ position.
So far, so good, I think. We can now state carefully one requirement
on components:
R1) Components which evaluate XPath expressions MUST
    a) For each XPath they evaluate, identify the input port with
       respect to which they evaluate it;
    b) _Bind the position for all ports_ to 1 before evaluating any
       XPath expressions, and _increment the position of a port_
       after finishing the processing of each input document on that
       port.
    [Note that for ports which don't accept sequences, this means
    the position will always be _bound to_ 1.]
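To make R1 concrete, here is a minimal Python sketch of the bookkeeping a sequence-consuming component would need. The names PortPositions and run_component are hypothetical, invented for illustration only, and no real XPath evaluation takes place: the sketch just records the value position() would have for each document.

```python
class PortPositions:
    """Per-port position bindings held by one component (hypothetical helper)."""

    def __init__(self, ports):
        # R1(b): bind the position for all ports to 1 before any evaluation.
        self._pos = {port: 1 for port in ports}

    def position(self, port):
        # R1(a): every evaluation names the port it is made with respect to.
        return self._pos[port]

    def finished_document(self, port):
        # R1(b): increment after finishing each input document on that port.
        self._pos[port] += 1


def run_component(inputs, expr_port):
    """Return (document, position) pairs as seen while reading expr_port."""
    positions = PortPositions(inputs.keys())
    seen = []
    for doc in inputs[expr_port]:
        seen.append((doc, positions.position(expr_port)))
        positions.finished_document(expr_port)
    return seen


result = run_component(
    {"source": ["a.xml", "b.xml", "c.xml"], "stylesheet": ["s.xsl"]},
    "source",
)
print(result)
```

A port that never receives more than one document, such as "stylesheet" here, simply stays bound to 1, which matches the note above.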
But what about iteration? Unless you press me, I won't bore you with
the reasoning that gets me here, but this is the only way forward I've
found which I think works cleanly.
The crucial point is to observe (or be willing to stipulate) that the
_sequence_ position for an iterator is the _iteration_ position for
its contained components. Accordingly we can get what we need as
follows:
R2) Compound components which iterate one or more subpipelines with
    respect to some document sequence MUST arrange to
    a) _bind the position_ (for the runtime, see below) to 1 before
       the first iteration of any subpipeline;
    b) _increment the position_ (for the runtime) before each
       subsequent iteration of any subpipeline.
    By 'for the runtime' is meant that the position binding is the
    one which will be used when an XPath expression is evaluated _by
    the runtime_ during the execution of the relevant subpipeline(s).
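R2 can be sketched the same way. In the Python below, Runtime, for_each and label are hypothetical names, not anything taken from the draft. Restoring the outer binding once the iteration ends is my own assumption, since R2 says nothing about what happens afterwards or about nesting.

```python
class Runtime:
    """Holds the position binding used for XPaths evaluated by the runtime."""

    def __init__(self):
        # Outside any iteration, position() is taken to be 1.
        self.position = 1


def for_each(runtime, documents, subpipeline):
    """Run subpipeline once per document, binding the runtime position (R2)."""
    outer = runtime.position  # assumption: put the outer binding back afterwards
    results = []
    try:
        for index, doc in enumerate(documents):
            if index == 0:
                runtime.position = 1   # R2(a): bind to 1 before the first iteration
            else:
                runtime.position += 1  # R2(b): increment before each later iteration
            results.append(subpipeline(runtime, doc))
    finally:
        runtime.position = outer
    return results


def label(runtime, doc):
    # Stands in for a step with an option the runtime evaluates using position().
    return f"{doc}#{runtime.position}"


rt = Runtime()
labels = for_each(rt, ["a", "b", "c"], label)
print(labels)
```

The subpipeline never touches the binding itself. It only reads whatever the iterator has arranged, which is the whole point of 'for the runtime'.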
Finally, a necessary observation:
Options given a value with 'select=' have the specified XPath
evaluated *by the runtime*.
Options known to a component to be XPaths (typically, but not
necessarily, given a value with 'value=') have the specified XPath
evaluated *by the component*.
Superficial consequence:
    a) position() in <p:option ... select='...position()...'/>
       gives _iteration_ position;
    b) position() in <p:option ... value='...position()...'/> gives
       _sequence_ position (if it's treated as an XPath at all).
(In either case, when no sequence/iteration is relevant, position()
gives 1. For _iteration_ position, this requires the top-level
pipeline to _bind the position_ (for the runtime) to 1.)
I find it easiest to think of this in terms of position having a
sort-of special slot in the environment. p:for-each and p:viewport
initialise and increment that value for each run of their
subpipelines, so select= options in those subpipelines can access it
== the iteration number with position(). Sequence-consuming
components internally initialise and increment a binding for that
value local to themselves, so e.g. value= options which they know to
be XPaths and evaluate can access it == the sequence number with
position().
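The two bindings can be put side by side in a small Python sketch. Everything named here (run_step, position, outcomes) is hypothetical. The "XPath expressions" are plain callables that receive the value position() would return. The sketch also assumes that the runtime evaluates select= once before the step runs and that the component evaluates value= once per input document; neither timing is fixed by anything above.

```python
def run_step(iteration_position, input_docs, select_expr, value_expr):
    """One sequence-consuming step inside an iterated subpipeline."""
    # select=: evaluated by the runtime, so it sees the iteration position
    # bound by the enclosing p:for-each or p:viewport.
    selected = select_expr(iteration_position)

    # value=: evaluated by the component, which keeps its own local binding
    # and increments it after each input document.
    sequence_position = 1
    per_document = []
    for _doc in input_docs:
        per_document.append(value_expr(sequence_position))
        sequence_position += 1
    return selected, per_document


def position(p):
    # Stand-in for the XPath expression "position()".
    return p


# Two iterations of the enclosing loop; the step reads three documents each time.
outcomes = [
    run_step(iteration, ["x", "y", "z"], position, position)
    for iteration in (1, 2)
]
print(outcomes)
```

The same expression yields the iteration number under select= and the running document count under value=, because the two evaluations consult different bindings.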
Phew! This works, but will _anyone_ understand it? Can someone
explain it in simpler terms, supposing you agree it's right in
principle?
Examples to follow (sorry, this has taken _far_ too long and I have to
go cook!).
ht
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Wednesday, 6 June 2007 19:06:53 UTC