Re: Kinds of iteration from Erik Bruchez on 2006-01-10 (public-xml-processing-model-wg@w3.org from January 2006)

From: Erik Bruchez <ebruchez@orbeon.com>
Date: Tue, 10 Jan 2006 21:47:59 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <43C41D7F.40605@orbeon.com>

Jeni Tennison wrote:
 > Hi,
 >
 > I thought I'd try to summarize the different kinds of iteration that
 > we've discussed.
 >
 > 1. Iteration over a sequence of elements...
 >
 >    1.a. ...identified via an XPath expression (selected from one or
 >            more documents)
 >
 >    1.b. ...identified via an (XSLT) pattern (all elements matching the
 >            pattern within one or more documents)
 >
 > 2. Iteration over a sequence of documents...
 >
 >    2.a. ...generated by an upstream process, such as an XQuery whose
 >            result is a sequence of document nodes
 >
 >    2.b. ...identified within an XML document, such as an index that
 >            lists the URLs of other XML documents
 >
 >    2.c. ...created as new documents with document elements identified
 >            via 1 above
 >
 > Some thoughts:
 >
 > The distinction between 1 and 2.c is important because the context of
 > the element is lost when you copy an element into a new document: both
 > its ancestors and its base URI are different. XSLT expects an "initial
 > context node", which can be any node kind, not only a document (and
 > not only an element), and given that XSLT is one of the main kinds of
 > component we probably want to support that (in the Processing Model if
 > not the Processing Language).

For reference, XPL does 2.c: its <p:for-each> construct [1] selects a
sequence of elements in a source document (which may of course be the
result of an earlier pipeline step), and then iterates over each of
those by first creating a new document for each element selected.
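
To make the 2.c behaviour concrete, here is a rough sketch in Python. It
is not XPL and not how any XPL implementation works internally: the
function name for_each_as_documents and the sample data are made up for
illustration, and ElementTree's findall() only supports a small subset
of XPath. It just shows each selected element becoming the root of its
own new document, detached from its original ancestors.

```python
import copy
import xml.etree.ElementTree as ET


def for_each_as_documents(source_root, path):
    """Yield one standalone document per element selected by `path`.

    Hypothetical helper, for illustration only.
    """
    for element in source_root.findall(path):
        # Deep-copy so the new document shares nothing with the source tree.
        # The ancestors (and base URI) of the original element are not
        # carried over: the copy becomes the document element.
        yield ET.ElementTree(copy.deepcopy(element))


# Made-up source document.
source = ET.fromstring(
    "<orders region='eu'>"
    "<order id='1'><item>pen</item></order>"
    "<order id='2'><item>ink</item></order>"
    "</orders>"
)

documents = list(for_each_as_documents(source, "order"))
roots = [doc.getroot() for doc in documents]
```

Each entry in documents is a complete document whose root is an
<order> element; nothing in it records that the element used to live
under <orders region='eu'>, which is exactly the loss of context Jeni
describes.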

There is some logic to this: if components in a pipeline only exchange
complete documents (without context information), then it makes sense
for each iteration to also produce a complete document.

We have found that exchanging complete documents between components
(without context information) is a satisfactory strategy for our
needs, and has the benefit of simplicity.

There is, however, a drawback to passing the entire document along
with context information when iterating: if you want to use the
iteration mechanism to split up the processing of a large document
(think of extracting 1000 pieces out of a 200 MB document), handing
every iteration the full 200 MB document plus context information is
likely to be less efficient than extracting 1000 small sub-documents
and passing one to each iteration.

Note that XPath 2.0 uses a more elaborate terminology for context
information [2], which includes not only a context node but also a
context item, a context position, and a context size.
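
As a rough illustration of what carrying that information through an
iteration could look like, here is a small Python sketch. The Focus
type and the iterate_with_focus function are hypothetical names of my
own (XPath 2.0 does call the item/position/size triple the "focus");
this is not an API from any processor.

```python
from typing import Any, Iterator, NamedTuple, Sequence


class Focus(NamedTuple):
    """Context item, position, and size for one iteration step."""
    item: Any
    position: int  # 1-based, as in XPath's position()
    size: int      # length of the sequence, as in XPath's last()


def iterate_with_focus(items: Sequence[Any]) -> Iterator[Focus]:
    """Yield a Focus for each item of the sequence, in order."""
    size = len(items)
    for position, item in enumerate(items, start=1):
        yield Focus(item, position, size)


focuses = list(iterate_with_focus(["a", "b", "c"]))
```

The last step can be recognized by position == size, which is the same
test as position() = last() in XPath.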

We have not made up our minds yet on whether components should be
able to produce sequences (of "items", or simply of documents?), but I
just wanted to mention that in XPL, a component input or output, if it
reads or produces anything, must read or produce exactly one complete
document. A clear benefit I see is simplicity.

-Erik

[1] http://www.w3.org/Submission/xpl/#d1056e1329
[2] http://www.w3.org/TR/xpath20/#eval_context
Received on Tuesday, 10 January 2006 20:48:40 UTC