- From: Erik Bruchez <ebruchez@orbeon.com>
- Date: Tue, 10 Jan 2006 21:47:59 +0100
- To: public-xml-processing-model-wg@w3.org
Jeni Tennison wrote:
> Hi,
>
> I thought I'd try to summarize the different kinds of iteration that
> we've discussed.
>
> 1. Iteration over a sequence of elements...
>
> 1.a. ...identified via an XPath expression (selected from one or
> more documents)
>
> 1.b. ...identified via an (XSLT) pattern (all elements matching the
> pattern within one or more documents)
>
> 2. Iteration over a sequence of documents...
>
> 2.a. ...generated by an upstream process, such as an XQuery whose
> result is a sequence of document nodes
>
> 2.b. ...identified within an XML document, such as an index that
> lists the URLs of other XML documents
>
> 2.c. ...created as new documents with document elements identified
> via 1 above
>
> Some thoughts:
>
> The distinction between 1 and 2.c is important because the context of
> the element is lost when you copy an element into a new document: both
> its ancestors and its base URI are different. XSLT expects an "initial
> context node", which can be any node kind, not only a document (and
> not only an element), and given that XSLT is one of the main kinds of
> component we probably want to support that (in the Processing Model if
> not the Processing Language).

For reference, XPL does 2.c: its <p:for-each> construct [1] selects a sequence of elements in a source document (which may of course be the result of an earlier pipeline step), and then iterates over each of those by first creating a new document for each element selected. There is some logic to this: if components in a pipeline only exchange complete documents (without context information), then it makes sense for each iteration to also produce a complete document. We have found that exchanging complete documents between components (without context information) is a satisfying strategy for our needs, and it has the benefit of simplicity.
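To make the 1 versus 2.c distinction concrete, here is a minimal sketch in Python using the standard library's xml.dom.minidom. It is not XPL's implementation: the helper names (`ancestor_names`, `for_each_as_documents`) and the sample `orders` document are hypothetical, elements are selected by name rather than by an XPath expression, and because minidom does not track base URIs, only the loss of ancestors is shown.

```python
from xml.dom import Node, minidom


def ancestor_names(node):
    """Return the tag names of node's element ancestors, nearest first."""
    names = []
    parent = node.parentNode
    # Walk up until we reach the Document node (or a detached node).
    while parent is not None and parent.nodeType == Node.ELEMENT_NODE:
        names.append(parent.tagName)
        parent = parent.parentNode
    return names


def for_each_as_documents(source, tag_name):
    """Hypothetical 2.c-style iteration: yield one new Document per element.

    Each selected element is deep-copied into a fresh, empty document,
    where it becomes the document element. The copy keeps its own subtree
    but none of its ancestors from the source document.
    """
    impl = minidom.getDOMImplementation()
    for element in source.getElementsByTagName(tag_name):
        doc = impl.createDocument(None, None, None)  # empty document, no root yet
        doc.appendChild(doc.importNode(element, True))  # deep copy becomes the root
        yield doc


source = minidom.parseString(
    "<orders>"
    "<order id='1'><item>a</item></order>"
    "<order id='2'><item>b</item></order>"
    "</orders>"
)

# 1.a-style: iterate over the selected elements in place; ancestors stay reachable.
for item in source.getElementsByTagName("item"):
    print(ancestor_names(item))  # ['order', 'orders']

# 2.c-style: each iteration receives a new document; the ancestors are gone.
for doc in for_each_as_documents(source, "item"):
    print(doc.documentElement.toxml(), ancestor_names(doc.documentElement))
```

Each copied document is small and self-contained, which suits components that only exchange complete documents; the trade-off is that an iteration can no longer see, for example, the `id` attribute on the enclosing `order`.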
There is, however, a drawback to passing an entire document with context information to each iteration: when you want to use the iteration mechanism to split up processing of a large document (think extracting 1000 pieces out of a 200 MB document), passing the entire 200 MB document with context information to each iteration is likely to be inefficient compared with extracting 1000 small sub-documents and passing one to each iteration.

Note that XPath 2.0 uses a more elaborate terminology for context information [2], which includes not only a context node but also a context item, context position, and context size.

We have not yet made up our minds about whether components can produce sequences (of "items", or simply of documents?), but I wanted to mention that in XPL, a component input or output that reads or produces something must read or produce exactly one complete document. A clear benefit I see is simplicity.

-Erik

[1] http://www.w3.org/Submission/xpl/#d1056e1329
[2] http://www.w3.org/TR/xpath20/#eval_context
Received on Tuesday, 10 January 2006 20:48:40 UTC