- From: Michael Sokolov <sokolov@ifactory.com>
- Date: Thu, 4 Jun 2009 08:27:05 -0400
- To: "'Norman Walsh'" <ndw@nwalsh.com>, "'XProc Dev'" <xproc-dev@w3.org>
Thanks for the clear and patient explanations. I haven't yet read through the entire spec very carefully, and hadn't yet gotten to the comment about p:split-sequence. -Mike

> -----Original Message-----
> From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf Of Norman Walsh
> Sent: Thursday, June 04, 2009 6:40 AM
> To: XProc Dev
> Subject: Re: streaming vs p:iteration-size()
>
> "Michael Sokolov" <sokolov@ifactory.com> writes:
> > It seems as if support for streaming implementations was a major
> > consideration in the design of XProc. I wonder if the requirement to
> > support p:iteration-size() in the context of p:for-each and
> > p:viewport isn't at odds with the ability to create a streaming
> > implementation, though. For example, wouldn't an implementation be
> > required to count all the matches, thus parsing the entire document,
> > before processing any of them?
>
> Yes. We've tried to design XProc so that a streaming implementation is
> possible, but that doesn't mean that every pipeline will stream. The
> same problem exists with last() in ordinary XPath predicates.
>
> We call this out explicitly in the spec in, for example,
> p:split-sequence:
>
>     Note
>
>     In principle, this component cannot stream because it must buffer
>     all of the input sequence in order to find the context size. In
>     practice, if the test expression does not use the last() function,
>     the implementation can stream and ignore the context size.
>
> > I haven't looked through any implementations to see what's going on
> > there, but this seems designed into the spec anyway. Am I missing
> > something?
>
> Nope. And FWIW, XML Calabash doesn't attempt to stream.
>
> > I probably should add that the context for my question is trying to
> > understand the best way to write a "chunker" using XProc. This is
> > often an early step in our pipelines: we take a very large document
> > and break it into many small documents, abandoning document
> > structure that is no longer useful to us in order to gain efficiency
> > in querying and later processing. Of course one would prefer to do
> > this in a streaming fashion: typically we would write a SAX handler
> > in Java. I think perhaps p:viewport combined with a secondary output
> > port is the approach, but I'm not sure, and I'm wondering whether
> > that can be (is) done in a memory-efficient way.
>
> If you need to recombine the processed chunks, then p:viewport is
> probably the easiest way. But if you just want to chunk, and you can
> express the chunks with an XPath, you can do it directly on p:input
> with a select expression.
>
> Be seeing you,
>   norm
>
> --
> Norman Walsh <ndw@nwalsh.com> | Nothing will ever be attempted, if all
> http://nwalsh.com/            | possible objections must be first
>                               | overcome.--Dr. Johnson
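[Editor's note: a minimal XProc 1.0 sketch of the "just chunk" approach Norm suggests, splitting a document into a sequence of small documents with an XPath. The element path /book/chapter is a hypothetical chunk boundary, stands in for whatever structure the real input has; the pipeline simply emits each match as its own document on the sequence output port.]

    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                    version="1.0" name="main">
      <p:input port="source"/>
      <p:output port="result" sequence="true"/>

      <!-- The select expression on p:iteration-source defines the
           chunks; any surrounding document structure is discarded.
           p:for-each reads the default readable port (the pipeline's
           source) when no explicit binding is given. -->
      <p:for-each>
        <p:iteration-source select="/book/chapter"/>
        <!-- Pass each matched subtree through unchanged; each one
             becomes a separate document in the output sequence. -->
        <p:identity/>
      </p:for-each>
    </p:declare-step>

As the thread notes, whether this streams depends on the implementation; avoiding p:iteration-size() and last() in the pipeline at least keeps streaming possible in principle.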
Received on Thursday, 4 June 2009 12:27:43 UTC