- From: Erik Bruchez <ebruchez@orbeon.com>
- Date: Thu, 05 Jan 2006 18:34:43 +0100
- To: public-xml-processing-model-wg@w3.org
Alex Milowski wrote:
>
> Requirement:
>
> The pipeline language must allow a user to identify a subtree
> of a document by an XPath or XPath subset that produces a
> sequence. This sequence should then be able to be fed to
> a sub-pipeline or sequence of pipeline steps.
>
> Use Case:
>
> Example Problem (Personal Story):
>
> I wrote this HMM Baum-Welch trainer and implemented logging
> of the training steps in XML. In the end, I had a log file that
> was a 200-300MB XML document (or larger). The required next
> step was to transform that document into a data file that R or Matlab
> could load (a plain text file). Just running XSLT on the whole thing
> isn't realistic. All I really needed to do was transform a
> particular element that is repeated over-and-over again in this
> large XML log file. So, I wanted to scope the XSLT to that
> element and produce the text-transformed result on the little
> bits of the document.
>
> Pipeline Solution Example:
>
>    <subtree select="training-scenario">
>       <xslt src="scenario2text-xt.xsl"/>
>    </subtree>
>
> The 'subtree' step applies the XPath expression 'training-scenario'
> in a streaming fashion to the input. The matching info items (i.e.
> the 'training-scenario' elements) are produced as a sequence of
> little XML document infoset sets where the 'training-scenario'
> element is the document element. When the XSLT step runs, the
> "adapter" for it caches the streaming of that infoset into a
> "DOM" so that XSLT can run on the whole document. Since that document
> is tiny, it can process the large data XML document (of arbitrary
> size) in constant memory.
>
> As I understand it, this is very similar to the 'for-each' step
> in Orbeon's pipeline language.

Yes, this is similar to what you can do with XPL:

   <p:for-each href="some-infoset-reference" select="/some/xpath/expr">
     ... some pipeline steps ...
   </p:for-each>

XPL also provides the ability to use the (dead?) XPointer xpointer()
scheme anywhere URLs are used. This allows extracting sub-infosets very
conveniently without having to resort to, say, XSLT.

-Erik
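
For illustration only, the streaming 'subtree' / 'for-each' idea discussed
above can be sketched outside any pipeline language. This is a rough Python
sketch using the standard library's ElementTree iterparse, not XPL and not
the proposed 'subtree' step. The sample log layout (the 'log', 'likelihood'
and 'id' names), the helper names, and the plain function standing in for
the XSLT step are all assumptions made for the example. It also assumes the
matching elements are direct children of the document element and are not
nested inside each other.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the large training log; only the
# 'training-scenario' element name comes from the original message.
SAMPLE = b"""<log>
  <training-scenario id="1"><likelihood>-12.5</likelihood></training-scenario>
  <training-scenario id="2"><likelihood>-11.0</likelihood></training-scenario>
</log>"""


def iter_subtrees(source, tag):
    """Yield each element named `tag`, one at a time, as it finishes parsing."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the document element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem    # the small subtree is fully built at this point
            root.clear()  # drop already-processed children to bound memory


def scenario_to_text(elem):
    """Stand-in for the XSLT step: turn one small subtree into a text row."""
    return f"{elem.get('id')}\t{elem.findtext('likelihood')}"


lines = [scenario_to_text(e)
         for e in iter_subtrees(io.BytesIO(SAMPLE), "training-scenario")]
print("\n".join(lines))
```

Only one small subtree is held as a tree at any time, which is the property
the 'subtree' step relies on. A real pipeline would hand each subtree to an
XSLT processor instead of the stand-in function.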
Received on Thursday, 5 January 2006 17:35:01 UTC