Re: Requirement: Subtree Processing Requirement & Use Case

Alex Milowski wrote:
> 
> Requirement:
> 
>    The pipeline language must allow a user to identify a subtree
>    of a document by an XPath or XPath subset that produces a
>    sequence.  This sequence should then be able to be fed to
>    a sub-pipeline or sequence of pipeline steps.
> 
> Use Case:
> 
>   Example Problem (Personal Story):
> 
>   I wrote this HMM Baum-Welch trainer and implemented logging
>   of the training steps in XML.  In the end, I had a log file that
>   was a 200-300MB XML document (or larger).  The required next
>   step was to transform that document into a data file that R or Matlab
>   could load (a plain text file).  Just running XSLT on the whole thing
>   isn't realistic.  All I really needed to do was transform a
>   particular element that is repeated over and over again in this
>   large XML log file.  So, I wanted to scope the XSLT to that
>   element and produce the text-transformed result on the little
>   bits of the document.
> 
>   Pipeline Solution Example:
> 
>   <subtree select="training-scenario">
>     <xslt src="scenario2text-xt.xsl"/>
>   </subtree>
> 
>   The 'subtree' step applies the XPath expression 'training-scenario'
>   in a streaming fashion to the input.  The matching info items (i.e.
>   the 'training-scenario' elements) are produced as a sequence of
>   little XML document infosets where the 'training-scenario'
>   element is the document element.  When the XSLT step runs, the
>   "adapter" for it caches the streaming of that infoset into a
>   "DOM" so that XSLT can run on the whole document.  Since that document
>   is tiny, the pipeline can process the large XML data document (of
>   arbitrary size) in constant memory.
> 
>   As I understand it, this is very similar to the 'for-each' step
>   in Orbeon's pipeline language.

Yes, this is similar to what you can do with XPL:

<p:for-each href="some-infoset-reference" select="/some/xpath/expr">
   ... some pipeline steps ...
</p:for-each>

XPL also provides the ability to use the (dead?) XPointer xpointer()
scheme anywhere URLs are used. This allows extracting sub-infosets very
conveniently without having to resort to, say, XSLT.
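
For illustration only, here is a rough sketch in Python of the streaming
subtree idea from the quoted use case. It is neither XPL nor the proposed
'subtree' step. It uses the standard library's iterparse, and it assumes
the matching elements are direct children of the document element and are
not nested in one another. The names iter_subtrees and scenario_to_text,
the child elements 'step' and 'loglik', and the sample data are all
hypothetical; scenario_to_text merely stands in for the XSLT step.

```python
import io
import xml.etree.ElementTree as ET


def iter_subtrees(source, tag):
    """Yield each completed element named `tag` from a streamed parse.

    Assumes (for this sketch) that matches are direct children of the
    document element, so clearing the root releases what was processed.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            yield elem     # the small subtree is fully built at this point
            root.clear()   # drop processed children so memory stays bounded


def scenario_to_text(elem):
    """Stand-in for the XSLT step: one tab-separated line per subtree."""
    return "\t".join((child.text or "").strip() for child in elem)


# Hypothetical sample log; a real one would be read from a file instead.
SAMPLE = (
    b"<log>"
    b"<training-scenario><step>1</step><loglik>-12.5</loglik></training-scenario>"
    b"<other/>"
    b"<training-scenario><step>2</step><loglik>-11.0</loglik></training-scenario>"
    b"</log>"
)

if __name__ == "__main__":
    for scenario in iter_subtrees(io.BytesIO(SAMPLE), "training-scenario"):
        print(scenario_to_text(scenario))
```

Each subtree is handed to the per-element step as soon as its end tag is
seen and is released afterwards, which is what keeps memory use roughly
independent of the size of the whole log.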

-Erik

Received on Thursday, 5 January 2006 17:35:01 UTC