- From: Alex Muir <alex.g.muir@gmail.com>
- Date: Fri, 4 Jun 2010 10:24:18 +0000
- To: Toman_Vojtech@emc.com
- Cc: xproc-dev@w3.org
- Message-ID: <AANLkTil-Il0EKCeIsp_vQVMZGxkrUZjAXGS6PueynmEa@mail.gmail.com>
Vojtech,
Thanks, I hadn't come across the ability to do recursion in XProc yet.
I found a recursive example posted earlier on the xproc-dev list that needed
some small updates to its version attributes. I thought I would post it again,
with some cx:message steps added, so that anyone else who isn't aware of
recursion in XProc has the example.
I'll see if I can get something working for processing the files in sets of
200.
Regards
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:cx="http://xmlcalabash.com/ns/extensions"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:mh="http://metaheuristica.com"
                name="recursion" version="1.0">
  <p:input port="source">
    <p:inline>
      <xml/>
    </p:inline>
  </p:input>
  <p:output port="result" sequence="true"/>

  <p:declare-step type="cx:message" version="1.0">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:option name="message" required="true"/>
  </p:declare-step>

  <!-- mh:step prints $level, then calls itself with $level - 1;
       at level 0 it passes its input through unchanged -->
  <p:declare-step name="recursive" type="mh:step" version="1.0">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:option name="level"/>
    <cx:message>
      <p:with-option name="message" select="$level"/>
    </cx:message>
    <p:choose>
      <p:when test="number($level) = 0">
        <p:identity/>
      </p:when>
      <p:otherwise>
        <mh:step>
          <p:with-option name="level" select="number($level) - 1"/>
        </mh:step>
      </p:otherwise>
    </p:choose>
  </p:declare-step>

  <cx:message>
    <p:with-option name="message" select="'Begin Recursion'"/>
  </cx:message>
  <mh:step level="10"/>
  <cx:message>
    <p:with-option name="message" select="'End Recursion'"/>
  </cx:message>
</p:declare-step>
On Fri, Jun 4, 2010 at 9:36 AM, <Toman_Vojtech@emc.com> wrote:
> Alex,
>
> One way to process the files in 'batches' could be to use a recursive
> pipeline. You can build the list of files incrementally by calling the
> pipeline recursively, each time adding one (or more) files to the input
> port of the pipeline and increasing the value of a 'count' option. When
> count is equal to or greater than 200, process all the files that you have
> on the input port, and then call the pipeline with an empty set of
> documents and 'count' set to zero.
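A minimal sketch of the recursive approach, under stated assumptions. It is a variant rather than exactly what is described above: instead of adding one file per call and counting up to 200, each call takes a slice of up to 200 c:file entries from the p:directory-list result, loads and transforms only those, stores one HTML file, and then calls itself for the next slice, so each call loads only one slice of source documents. The step type mh:process-batches and its folder, out, start and size options are hypothetical names; the folders and stylesheets (ExtractContent.xsl, CreateHTML.xsl) are taken from Alex's pipeline quoted further down, and XPath 2.0 (as in XML Calabash) is assumed. The self-call relies on recursive step declarations, the same way as the mh:step example above.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                xmlns:mh="http://metaheuristica.com"
                name="main" version="1.0">

  <!-- Hypothetical step: processes files $start .. $start + $size - 1 of the
       c:directory listing on "source" (the last slice may be shorter), stores
       one HTML file, then calls itself for the next slice.
       Assumes the folder contains at least one file. -->
  <p:declare-step type="mh:process-batches" name="batches" version="1.0">
    <p:input port="source"/>
    <p:option name="folder" required="true"/>
    <p:option name="out" required="true"/>
    <p:option name="start" required="true"/>
    <p:option name="size" select="'200'"/>

    <p:variable name="end" select="number($start) + number($size) - 1"/>

    <!-- Load and extract only the files in this slice -->
    <p:for-each>
      <p:iteration-source
          select="/c:directory/c:file[position() ge number($start)
                                      and position() le number($end)]"/>
      <p:load>
        <p:with-option name="href" select="concat($folder, /c:file/@name)"/>
      </p:load>
      <p:xslt>
        <p:input port="stylesheet"><p:document href="ExtractContent.xsl"/></p:input>
        <p:input port="parameters"><p:empty/></p:input>
      </p:xslt>
    </p:for-each>

    <!-- Wrap the slice, transform it and store one HTML file for it -->
    <p:wrap-sequence wrapper="WRAP"/>
    <p:xslt>
      <p:input port="stylesheet"><p:document href="CreateHTML.xsl"/></p:input>
      <p:input port="parameters"><p:empty/></p:input>
    </p:xslt>
    <p:store>
      <p:with-option name="href"
          select="concat($out, 'MDNASections-', $start, '-', $end, '.html')"/>
    </p:store>

    <!-- Recurse only while the listing has files beyond this slice -->
    <p:choose>
      <p:xpath-context><p:pipe step="batches" port="source"/></p:xpath-context>
      <p:when test="count(/c:directory/c:file) gt number($end)">
        <mh:process-batches>
          <p:input port="source"><p:pipe step="batches" port="source"/></p:input>
          <p:with-option name="folder" select="$folder"><p:empty/></p:with-option>
          <p:with-option name="out" select="$out"><p:empty/></p:with-option>
          <p:with-option name="start" select="number($end) + 1"><p:empty/></p:with-option>
          <p:with-option name="size" select="$size"><p:empty/></p:with-option>
        </mh:process-batches>
      </p:when>
      <p:otherwise>
        <!-- Last slice done: nothing left to do -->
        <p:sink><p:input port="source"><p:empty/></p:input></p:sink>
      </p:otherwise>
    </p:choose>
  </p:declare-step>

  <p:directory-list path="completed/XML/"/>
  <mh:process-batches folder="completed/XML/" out="MDNA/" start="1"/>
</p:declare-step>
```

For 10000 files this recurses 50 times (one level per batch of 200). Vojtech's caveat about memory management in recursive calls applies here too.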
>
> The efficiency of this approach may vary between XProc implementations,
> depending on how they handle memory management in recursive calls.
>
> Regards,
> Vojtech
>
> --
> Vojtech Toman
> Principal Software Engineer
> EMC Corporation
> toman_vojtech@emc.com
> http://developer.emc.com/xmltech
>
> From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On
> Behalf Of Alex Muir
> Sent: Friday, June 04, 2010 11:28 AM
> To: Romain Deltour
> Cc: xproc-dev@w3.org
> Subject: Re: Can one within a for-each loop wrap, output, sink a set of
> files and continue processing with remaining files?
>
> Hi Romain,
>
> Your solution looks like a good one, and you're not missing anything.
>
> Wouldn't having to read in all the input files before processing the first
> set be poor in terms of memory use?
>
> Is there no way to read in the first 200 and process them, then read in the
> second 200 and process those, and so on?
>
> Thanks
> Alex
>
> On Thu, Jun 3, 2010 at 6:43 PM, Romain Deltour <rdeltour@gmail.com> wrote:
>
> Hi Alex,
>
> If I understand your intent and your pipeline correctly, you should instead
> use the group-adjacent option of the p:wrap-sequence step to pack the files
> 200 at a time.
>
> Explanation:
>
> In your pipeline, almost everything happens in one big p:for-each that
> iterates over the 1000 files. The p:choose subpipeline is executed only for
> every 200th file, and the wrapper's input is a sequence containing just that
> one file (the one whose position is a multiple of 200).
>
> So rather than grouping the files in sets of 200, you ignore 199 files and
> wrap only the 200th in an element before processing it.
>
> What I would do is:
>
> p:for-each => iterate through the 1000 files and load the documents
> (note that the result of this first p:for-each is a sequence of 1000
> documents)
>
> p:wrap-sequence[@group-adjacent] => split the sequence of 1000 into sets
> of 200
>
> p:for-each => iterate again, over the 5 packs of 200 files, to process one
> pack at a time
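A minimal sketch of the three steps above, reusing the source folder, output folder and stylesheets from Alex's pipeline quoted further down; the step names and the MDNASections-N.html file naming are illustrative. It assumes, as I read the p:wrap-sequence description in the XProc 1.0 spec, that position() inside the group-adjacent expression is the document's position in the input sequence, so floor((position() - 1) div 200) gives documents 1-200 the key 0, documents 201-400 the key 1, and so on. The pack number comes from p:iteration-position(): as far as I can tell from the spec, position() in a step's own XPath expressions (such as the p:when test in Alex's pipeline) is always 1, so it cannot serve as a loop counter.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                name="grouped" version="1.0">

  <!-- 1. Iterate over all files: the result is one sequence of documents -->
  <p:directory-list path="completed/XML/"/>
  <p:for-each name="load">
    <p:iteration-source select="/c:directory/c:file"/>
    <p:load>
      <p:with-option name="href" select="concat('completed/XML/', /c:file/@name)"/>
    </p:load>
    <p:xslt>
      <p:input port="stylesheet"><p:document href="ExtractContent.xsl"/></p:input>
      <p:input port="parameters"><p:empty/></p:input>
    </p:xslt>
  </p:for-each>

  <!-- 2. Wrap adjacent documents that share a key: one WRAP document per 200 -->
  <p:wrap-sequence wrapper="WRAP" group-adjacent="floor((position() - 1) div 200)"/>

  <!-- 3. Iterate over the WRAP documents, writing one HTML file per pack -->
  <p:for-each name="packs">
    <p:variable name="pack" select="p:iteration-position()"/>
    <p:xslt>
      <p:input port="stylesheet"><p:document href="CreateHTML.xsl"/></p:input>
      <p:input port="parameters"><p:empty/></p:input>
    </p:xslt>
    <p:store>
      <p:with-option name="href" select="concat('MDNA/MDNASections-', $pack, '.html')"/>
    </p:store>
  </p:for-each>
</p:declare-step>
```

Step 1 still produces the whole sequence of documents before the first pack is written, which is the memory concern Alex raises in his reply above.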
>
> I hope this helps and I'm not missing your point...
>
> BR
> Romain.
>
> On 3 June 2010 at 18:32, Alex Muir wrote:
>
> Hi,
>
> I'm trying to read ~10000 files within a for-each loop, wrap a selection
> from each set of 200 files and process each set to output one HTML file,
> sink the processed files, and continue with the remaining files, processing
> 200 at a time.
>
> Is that possible in XProc?
>
> I've got something like the following, which I can't get to work. I think
> that the wrapper (p:wrap-sequence) cannot be used within a for-each; is
> that the case?
>
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>                 xmlns:c="http://www.w3.org/ns/xproc-step"
>                 xmlns:cx="http://xmlcalabash.com/ns/extensions"
>                 name="wrapWithinForEach" version="1.0">
>
>   <p:input port="source">
>     <p:inline>
>       <xml/>
>     </p:inline>
>   </p:input>
>
>   <p:output port="result" sequence="true"/>
>
>   <p:declare-step type="cx:message" version="1.0">
>     <p:input port="source"/>
>     <p:output port="result"/>
>     <p:option name="message" required="true"/>
>   </p:declare-step>
>
>   <!-- ***** Starting and Ending File Numbers ***** -->
>   <p:variable name="startingFileNumber" select="'1'"/>
>   <p:variable name="endingFileNumber" select="'10000'"/>
>   <p:variable name="numberPerFile" select="'200'"/>
>
>   <!-- source and output folder variables -->
>   <p:variable name="source-folder" select="'completed/XML/'"/>
>   <p:variable name="output-folder" select="'MDNA/'"/>
>   <p:variable name="error-folder" select="'MDNA/error/'"/>
>   <p:variable name="exception-folder" select="'MDNA/exception/'"/>
>
>   <p:directory-list>
>     <p:with-option name="path" select="$source-folder">
>       <p:empty/>
>     </p:with-option>
>   </p:directory-list>
>
>   <p:for-each name="MDNA">
>
>     <p:iteration-source
>         select="//c:file[position() ge number($startingFileNumber) and
>                          position() le number($endingFileNumber)]"/>
>
>     <p:variable name="fileName" select="c:file/@name"/>
>     <p:variable name="startingIterationPosition"
>         select="number(p:iteration-position()) +
>                 number($startingFileNumber) - 1"/>
>
>     <cx:message>
>       <p:with-option name="message"
>           select="concat('-----------------------------',
>                          'Iteration-position:', ' ', $startingIterationPosition,
>                          ' File: ', $fileName,
>                          '-----------------------------')"/>
>     </cx:message>
>
>     <p:load>
>       <p:with-option name="href"
>           select="concat($source-folder, $fileName)"/>
>     </p:load>
>
>     <cx:message>
>       <p:with-option name="message" select="'###### ExtractContent'"/>
>     </cx:message>
>     <p:xslt name="ExtractContent">
>       <p:input port="source"/>
>       <p:input port="stylesheet">
>         <p:document href="ExtractContent.xsl"/>
>       </p:input>
>       <p:input port="parameters">
>         <p:empty/>
>       </p:input>
>     </p:xslt>
>
>     <p:identity name="wrap"/>
>
>     <p:choose>
>       <p:when test="position() mod $numberPerFile eq 0">
>         <p:wrap-sequence wrapper="WRAP" name="wrapper">
>           <p:input port="source">
>             <p:pipe port="result" step="wrap"/>
>           </p:input>
>         </p:wrap-sequence>
>
>         <p:xslt name="CreateHTML">
>           <p:input port="source"/>
>           <p:input port="stylesheet">
>             <p:document href="CreateHTML.xsl"/>
>           </p:input>
>           <p:input port="parameters">
>             <p:empty/>
>           </p:input>
>         </p:xslt>
>
>         <p:identity name="out_file"/>
>
>         <p:store name="OUT">
>           <p:with-option name="href"
>               select="concat($output-folder, 'MDNASections', '-',
>                              $startingFileNumber, '-', $endingFileNumber, '.html')">
>             <p:pipe step="out_file" port="result"/>
>           </p:with-option>
>         </p:store>
>
>         <p:sink name="sinkIt"/>
>
>       </p:when>
>     </p:choose>
>
>   </p:for-each>
>
> </p:declare-step>
>
> Regards
>
> --
> Alex
>
> An informal recording with one mic under a tree leads to some pretty sweet
> acoustic sounds.
> https://sites.google.com/site/greigconteh/albums/diabarte-and-sons
Received on Friday, 4 June 2010 10:24:55 UTC