
Re: Can one within a for-each loop wrap, output, sink a set of files and continue processing with remaining files?

From: Alex Muir <alex.g.muir@gmail.com>
Date: Fri, 4 Jun 2010 10:24:18 +0000
Message-ID: <AANLkTil-Il0EKCeIsp_vQVMZGxkrUZjAXGS6PueynmEa@mail.gmail.com>
To: Toman_Vojtech@emc.com
Cc: xproc-dev@w3.org
Vojtech,

Thanks, I hadn't yet come across the ability to do recursion in XProc.

I found a recursive example earlier on the xproc-dev list that needed some
small updates to its version attributes. I thought I would post it again,
with some cx:message steps added, so that anyone else unaware of recursion
in XProc has an example.

I'll see if I can get something working for processing the 200-file sets.

Regards

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    xmlns:cx="http://xmlcalabash.com/ns/extensions"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://metaheuristica.com" name="recursion" version="1.0">
    <p:input port="source">
        <p:inline>
            <xml/>
        </p:inline>
    </p:input>
    <p:output port="result" sequence="true"/>

    <p:declare-step type="cx:message" version="1.0">
        <p:input port="source"/>
        <p:output port="result"/>
        <p:option name="message" required="true"/>
    </p:declare-step>

    <!-- Logs the current level, then calls itself with level - 1
         until the level reaches 0 -->
    <p:declare-step name="recursive" type="mh:step" version="1.0">
        <p:input port="source"/>
        <p:output port="result"/>
        <p:option name="level"/>
        <cx:message>
            <p:with-option name="message" select="$level"/>
        </cx:message>
        <p:choose>
            <!-- Base case: pass the document through unchanged -->
            <p:when test="number($level) = 0">
                <p:identity/>
            </p:when>
            <!-- Recursive case: the step invokes its own type -->
            <p:otherwise>
                <mh:step>
                    <p:with-option name="level" select="number($level) - 1"/>
                </mh:step>
            </p:otherwise>
        </p:choose>
    </p:declare-step>

    <cx:message>
        <p:with-option name="message" select="'Begin Recursion'"/>
    </cx:message>
    <mh:step level="10"/>
    <cx:message>
        <p:with-option name="message" select="'End Recursion'"/>
    </cx:message>

</p:declare-step>


On Fri, Jun 4, 2010 at 9:36 AM, <Toman_Vojtech@emc.com> wrote:

>  Alex,
>
>
>
> One way to process the files in 'batches' could be to use a recursive
> pipeline. You can build the list of files incrementally by calling the
> pipeline recursively, each time adding one (or more) files to the input
> port of the pipeline and increasing the value of some 'count' option. When
> count is equal to or greater than 200, process all the files that you have
> on the input port, and then call the pipeline with an empty set of documents
> and 'count' set to zero.
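A minimal sketch of a variant of this idea, assuming recursion works as in Alex's example above: instead of recursing once per file, the hypothetical step mh:batch recurses once per batch. Each call loads and transforms only the c:file entries after $offset (up to $batch-size of them) from a p:directory-list document, stores one HTML file for them, and then calls itself on the same listing with the offset advanced. The step name, the option names and the MDNASections-N.html naming are assumptions; the folders and the ExtractContent.xsl and CreateHTML.xsl stylesheets are taken from Alex's original pipeline.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    xmlns:mh="http://metaheuristica.com"
    name="main" version="1.0">

    <!-- Hypothetical recursive step: processes one batch, then recurses
         on the same directory listing with the offset moved forward -->
    <p:declare-step type="mh:batch" name="batch" version="1.0">
        <!-- A c:directory document produced by p:directory-list -->
        <p:input port="source"/>
        <p:option name="source-folder" required="true"/>
        <p:option name="output-folder" required="true"/>
        <p:option name="batch-size" select="200"/>
        <p:option name="offset" select="0"/>

        <p:choose>
            <!-- Files remain beyond the offset: process the next batch -->
            <p:when test="count(/c:directory/c:file) &gt; number($offset)">
                <!-- Load and extract only the files in this batch -->
                <p:for-each>
                    <p:iteration-source
                        select="/c:directory/c:file[position() &gt; number($offset)
                                and position() &lt;= number($offset) + number($batch-size)]"/>
                    <p:load>
                        <p:with-option name="href"
                            select="concat($source-folder, c:file/@name)"/>
                    </p:load>
                    <p:xslt>
                        <p:input port="stylesheet">
                            <p:document href="ExtractContent.xsl"/>
                        </p:input>
                        <p:input port="parameters">
                            <p:empty/>
                        </p:input>
                    </p:xslt>
                </p:for-each>

                <!-- Wrap the batch, turn it into HTML, and store it -->
                <p:wrap-sequence wrapper="WRAP"/>
                <p:xslt>
                    <p:input port="stylesheet">
                        <p:document href="CreateHTML.xsl"/>
                    </p:input>
                    <p:input port="parameters">
                        <p:empty/>
                    </p:input>
                </p:xslt>
                <p:store method="html">
                    <p:with-option name="href"
                        select="concat($output-folder, 'MDNASections-',
                                number($offset) + 1, '.html')"/>
                </p:store>

                <!-- Recurse on the same listing with the offset advanced;
                     p:store has no primary output, so bindings are explicit -->
                <mh:batch>
                    <p:input port="source">
                        <p:pipe step="batch" port="source"/>
                    </p:input>
                    <p:with-option name="source-folder" select="$source-folder">
                        <p:empty/>
                    </p:with-option>
                    <p:with-option name="output-folder" select="$output-folder">
                        <p:empty/>
                    </p:with-option>
                    <p:with-option name="batch-size" select="$batch-size">
                        <p:empty/>
                    </p:with-option>
                    <p:with-option name="offset"
                        select="number($offset) + number($batch-size)">
                        <p:empty/>
                    </p:with-option>
                </mh:batch>
            </p:when>
            <!-- No files left: end the recursion -->
            <p:otherwise>
                <p:sink/>
            </p:otherwise>
        </p:choose>
    </p:declare-step>

    <p:directory-list path="completed/XML/"/>
    <mh:batch source-folder="completed/XML/" output-folder="MDNA/"/>
</p:declare-step>
```

With ~10000 files and a batch size of 200 this recurses about 50 times, and each call only loads its own batch; how much earlier calls keep in memory still depends on the implementation, as noted below.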
>
>
>
> The efficiency of this approach may vary between XProc implementations,
> depending on how they manage memory in recursive calls.
>
>
>
> Regards,
>
> Vojtech
>
>
>
> --
>
> Vojtech Toman
>
> Principal Software Engineer
>
> EMC Corporation
>
> toman_vojtech@emc.com
>
> http://developer.emc.com/xmltech
>
>
>
> *From:* xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] *On
> Behalf Of *Alex Muir
> *Sent:* Friday, June 04, 2010 11:28 AM
> *To:* Romain Deltour
> *Cc:* xproc-dev@w3.org
> *Subject:* Re: Can one within a for-each loop wrap, output, sink a set of
> files and continue processing with remaining files?
>
>
>
> Hi Romain,
>
> Your solution looks like a good one, and you're not missing anything.
>
> Wouldn't this solution, which has to read in all the input files before
> processing the first set, be poor in terms of memory use?
>
> Is there no way to read in the first 200 and process them, then read in the
> second 200 and process those, and so on?
>
> Thanks
> Alex
>
> On Thu, Jun 3, 2010 at 6:43 PM, Romain Deltour <rdeltour@gmail.com> wrote:
>
> Hi Alex,
>
>
>
> If I understand your intent and your pipeline correctly, you should instead
> use the @group-adjacent attribute of the p:wrap-sequence step to pack 200
> files at a time.
>
>
>
> Explanation:
>
> In your pipeline, almost everything happens in one big p:for-each that
> iterates over the 1000 files. The p:choose subpipeline is executed only for
> every 200th file, and the wrapper's input is a sequence containing just that
> one file (the one whose position is a multiple of 200).
>
> In effect, rather than grouping files in sets of 200, you ignore 199 files
> and wrap only the 200th in an element before processing it.
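If I read the XProc 1.0 spec correctly, two more details keep the p:choose in the pipeline below from wrapping anything: in a p:when test the context position is 1, so position() does not count iterations (inside p:for-each, p:iteration-position() does), and a p:choose in which no p:when matches and which has no p:otherwise raises a dynamic error (err:XD0004). A minimal sketch of that p:choose with both points adjusted, meant to sit inside the p:for-each; the hard-coded 200 is an assumption. Note that it still wraps only the current iteration's document, which is exactly the problem described here.

```xml
<!-- Fragment for inside the p:for-each; 200 is hard-coded here -->
<p:choose>
    <!-- p:iteration-position() counts iterations of the enclosing p:for-each -->
    <p:when test="p:iteration-position() mod 200 = 0">
        <!-- Only the current iteration's document is readable here, so this
             wraps one document, not the previous 200 -->
        <p:wrap-sequence wrapper="WRAP"/>
    </p:when>
    <!-- Without a p:otherwise, the non-matching iterations would fail -->
    <p:otherwise>
        <p:identity/>
    </p:otherwise>
</p:choose>
```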
>
>
>
> What I would do is:
>
>
>
> p:for-each => to iterate through the 1000 files and load the documents
>
> (note that the result of this first p:for-each is a sequence of 1000 documents)
>
> p:wrap-sequence[@group-adjacent] => to split the sequence of 1000 into sets of 200
>
> p:for-each => another iteration over the 5 packs of 200 files, to process
> one pack at a time
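A minimal sketch of these three steps, assuming the folder names and the ExtractContent.xsl and CreateHTML.xsl stylesheets from Alex's pipeline; the MDNASections-N.html naming is hypothetical. It also assumes that, as the XProc 1.0 spec describes for group-adjacent, position() is each document's position in the source sequence, so floor((position() - 1) div 200) gives the same value to documents 1 to 200, 201 to 400, and so on.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    name="wrapInBatches" version="1.0">

    <!-- Hypothetical defaults taken from Alex's pipeline -->
    <p:option name="source-folder" select="'completed/XML/'"/>
    <p:option name="output-folder" select="'MDNA/'"/>

    <p:directory-list>
        <p:with-option name="path" select="$source-folder">
            <p:empty/>
        </p:with-option>
    </p:directory-list>

    <!-- 1. Load and extract every file: one long sequence of documents -->
    <p:for-each name="load">
        <p:iteration-source select="//c:file"/>
        <p:load>
            <p:with-option name="href"
                select="concat($source-folder, c:file/@name)"/>
        </p:load>
        <p:xslt>
            <p:input port="stylesheet">
                <p:document href="ExtractContent.xsl"/>
            </p:input>
            <p:input port="parameters">
                <p:empty/>
            </p:input>
        </p:xslt>
    </p:for-each>

    <!-- 2. Adjacent documents with the same batch number share one WRAP -->
    <p:wrap-sequence wrapper="WRAP"
        group-adjacent="floor((position() - 1) div 200)"/>

    <!-- 3. One iteration per WRAP document, i.e. per batch of 200 -->
    <p:for-each name="batches">
        <p:xslt>
            <p:input port="stylesheet">
                <p:document href="CreateHTML.xsl"/>
            </p:input>
            <p:input port="parameters">
                <p:empty/>
            </p:input>
        </p:xslt>
        <p:store method="html">
            <p:with-option name="href"
                select="concat($output-folder, 'MDNASections-',
                        p:iteration-position(), '.html')"/>
        </p:store>
    </p:for-each>
</p:declare-step>
```

This still loads every document before the first batch is wrapped, which is the memory concern Alex raises in his reply.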
>
>
>
> I hope this helps and I'm not missing your point...
>
>
>
> BR
>
> Romain.
>
>
>
> On 3 June 2010, at 18:32, Alex Muir wrote:
>
>
>
>  Hi,
>
> I'm trying to read ~10000 files within a for-each loop, wrap a selection
> from each set of 200 files, process each set to output one HTML file, sink
> the processed files, and continue with the remaining files, 200 at a time.
>
> Is that possible in XProc?
>
> I've got something like the following, which I can't get to work. I think
> that the wrapper cannot be used within a for-each; is that the case?
>
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>     xmlns:c="http://www.w3.org/ns/xproc-step"
>     xmlns:cx="http://xmlcalabash.com/ns/extensions"
>     name="wrapWithinForEach" version="1.0">
>
>     <p:input port="source">
>         <p:inline>
>             <xml/>
>         </p:inline>
>     </p:input>
>
>     <p:output port="result" sequence="true"/>
>
>     <p:declare-step type="cx:message" version="1.0">
>         <p:input port="source"/>
>         <p:output port="result"/>
>         <p:option name="message" required="true"/>
>     </p:declare-step>
>
>
>     <!-- ***** Starting and Ending File Numbers ***** -->
>     <p:variable name="startingFileNumber" select="'1'"/>
>     <p:variable name="endingFileNumber" select="'10000'"/>
>     <p:variable name="numberPerFile" select="'200'"/>
>
>     <!-- source and output folder variables -->
>     <p:variable name="source-folder" select="'completed/XML/'"/>
>     <p:variable name="output-folder" select="'MDNA/'"/>
>     <p:variable name="error-folder" select="'MDNA/error/'"/>
>     <p:variable name="exception-folder" select="'MDNA/exception/'"/>
>
>
>     <p:directory-list>
>         <p:with-option name="path" select="$source-folder">
>             <p:empty/>
>         </p:with-option>
>     </p:directory-list>
>
>
>     <p:for-each name="MDNA">
>
>
>         <p:iteration-source
>             select="//c:file[position() ge number($startingFileNumber) and
>                 position() le number($endingFileNumber)]"/>
>
>         <p:variable name="fileName" select="c:file/@name"/>
>         <p:variable name="startingIterationPosition"
>             select="number(p:iteration-position()) +
>                 number($startingFileNumber) - 1"/>
>
>         <cx:message>
>             <p:with-option name="message"
>                 select="concat('-----------------------------',
>                     'Iteration-position:', '  ', $startingIterationPosition,
>                     '  File: ', $fileName, '-----------------------------')"
>             />
>         </cx:message>
>
>         <p:load>
>             <p:with-option name="href"
>                 select="concat($source-folder, $fileName)"/>
>         </p:load>
>
>         <cx:message>
>             <p:with-option name="message" select="'###### ExtractContent'"/>
>         </cx:message>
>         <p:xslt name="ExtractContent">
>             <p:input port="source"/>
>             <p:input port="stylesheet">
>                 <p:document href="ExtractContent.xsl"/>
>             </p:input>
>             <p:input port="parameters">
>                 <p:empty/>
>             </p:input>
>         </p:xslt>
>
>         <p:identity name="wrap"/>
>
>
>         <p:choose>
>             <p:when test="position() mod $numberPerFile eq 0">
>                 <p:wrap-sequence wrapper="WRAP" name="wrapper">
>                     <p:input port="source">
>                         <p:pipe port="result" step="wrap"/>
>                     </p:input>
>                 </p:wrap-sequence>
>
>
>                 <p:xslt name="CreateHTML">
>                     <p:input port="source"/>
>                     <p:input port="stylesheet">
>                         <p:document href="CreateHTML.xsl"/>
>                     </p:input>
>                     <p:input port="parameters">
>                         <p:empty/>
>                     </p:input>
>                 </p:xslt>
>
>
>                 <p:identity name="out_file"/>
>
>                 <p:store name="OUT">
>                     <p:with-option name="href"
>                         select="concat($output-folder, 'MDNASections', '-',
>                             $startingFileNumber, '-', $endingFileNumber, '.html')">
>                         <p:pipe step="out_file" port="result"/>
>                     </p:with-option>
>                 </p:store>
>
>                 <p:sink name="sinkIt"/>
>
>             </p:when>
>         </p:choose>
>
>     </p:for-each>
>
>
> </p:declare-step>
>
>
>
>
> Regards
>
>
> --
> Alex
>
> An informal recording with one mic under a tree leads to some pretty sweet
> acoustic sounds.
> https://sites.google.com/site/greigconteh/albums/diabarte-and-sons
>
>
>
>
>
>
> --
> Alex
>
>



-- 
Alex

Received on Friday, 4 June 2010 10:24:55 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Friday, 4 June 2010 10:24:55 GMT