- From: Alex Muir <alex.g.muir@gmail.com>
- Date: Fri, 4 Jun 2010 10:24:18 +0000
- To: Toman_Vojtech@emc.com
- Cc: xproc-dev@w3.org
- Message-ID: <AANLkTil-Il0EKCeIsp_vQVMZGxkrUZjAXGS6PueynmEa@mail.gmail.com>
Vojtech,

Thanks, I hadn't yet come across the ability to do recursion in XProc.

I found a recursive example earlier on the xproc-dev list that needed some
small updates to its version attributes. I thought I would post it again,
with some cx:message steps added, so that anyone else not aware of recursion
in XProc has the example.

I'll see if I can get something working for processing the sets of 200 files.

Regards

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    xmlns:cx="http://xmlcalabash.com/ns/extensions"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:mh="http://metaheuristica.com"
    name="recursion" version="1.0">

    <p:input port="source">
        <p:inline>
            <xml/>
        </p:inline>
    </p:input>

    <p:output port="result" sequence="true"/>

    <p:declare-step type="cx:message" version="1.0">
        <p:input port="source"/>
        <p:output port="result"/>
        <p:option name="message" required="true"/>
    </p:declare-step>

    <!-- Calls itself with level - 1 until level reaches 0 -->
    <p:declare-step name="recursive" type="mh:step" version="1.0">
        <p:input port="source"/>
        <p:output port="result"/>
        <p:option name="level"/>

        <cx:message>
            <p:with-option name="message" select="$level"/>
        </cx:message>

        <p:choose>
            <p:when test="number($level) = 0">
                <p:identity/>
            </p:when>
            <p:otherwise>
                <mh:step>
                    <p:with-option name="level" select="number($level) - 1"/>
                </mh:step>
            </p:otherwise>
        </p:choose>
    </p:declare-step>

    <cx:message>
        <p:with-option name="message" select="'Begin Recursion'"/>
    </cx:message>

    <mh:step level="10"/>

    <cx:message>
        <p:with-option name="message" select="'End Recursion'"/>
    </cx:message>

</p:declare-step>

On Fri, Jun 4, 2010 at 9:36 AM, <Toman_Vojtech@emc.com> wrote:

> Alex,
>
> One way to process the files in 'batches' could be to use a recursive
> pipeline. You can build the list of files incrementally by calling the
> pipeline recursively, each time adding one (or more) files to the input
> port of the pipeline and increasing the value of some 'count' option.
> When count is greater than or equal to 200, process all the files that
> you have on the input port, and then call the pipeline with an empty set
> of documents and 'count' set to zero.
>
> The efficiency of this approach may vary between XProc implementations,
> depending on how they do memory management in recursive calls.
>
> Regards,
> Vojtech
>
> --
> Vojtech Toman
> Principal Software Engineer
> EMC Corporation
> toman_vojtech@emc.com
> http://developer.emc.com/xmltech
>
> *From:* xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org]
> *On Behalf Of* Alex Muir
> *Sent:* Friday, June 04, 2010 11:28 AM
> *To:* Romain Deltour
> *Cc:* xproc-dev@w3.org
> *Subject:* Re: Can one within a for-each loop wrap, output, sink a set of
> files and continue processing with remaining files?
>
> Hi Romain,
>
> Your solution looks like a good one and you're not missing any points.
>
> Would the solution, which has to read all the input files in before
> processing the first set, be poor in terms of memory use?
>
> Is there no way to read in the first 200 and process them, then read in
> the second 200 and process those, and so on?
>
> Thanks
> Alex
>
> On Thu, Jun 3, 2010 at 6:43 PM, Romain Deltour <rdeltour@gmail.com> wrote:
>
> Hi Alex,
>
> If I understand your intent and your pipeline correctly, you should
> rather use the @group-adjacent attribute of the p:wrap-sequence step to
> pack 200 files at a time.
>
> Explanation:
>
> In your pipeline, almost everything happens in one big p:for-each that
> iterates over the 1000 files. The p:choose subpipeline is executed only
> on every 200th file, and the wrapper's input is a sequence consisting of
> that single file (modulo 200).
>
> Actually, rather than grouping files by sets of 200, you ignore 199 files
> and wrap only the 200th in an element before processing it.
>
> What I would do is:
>
> p:for-each => to iterate through the 1000 files and load the documents
>   (note the result of this first p:for-each is a sequence of 1000
>   documents)
> p:wrap-sequence[@group-adjacent] => split the sequence of 1000 into sets
>   of 200
> p:for-each => another iteration over the 5 packs of 200 files, to process
>   one pack at a time
>
> I hope this helps and I'm not missing your point...
>
> BR
> Romain.
>
> On 3 June 2010 at 18:32, Alex Muir wrote:
>
> Hi,
>
> I'm trying to read ~10000 files within a for-each loop, wrap a selection
> from each set of 200 files and process them to output 1 html file, sink
> the processed files and continue with the remaining files, processing 200
> at a time.
>
> Is that possible in XProc?
>
> I've got something like the following, which I can't get to work. I think
> that wrapper cannot be used within a for-each; is that the case?
>
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>     xmlns:c="http://www.w3.org/ns/xproc-step"
>     xmlns:cx="http://xmlcalabash.com/ns/extensions"
>     name="wrapWithinForEach" version="1.0">
>
>     <p:input port="source">
>         <p:inline>
>             <xml/>
>         </p:inline>
>     </p:input>
>
>     <p:output port="result" sequence="true"/>
>
>     <p:declare-step type="cx:message" version="1.0">
>         <p:input port="source"/>
>         <p:output port="result"/>
>         <p:option name="message" required="true"/>
>     </p:declare-step>
>
>     <!-- ***** Starting and Ending File Numbers ***** -->
>     <p:variable name="startingFileNumber" select="'1'"/>
>     <p:variable name="endingFileNumber" select="'10000'"/>
>     <p:variable name="numberPerFile" select="'200'"/>
>
>     <!-- source and output folder variables -->
>     <p:variable name="source-folder" select="'completed/XML/'"/>
>     <p:variable name="output-folder" select="'MDNA/'"/>
>     <p:variable name="error-folder" select="'MDNA/error/'"/>
>     <p:variable name="exception-folder" select="'MDNA/exception/'"/>
>
>     <p:directory-list>
>         <p:with-option name="path" select="$source-folder">
>             <p:empty/>
>         </p:with-option>
>     </p:directory-list>
>
>     <p:for-each name="MDNA">
>
>         <p:iteration-source
>             select="//c:file[position() ge number($startingFileNumber) and
>                     position() le number($endingFileNumber)]"/>
>
>         <p:variable name="fileName" select="c:file/@name"/>
>         <p:variable name="startingIterationPosition"
>             select="number(p:iteration-position()) +
>                     number($startingFileNumber)-1"/>
>
>         <cx:message>
>             <p:with-option name="message"
>                 select="concat('-----------------------------',
>                         'Iteration-position:',' ', $startingIterationPosition,
>                         ' File: ', $fileName,'-----------------------------')"/>
>         </cx:message>
>
>         <p:load>
>             <p:with-option name="href"
>                 select="concat($source-folder,$fileName)"/>
>         </p:load>
>
>         <cx:message>
>             <p:with-option name="message" select="'###### ExtractContent'"/>
>         </cx:message>
>         <p:xslt name="ExtractContent">
>             <p:input port="source"/>
>             <p:input port="stylesheet">
>                 <p:document href="ExtractContent.xsl"/>
>             </p:input>
>             <p:input port="parameters">
>                 <p:empty/>
>             </p:input>
>         </p:xslt>
>
>         <p:identity name="wrap"/>
>
>         <p:choose>
>             <p:when test="position() mod $numberPerFile eq 0">
>                 <p:wrap-sequence wrapper="WRAP" name="wrapper">
>                     <p:input port="source">
>                         <p:pipe port="result" step="wrap"/>
>                     </p:input>
>                 </p:wrap-sequence>
>
>                 <p:xslt name="CreateHTML">
>                     <p:input port="source"/>
>                     <p:input port="stylesheet">
>                         <p:document href="CreateHTML.xsl"/>
>                     </p:input>
>                     <p:input port="parameters">
>                         <p:empty/>
>                     </p:input>
>                 </p:xslt>
>
>                 <p:identity name="out_file"/>
>
>                 <p:store name="OUT">
>                     <p:with-option name="href"
>                         select="concat($output-folder,
>                                 'MDNASections','-',$startingFileNumber,'-',
>                                 $endingFileNumber,'.html')">
>                         <p:pipe step="out_file" port="result"/>
>                     </p:with-option>
>                 </p:store>
>
>                 <p:sink name="sinkIt"/>
>
>             </p:when>
>         </p:choose>
>
>     </p:for-each>
>
> </p:declare-step>
>
> Regards
>
> --
> Alex
>
> An informal recording with one mic under a tree leads to some pretty sweet
> acoustic sounds.
> https://sites.google.com/site/greigconteh/albums/diabarte-and-sons

--
Alex

An informal recording with one mic under a tree leads to some pretty sweet
acoustic sounds.
https://sites.google.com/site/greigconteh/albums/diabarte-and-sons
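
For reference, the three-step outline Romain gives in the quoted thread (load
everything, wrap with group-adjacent, iterate over the packs) might be
sketched roughly as below. This is an untested sketch under assumptions, not
a pipeline anyone in the thread posted: the "batch" attribute, the
p:add-attribute step used to stamp it, the step name "batchBy200" and the
output file naming are all made up for illustration, and it assumes
ExtractContent.xsl produces a document with an element root. The folder and
stylesheet names are taken from the original pipeline.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    name="batchBy200" version="1.0">

    <p:directory-list path="completed/XML/"/>

    <!-- 1. Load and extract every file; the result is a sequence of documents -->
    <p:for-each name="load">
        <p:iteration-source select="//c:file"/>

        <!-- Hypothetical batch number: 0 for files 1-200, 1 for 201-400, ... -->
        <p:variable name="batch"
            select="floor((p:iteration-position() - 1) div 200)"/>

        <p:load>
            <p:with-option name="href"
                select="concat('completed/XML/', /c:file/@name)"/>
        </p:load>

        <p:xslt name="ExtractContent">
            <p:input port="stylesheet">
                <p:document href="ExtractContent.xsl"/>
            </p:input>
            <p:input port="parameters">
                <p:empty/>
            </p:input>
        </p:xslt>

        <!-- Assumption: stamp the batch number on the root element so that
             p:wrap-sequence has something to group on -->
        <p:add-attribute match="/*" attribute-name="batch">
            <p:with-option name="attribute-value" select="$batch"/>
        </p:add-attribute>
    </p:for-each>

    <!-- 2. Adjacent documents with the same batch value share one WRAP element -->
    <p:wrap-sequence wrapper="WRAP" group-adjacent="/*/@batch"/>

    <!-- 3. One iteration per pack of (up to) 200 documents -->
    <p:for-each name="packs">
        <p:variable name="n" select="/WRAP/*[1]/@batch"/>

        <p:xslt name="CreateHTML">
            <p:input port="stylesheet">
                <p:document href="CreateHTML.xsl"/>
            </p:input>
            <p:input port="parameters">
                <p:empty/>
            </p:input>
        </p:xslt>

        <!-- Hypothetical naming scheme: MDNASections-0.html, MDNASections-1.html, ... -->
        <p:store>
            <p:with-option name="href"
                select="concat('MDNA/MDNASections-', $n, '.html')"/>
        </p:store>
    </p:for-each>

</p:declare-step>
```

As noted in the thread, this shape still reads every input document before
the first pack is processed, so it does not by itself address the memory
concern; the recursive approach Vojtech describes is the alternative
discussed for that.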
Received on Friday, 4 June 2010 10:24:55 UTC