- From: Romain Deltour <rdeltour@gmail.com>
- Date: Fri, 4 Jun 2010 22:24:44 +0200
- To: xproc-dev@w3.org
- Message-Id: <148ADAAE-0F4B-4408-853E-11C0F16437C5@gmail.com>
Hi again,

I tried a pure XProc equivalent of your XSLT (a little practice never
hurts ;-), and here's the result (it's shorter too):

<p:for-each>
  <p:iteration-source select="//c:file"/>
  <p:identity/>
</p:for-each>
<p:wrap-sequence wrapper="c:group"
                 group-adjacent="xs:integer((position()-1) div 2)"/>
<p:wrap-sequence wrapper="c:files"/>

1. The first p:for-each splits the flat list into a sequence of c:file
   documents.
2. The first p:wrap-sequence creates a sequence of 2-packs using the
   group-adjacent feature.
3. The last p:wrap-sequence wraps the sequence of 2-packs in a single
   document.

Romain.

PS: there seems to be a bug in Calabash, which doesn't allow using
variables in the @group-adjacent expression.

On 4 June 2010, at 18:25, Alex Muir wrote:

> Hi,
>
> Well, I ended up modifying the p:directory-list output with a p:xslt,
> given I didn't know how to do it using XProc and it's easy in XSLT.
>
> So this XProc isn't yet processing the files (will get to that now)
> but does the first step: it reads the directory list and groups n file
> names per group using the param filePerGroup. See XProc, XSLT and
> output below.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>     xmlns:cx="http://xmlcalabash.com/ns/extensions"
>     xmlns:c="http://www.w3.org/ns/xproc-step" name="chunk"
>     version="1.0">
>
>   <p:input port="source">
>     <p:empty/>
>   </p:input>
>
>   <p:variable name="source-folder" select="'in/'"/>
>   <p:variable name="output-folder" select="'out/'"/>
>
>   <p:directory-list>
>     <p:with-option name="path" select="$source-folder">
>       <p:empty/>
>     </p:with-option>
>   </p:directory-list>
>
>   <p:xslt version="1.0" name="chunkFiles">
>     <p:input port="stylesheet">
>       <p:document href="chunkFiles.xsl"/>
>     </p:input>
>     <p:with-param name="filePerGroup" select="2"/>
>     <p:input port="parameters">
>       <p:empty/>
>     </p:input>
>   </p:xslt>
>
>   <p:store name="store">
>     <p:with-option name="href"
>         select="concat($output-folder,'directory-list.xml')"/>
>   </p:store>
>
> </p:declare-step>
>
> XSLT FILE:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     exclude-result-prefixes="#all"
>     xmlns:c="http://www.w3.org/ns/xproc-step" version="2.0">
>   <xsl:output method="xml" indent="yes"/>
>   <xsl:param name="filePerGroup"/>
>   <xsl:template match="/c:directory">
>     <xsl:variable name="directory" as="element()*">
>       <chunk/>
>       <xsl:for-each select="c:file">
>         <file>
>           <xsl:copy-of select="@*"/>
>         </file>
>         <xsl:if test="exists(following-sibling::c:file) and position() mod xs:integer($filePerGroup) eq 0">
>           <chunk/>
>         </xsl:if>
>       </xsl:for-each>
>     </xsl:variable>
>
>     <!--
>     <xsl:copy-of select="$directory"/>-->
>
>     <files>
>       <xsl:for-each-group select="$directory"
>           group-starting-with="chunk">
>         <group>
>           <xsl:for-each select="current-group()">
>             <xsl:if test="self::file">
>               <file>
>                 <xsl:apply-templates select="file|@*"/>
>               </file>
>             </xsl:if>
>           </xsl:for-each>
>         </group>
>       </xsl:for-each-group>
>     </files>
>
>   </xsl:template>
> </xsl:stylesheet>
>
> EXAMPLE OUTPUT
>
> <files>
>   <group>
>     <file>ONE.xml</file>
>     <file>TWO.xml</file>
>   </group>
>   <group>
>     <file>THREE.xml</file>
>     <file>FOUR.xml</file>
>   </group>
>   <group>
>     <file>FIVE.xml</file>
>     <file>SIX.xml</file>
>   </group>
> </files>
>
> On Fri, Jun 4, 2010 at 12:24 PM, Alex Muir <alex.g.muir@gmail.com>
> wrote:
> Thanks, looks good!
>
> I admit the recursive solution was giving me pause to implement.
>
> On Fri, Jun 4, 2010 at 11:47 AM, Romain Deltour <rdeltour@gmail.com>
> wrote:
>> Would the solution, to have to read all input files in before
>> processing the first set, be poor in terms of memory use?
>
> You can improve the pipeline depending on the most resource-intensive
> step. If you want to reduce the number of XML documents parsed in
> memory, an alternative could be to work on the sequence of file paths
> returned by the p:directory-list rather than on the sequence of
> documents. In other words, you would move the resource-intensive
> p:load from the first p:for-each to the second:
>
> p:for-each => to create a sequence of 1000 paths from the flat list
>   returned by p:directory-list
>   (note the result of this first p:for-each is a sequence of 1000
>   documents)
> p:wrap-sequence[@group-adjacent] => split the sequence of 1000 into
>   200-sets
> p:for-each => another iteration over the 5 packs of 200 files, to
>   process each pack at a time, loading the documents then processing
>   them
>
> Vojtech's idea of using recursion sounds good too.
>
> Romain.
>
> On 4 June 2010, at 11:27, Alex Muir wrote:
>
>> Hi Romain,
>>
>> Your solution looks like a good one and you're not missing any points.
>>
>> Would the solution, to have to read all input files in before
>> processing the first set, be poor in terms of memory use?
>>
>> Is there no way to read in the first 200 and process them, then read
>> in the second 200 and process those, and so on?
>>
>> Thanks
>> Alex
>>
>> On Thu, Jun 3, 2010 at 6:43 PM, Romain Deltour <rdeltour@gmail.com>
>> wrote:
>> Hi Alex,
>>
>> If I'm understanding your intent and your pipeline correctly, you
>> should rather use the @group-adjacent attribute of the
>> p:wrap-sequence step to pack 200 files at a time.
>>
>> Explanation:
>> In your pipeline, almost everything happens in one big p:for-each
>> that iterates over the 1000 files. The p:choose subpipeline is
>> executed only for every 200th file, and the wrapper's input is a
>> sequence containing only that one file (modulo 200).
>> Actually, rather than grouping files by sets of 200, you ignore 199
>> files and wrap only the 200th in an element before processing it.
>>
>> What I would do is:
>>
>> p:for-each => to iterate through the 1000 files and load the
>>   documents
>>   (note the result of this first p:for-each is a sequence of 1000
>>   documents)
>> p:wrap-sequence[@group-adjacent] => split the sequence of 1000 into
>>   200-sets
>> p:for-each => another iteration over the 5 packs of 200 files, to
>>   process each pack at a time
>>
>> I hope this helps and I'm not missing your point...
>>
>> BR
>> Romain.
>>
>> On 3 June 2010, at 18:32, Alex Muir wrote:
>>
>>> Hi,
>>>
>>> I'm trying to read ~10000 files within a for-each loop, wrap a
>>> selection from each set of 200 files and process them to output 1
>>> HTML file, sink the processed files and continue with the
>>> remaining files, processing 200 at a time.
>>>
>>> Is that possible in XProc?
>>>
>>> I've got something like the following which I can't get to work. I
>>> think that wrapper cannot be used within a for-each, is that the
>>> case?
>>>
>>> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>>>     xmlns:c="http://www.w3.org/ns/xproc-step"
>>>     xmlns:cx="http://xmlcalabash.com/ns/extensions"
>>>     name="wrapWithinForEach" version="1.0">
>>>
>>>   <p:input port="source">
>>>     <p:inline>
>>>       <xml/>
>>>     </p:inline>
>>>   </p:input>
>>>
>>>   <p:output port="result" sequence="true"/>
>>>
>>>   <p:declare-step type="cx:message" version="1.0">
>>>     <p:input port="source"/>
>>>     <p:output port="result"/>
>>>     <p:option name="message" required="true"/>
>>>   </p:declare-step>
>>>
>>>   <!-- ***** Starting and Ending File Numbers ***** -->
>>>   <p:variable name="startingFileNumber" select="'1'"/>
>>>   <p:variable name="endingFileNumber" select="'10000'"/>
>>>   <p:variable name="numberPerFile" select="'200'"/>
>>>
>>>   <!-- source and output folder variables -->
>>>   <p:variable name="source-folder" select="'completed/XML/'"/>
>>>   <p:variable name="output-folder" select="'MDNA/'"/>
>>>   <p:variable name="error-folder" select="'MDNA/error/'"/>
>>>   <p:variable name="exception-folder" select="'MDNA/exception/'"/>
>>>
>>>   <p:directory-list>
>>>     <p:with-option name="path" select="$source-folder">
>>>       <p:empty/>
>>>     </p:with-option>
>>>   </p:directory-list>
>>>
>>>   <p:for-each name="MDNA">
>>>
>>>     <p:iteration-source
>>>         select="//c:file[position() ge number($startingFileNumber) and position() le number($endingFileNumber)]"/>
>>>
>>>     <p:variable name="fileName" select="c:file/@name"/>
>>>     <p:variable name="startingIterationPosition"
>>>         select="number(p:iteration-position()) + number($startingFileNumber)-1"/>
>>>
>>>     <cx:message>
>>>       <p:with-option name="message"
>>>           select="concat('-----------------------------', 'Iteration-position:',' ', $startingIterationPosition, ' File: ', $fileName,'-----------------------------')"/>
>>>     </cx:message>
>>>
>>>     <p:load>
>>>       <p:with-option name="href"
>>>           select="concat($source-folder,$fileName)"/>
>>>     </p:load>
>>>
>>>     <cx:message>
>>>       <p:with-option name="message"
>>>           select="'###### ExtractContent'"/>
>>>     </cx:message>
>>>     <p:xslt name="ExtractContent">
>>>       <p:input port="source"/>
>>>       <p:input port="stylesheet">
>>>         <p:document href="ExtractContent.xsl"/>
>>>       </p:input>
>>>       <p:input port="parameters">
>>>         <p:empty/>
>>>       </p:input>
>>>     </p:xslt>
>>>
>>>     <p:identity name="wrap"/>
>>>
>>>     <p:choose>
>>>       <p:when test="position() mod $numberPerFile eq 0">
>>>         <p:wrap-sequence wrapper="WRAP" name="wrapper">
>>>           <p:input port="source">
>>>             <p:pipe port="result" step="wrap"/>
>>>           </p:input>
>>>         </p:wrap-sequence>
>>>
>>>         <p:xslt name="CreateHTML">
>>>           <p:input port="source"/>
>>>           <p:input port="stylesheet">
>>>             <p:document href="CreateHTML.xsl"/>
>>>           </p:input>
>>>           <p:input port="parameters">
>>>             <p:empty/>
>>>           </p:input>
>>>         </p:xslt>
>>>
>>>         <p:identity name="out_file"/>
>>>
>>>         <p:store name="OUT">
>>>           <p:with-option name="href"
>>>               select="concat($output-folder, 'MDNASections','-',$startingFileNumber,'-' , $endingFileNumber,'.html')">
>>>             <p:pipe step="out_file" port="result"/>
>>>           </p:with-option>
>>>         </p:store>
>>>
>>>         <p:sink name="sinkIt"/>
>>>
>>>       </p:when>
>>>     </p:choose>
>>>
>>>   </p:for-each>
>>>
>>> </p:declare-step>
>>>
>>> Regards
>>>
>>> --
>>> Alex
>>>
>>> An informal recording with one mic under a tree leads to some
>>> pretty sweet acoustic sounds.
>>> https://sites.google.com/site/greigconteh/albums/diabarte-and-sons
>>
>> --
>> Alex
>>
>> An informal recording with one mic under a tree leads to some
>> pretty sweet acoustic sounds.
>> https://sites.google.com/site/greigconteh/albums/diabarte-and-sons
>
> --
> Alex
>
> An informal recording with one mic under a tree leads to some pretty
> sweet acoustic sounds.
> https://sites.google.com/site/greigconteh/albums/diabarte-and-sons
>
> --
> Alex
>
> An informal recording with one mic under a tree leads to some pretty
> sweet acoustic sounds.
> https://sites.google.com/site/greigconteh/albums/diabarte-and-sons
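
For anyone who wants to sanity-check the grouping key from the pure XProc
version at the top of this message, xs:integer((position()-1) div N), here
is a rough Python sketch that replays the same idea outside XProc. It is
only an illustration under some assumptions: the function name
group_adjacent and the sample file names (borrowed from the example output
in the thread) are hypothetical, and it imitates the group-adjacent
behaviour (consecutive items with an equal key end up in the same wrapper)
rather than running any XProc processor.

```python
from itertools import groupby


def group_adjacent(items, size):
    """Group consecutive items whose key (position - 1) // size is equal.

    Mirrors the XPath key xs:integer((position()-1) div size); positions
    are 1-based, as in XPath. The function name is hypothetical.
    """
    # Pair every item with its grouping key: 0, 0, 1, 1, ... for size 2.
    keyed = (((pos - 1) // size, item)
             for pos, item in enumerate(items, start=1))
    # groupby only merges *adjacent* equal keys, like group-adjacent does.
    return [[item for _, item in run]
            for _, run in groupby(keyed, key=lambda pair: pair[0])]


# Sample names taken from the example output quoted in the thread.
files = ["ONE.xml", "TWO.xml", "THREE.xml",
         "FOUR.xml", "FIVE.xml", "SIX.xml"]

for group in group_adjacent(files, 2):
    print(group)
```

With a size of 2 this gives the same 2-packs as the c:group wrappers; an
odd number of files simply leaves a shorter last group.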
Received on Friday, 4 June 2010 20:25:23 UTC