Re: Memory issues processing 100's input files through xproc script from oxygen 11 calabash

From: Alex Muir <alex.g.muir@gmail.com>
Date: Fri, 27 Nov 2009 08:30:25 +0000
Message-ID: <88b533b90911270030i2ad92ed2j817af3466a3c3580@mail.gmail.com>
To: XProc Dev <xproc-dev@w3.org>
Hi,

I've been able to reproduce a java.lang.OutOfMemoryError running Calabash
with the following two XProc files after processing a directory of around 31
XML files. In this version I'm just reading in each file and copying it to
the output. The library step <p:declare-step type="meta:copy"> is causing
the problem. Note that my previous example did not use this copy step, so
there is likely another issue as well.


<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:meta="http://www.yo.com"
  xmlns:c="http://www.w3.org/ns/xproc-step"
  name="BuildSummaryOutput">


  <p:input port="source">
    <p:empty/>
  </p:input>

  <p:output port="result" sequence="true" />

  <p:import href="Library.xpl" />

  <p:variable name="source-folder" select="'../IN/'"/>
  <p:variable name="output-folder" select="'../OUT/'"/>

  <p:directory-list>
    <p:with-option name="path" select="$source-folder">
      <p:empty/>
    </p:with-option>
  </p:directory-list>


  <p:filter select="//c:file"/>
  <p:for-each name="SummaryByFile">
    <p:variable name="filename" select="c:file/@name"/>
    <p:load>
      <p:with-option name="href"
        select="concat($source-folder,$filename)"/>
    </p:load>

    <meta:copy/> <!-- this causes memory issue -->

    <!-- <p:identity name="copy"/> -->
    <!-- this runs fine as a replacement for the copy -->


    <p:documentation> Store XML file Output </p:documentation>
    <p:identity name="out_file"/>
    <p:store name="store">
      <p:with-option name="href" select="concat($output-folder, $filename)">
        <p:pipe step="out_file" port="result"/>
      </p:with-option>
    </p:store>


    <p:documentation> Create result XML which is a list of all files
transformed </p:documentation>
    <p:identity>
      <p:input port="source">
        <p:pipe step="store" port="result"/>
      </p:input>
    </p:identity>

  </p:for-each>

  <p:documentation>Wrap result XML </p:documentation>
  <p:wrap-sequence wrapper="SummaryByFile"/>
  <p:identity/>

</p:declare-step>


And the following library file

<p:library xmlns:p="http://www.w3.org/ns/xproc"
  xmlns:meta="http://www.yo.com">

  <p:declare-step type="meta:copy">
    <p:input port="source"/>
    <p:output port="result"/>
    <p:identity/>
  </p:declare-step>

</p:library>

From the cmd window:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at net.sf.saxon.om.FastStringBuffer.<init>(FastStringBuffer.java:32)
        at net.sf.saxon.tinytree.LargeStringBuffer.<init>(LargeStringBuffer.java:57)
        at net.sf.saxon.tinytree.TinyTree.<init>(TinyTree.java:177)
        at net.sf.saxon.tinytree.TinyTree.<init>(TinyTree.java:145)
        at net.sf.saxon.tinytree.TinyBuilder.open(TinyBuilder.java:96)
        at com.xmlcalabash.util.TreeWriter.startDocument(TreeWriter.java:94)
        at com.xmlcalabash.library.Store.run(Store.java:111)
        at com.xmlcalabash.runtime.XAtomicStep.run(XAtomicStep.java:385)
        at com.xmlcalabash.runtime.XForEach.run(XForEach.java:101)
        at com.xmlcalabash.runtime.XPipeline.doRun(XPipeline.java:234)
        at com.xmlcalabash.runtime.XPipeline.run(XPipeline.java:136)
        at com.xmlcalabash.drivers.Main.run(Main.java:248)
        at com.xmlcalabash.drivers.Main.main(Main.java:67)
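
For anyone trying to reproduce this, a directory of input files is needed.
The following is only a minimal sketch for generating one in Python. The
function names make_xml and write_inputs, the file names, the
<document>/<p> content and the sizes are all illustrative assumptions, not
taken from the original input files.

```python
import os


def make_xml(size_bytes: int) -> str:
    """Return a well-formed XML document of at least size_bytes characters."""
    # Hypothetical filler content; the real input files are not shown in this thread.
    para = "<p>" + "x" * 96 + "</p>\n"
    count = size_bytes // len(para) + 1
    return "<document>\n" + para * count + "</document>\n"


def write_inputs(folder: str, n_files: int, size_bytes: int) -> list[str]:
    """Write n_files XML files into folder and return their names in order."""
    os.makedirs(folder, exist_ok=True)
    names = []
    for i in range(n_files):
        name = f"input{i:03d}.xml"
        with open(os.path.join(folder, name), "w", encoding="utf-8") as fh:
            fh.write(make_xml(size_bytes))
        names.append(name)
    return names
```

Calling write_inputs("../IN", 31, 1_000_000) would fill the source folder
used by the pipeline above with 31 files of roughly 1 MB each; the size is a
guess and may need adjusting before the error shows up.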

On Thu, Nov 26, 2009 at 7:30 PM, Alex Muir <alex.g.muir@gmail.com> wrote:

> Hi,
>
> I'm not really certain how to go about locating a memory problem when
> running XProc in oXygen. Memory use in oXygen climbs to 580,000K when I run
> the following generalized XProc script over hundreds of input files using
> Calabash, and after the script fails the memory stays at that level. The
> script processes a few hundred files correctly before getting an error. I'm
> loading the files in the XSLT as unparsed text, and a series of XSLT
> scripts then generally use analyze-string to identify one piece of content
> or another and update the XML content.
>
>
> I submitted an error report to oXygen in case it can be reproduced. Are
> tests being run that process hundreds of input files of, say, 1000KB to
> 8000KB each, with an initial XSL step loading unparsed text and converting
> it into XML using XProc?
>
> I suppose I'm probably doing something wrong but can anyone reproduce a
> similar problem?
>
>
> <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
>   xmlns:meta="http://www.metaheuristica.com"
>   xmlns:c="http://www.w3.org/ns/xproc-step"
>   xmlns:cx="http://xmlcalabash.com/ns/extensions"
>   name="Createi4EnrichMarkup">
>
>   <p:documentation>Markup</p:documentation>
>
>   <p:import href="Library.xpl"/>
>
>   <p:input port="source">
>     <p:document href="blank.xml"/>
>   </p:input>
>
>
>   <p:output port="result" sequence="true"/>
>
>    <p:variable name="source-folder" select="'file:/C:/2008/'"/>
>   <p:variable name="output-folder" select="'../a/'"/>
>   <p:variable name="outputFolderNoPageBreaks" select="'../a/d/'"/>
>   <p:variable name="outputFolderNoMTOC" select="'../a/x/'"/>
>   <p:directory-list>
>     <p:with-option name="path" select="$source-folder">
>       <p:empty/>
>     </p:with-option>
>   </p:directory-list>
>
>   <p:for-each name="forEachFile">
>
>     <p:iteration-source select="//c:file"/>
>     <!-- <p:output port="result"/>-->
>
>     <p:variable name="fileName" select="c:file/@name"/>
>
>
>        <!-- series of p:xslt call -->
>
>
>
>     <p:choose>
>       <p:when test="/document[@YYY= 'true']">
>         <p:output port="result"/>
>
>
>           <!-- series of p:xslt call -->
>
>
>         <p:choose>
>           <p:when test="/document[@XXX >= 15]">
>             <p:output port="result"/>
>
>
>          <!-- series of p:xslt call -->
>
>
>
>             <p:documentation> Store XML file Output </p:documentation>
>             <p:identity name="out_file"/>
>             <p:store name="store">
>               <p:with-option name="href"
>                 select="replace(replace(concat($output-folder, $fileName,
> '.xml'),'.html',''),' ','')">
>                 <p:pipe step="out_file" port="result"/>
>               </p:with-option>
>             </p:store>
>
>             <p:documentation> Create result XML </p:documentation>
>             <p:identity>
>               <p:input port="source">
>                 <p:pipe step="store" port="result"/>
>               </p:input>
>             </p:identity>
>
>           </p:when>
>
>
>           <p:otherwise>
>
>             <p:output port="result"/>
>             <p:documentation> Store XML file Output </p:documentation>
>             <p:identity name="out_file"/>
>             <p:store name="store">
>               <p:with-option name="href"
>                 select="replace(replace(concat($outputFolderNoMTOC,
> $fileName, '.xml'),'.html',''),' ','')">
>                 <p:pipe step="out_file" port="result"/>
>               </p:with-option>
>             </p:store>
>
>
>             <p:documentation> Create result XML </p:documentation>
>             <p:identity>
>               <p:input port="source">
>                 <p:pipe step="store" port="result"/>
>               </p:input>
>             </p:identity>
>
>           </p:otherwise>
>
>         </p:choose>
>
>       </p:when>
>       <p:otherwise>
>
>         <p:output port="result"/>
>         <p:documentation> Store XML file Output </p:documentation>
>         <p:identity name="out_file"/>
>         <p:store name="store">
>           <p:with-option name="href"
>             select="replace(replace(concat($outputFolderNoPageBreaks,
> $fileName, '.xml'),'.html',''),' ','')">
>             <p:pipe step="out_file" port="result"/>
>           </p:with-option>
>         </p:store>
>
>
>         <p:documentation> Create result XML </p:documentation>
>         <p:identity>
>           <p:input port="source">
>             <p:pipe step="store" port="result"/>
>           </p:input>
>         </p:identity>
>
>       </p:otherwise>
>
>
>     </p:choose>
>
>
>
>
>
>   </p:for-each>
>
>   <p:documentation>Wrap result XML </p:documentation>
>   <p:wrap-sequence wrapper="forEachFile"/>
>   <p:identity/>
> </p:declare-step>
>
>
> Regards
> --
>
> Alex
> https://sites.google.com/a/utg.edu.gm/alex
>
>


-- 

Alex
https://sites.google.com/a/utg.edu.gm/alex
Received on Friday, 27 November 2009 08:31:12 GMT
