
Re: Memory usage

From: Alex Muir <alex.g.muir@gmail.com>
Date: Fri, 7 May 2010 10:57:19 +0000
Message-ID: <s2r88b533b91005070357lbd394d2el6e627c99315cb7ab@mail.gmail.com>
To: Toman_Vojtech@emc.com
Cc: xproc-dev@w3.org
Hi,

A known and accepted issue.
http://code.google.com/p/xmlcalabash/issues/detail?id=94

Reading in multiple XML documents and writing out multiple XML documents
causes the memory leak, according to the tests attached to that issue.

Regards
Alex

On Fri, May 7, 2010 at 10:44 AM, <Toman_Vojtech@emc.com> wrote:

> Perhaps Calabash is not releasing already processed documents when they
> go out of scope? In the case of two nested for-each loops, results
> of the inner iteration can often be discarded before the next iteration.
> Perhaps this is not happening.
>
> But the precise behavior depends on what exactly the outer for-each
> loop does with the results of the inner for-each loop. If it collects all
> of the documents produced by the inner loop, you end up with a quadratic
> number (the number of directories times the number of files in each
> directory) of documents that will have to be represented in memory.
>
> But I don't know Calabash internals, so I am just guessing here. One
> option for you could be to split the single for-loop into multiple
> loops, or to try to reduce the number of documents that you collect in
> the loops. In some cases explicitly p:sink-ing the outputs might help.
> Another thing that might sometimes help is to wrap some steps in a
> wrapper (for instance, p:group) to make sure that the results of the
> steps are in an explicit scope and therefore don't "leak" outside of the
> scope - this technique can be used as a hint that XProc processors could
> use to release data that is no longer needed from memory. But I am not
> sure if Calabash does this.
>
> Regards,
> Vojtech
>
> > -----Original Message-----
> > From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf Of Nic Gibson
> > Sent: Friday, May 07, 2010 12:22 PM
> > To: XProc Dev
> > Subject: Memory usage
> >
> > We're seeing an XProc script run through Calabash that shows increasing
> > memory usage over time. I suspect that this is to be expected under the
> > circumstances but I wanted to check and see if anyone can suggest a
> > mitigating action.
> >
> > The script takes an XML file containing (basically) a list of file
> > URLs. Each of these URLs is a directory on the local filesystem. All XML
> > files in each directory are read using p:load then transformed using
> > several XSLT pipelines. The whole script is basically two big nested
> > p:for-each loops (one to read directories and a nested one to read
> > and process the files found).
> >
> > As this runs the memory usage goes up for each file loaded and,
> > eventually, the JVM kills the process with a heap exhaustion error.
> >
> > I suspect that there is nothing in the script above that might indicate
> > to Calabash that any file can be discarded, so each one is held in memory
> > until the end of the script. Is that likely? I'm not exactly a skilled Java
> > programmer so I'm not in a position to read the code.
> >
> > Can anyone see any sensible approach that might allow us to run this
> > script over several thousand XML files when it currently dies after
> > around nine?
> >
> > cheers
> >
> > nic
> >
>
>
>
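
The quoted point about collecting versus discarding the inner loop's results can be made concrete with a small model. This is a minimal sketch in Python, not XProc, and it says nothing about Calabash internals. The function name peak_live_documents and its parameters are hypothetical, chosen only for illustration. It counts how many documents are held at once in two nested loops, depending on whether the outer loop keeps what the inner loop produces.

```python
def peak_live_documents(dirs, files_per_dir, collect_results):
    """Toy model: peak number of documents held at once in two nested loops.

    dirs, files_per_dir and collect_results are hypothetical parameters;
    this does not describe how any XProc processor manages memory.
    """
    retained = []  # documents the outer loop keeps hold of
    peak = 0
    for d in range(dirs):
        for f in range(files_per_dir):
            # Stand-in for a document that was loaded and transformed.
            doc = ("doc", d, f)
            # The current document plus everything retained so far.
            peak = max(peak, len(retained) + 1)
            if collect_results:
                # The outer loop collects every inner result.
                retained.append(doc)
            # Otherwise the document is dropped before the next
            # iteration, as if it had been stored and then sunk.
    return peak


if __name__ == "__main__":
    print(peak_live_documents(3, 4, collect_results=True))
    print(peak_live_documents(3, 4, collect_results=False))
```

With 3 directories of 4 files each, the collecting variant peaks at 12 documents (directories times files), while the discarding variant never holds more than 1.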


-- 
Alex

An informal recording with one mic under a tree leads to some pretty sweet
acoustic sounds.
https://sites.google.com/site/greigconteh/albums/diabarte-and-sons
Received on Friday, 7 May 2010 10:57:51 GMT