Re: Which is more efficient in the pipe, a whole feed or a sequence of entries? from James Fuller on 2009-04-22 (xproc-dev@w3.org from April 2009)

From: James Fuller <james.fuller.2007@gmail.com>
Date: Wed, 22 Apr 2009 08:36:04 +0200
To: Philip Fennell <Philip.Fennell@bbc.co.uk>
Cc: xproc-dev@w3.org
Message-ID: <a0ad8ffe0904212336l13eb266xe4bf9bf71162b22@mail.gmail.com>

On Fri, Feb 27, 2009 at 1:58 PM, Philip Fennell
<Philip.Fennell@bbc.co.uk> wrote:
> I've been building a library of pipelines that add features to, provide
> utilities for or allow testing of an Atom Store and one aspect of the
> work is creating pipelines that load content into the store.
>
> It is quite straightforward to construct a pipeline that takes content
> from the file system (or a zip file), wraps it in an Atom Entry and then
> PUT|POSTs it to the store. However, I was wondering whether creating a
> sequence of Entry documents would be more efficient internally, within
> the pipeline processor, than a single Feed document. I've found when
> using Saxon for XSLT processing of large document collections that the
> saxon:discard-document function is very useful in keeping memory usage
> under control.
>
> Up to now I've been ensuring whole documents pass between steps for the
> simplicity of debugging; you can just comment out the following steps
> and you get a well-formed XML document out the end and no complaints
> about not declaring sequence="true" for your input/output ports.
>
> I have also tried passing a sequence of entries between the steps but I
> haven't noticed any obvious differences in memory usage. For the 6000
> documents that I'm sending to the store, the amount of memory used
> steadily climbs during the 'get from zip' phase to about 180 MB and once
> it moves on to creating the requests and submitting them it rises to
> over 300 MB.

Interesting application ... any chance of seeing something a bit more
concrete, as there may be a unit test or two I can add to the XProc test
suite from this? Even if it's just the steps, e.g. p:choose->p:xslt-> and
so on...

Email me off-list if you would like.

> There doesn't appear to be any indication of 'streaming' going on here
> but should I expect any difference in the way memory is released when
> Calabash deals with whole feed documents or sequences of entries?

Whilst Norman's implementation is far ahead of anyone else's (including
my own), I think the focus is on ensuring correctness with respect to
the spec.
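
As a rough illustration of the distinction being asked about, here is a
minimal sketch in Python rather than XProc. It says nothing about how
Calabash actually manages memory; it only contrasts handing entries on
one at a time with collecting them all under a single feed. The helper
names (make_entry, iter_entries, build_feed, make_sample_zip) and the
shape of the entry are assumptions made for the example, not anything
from Philip's pipelines.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def make_entry(name, text):
    """Wrap one payload in a minimal atom:entry (hypothetical shape)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = name
    ET.SubElement(entry, f"{{{ATOM_NS}}}content", type="text").text = text
    return entry


def iter_entries(zip_file):
    """Sequence style: yield entries one at a time.

    The consumer can send each entry and drop its reference before the
    next one is built, so only one entry needs to be alive at once.
    """
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            yield make_entry(name, archive.read(name).decode("utf-8"))


def build_feed(zip_file):
    """Feed style: every entry is attached to one atom:feed element,
    so all of them stay in memory until the feed itself is released."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    for entry in iter_entries(zip_file):
        feed.append(entry)
    return feed


def make_sample_zip(count):
    """Build an in-memory zip of small text files as stand-in content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for i in range(count):
            archive.writestr(f"doc-{i}.txt", f"body {i}")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Sequence style: handle and discard each entry in turn.
    for entry in iter_entries(make_sample_zip(3)):
        print(len(ET.tostring(entry)), "bytes ready to send")

    # Feed style: all entries are held by the feed at the same time.
    feed = build_feed(make_sample_zip(3))
    print(len(feed), "entries held in one feed")
```

Whether a processor sees any benefit from the sequence form depends on
whether it actually releases each document once a step has finished with
it; if it buffers the whole sequence between steps, the two forms would
be expected to behave much the same.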

hth, Jim Fuller

Received on Wednesday, 22 April 2009 06:36:44 UTC