
Which is more efficient in the pipe, a whole feed or a sequence of entries?

From: Philip Fennell <Philip.Fennell@bbc.co.uk>
Date: Fri, 27 Feb 2009 11:58:10 -0000
Message-ID: <FBFAE0E0B37B4148AF77509764D4BC2007B1E7FF@bbcxues11.national.core.bbc.co.uk>
To: <xproc-dev@w3.org>
I've been building a library of pipelines that add features to, provide
utilities for, or allow testing of an Atom Store, and one aspect of the
work is creating pipelines that load content into the store.

It is quite straightforward to construct a pipeline that takes content
from the file system (or a zip file), wraps it in an Atom Entry and then
PUTs or POSTs it to the store. However, I was wondering whether creating
a sequence of Entry documents would be more efficient, internally, within
the pipeline processor than creating a single Feed document. When using
Saxon for XSLT processing of large document collections, I've found the
saxon:discard-document function very useful in keeping memory usage
under control.
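The two shapes being compared can be sketched outside any pipeline processor. What follows is a minimal Python sketch using only the standard library; it is not the XProc pipeline itself. Each zip member is wrapped in a minimal Atom entry, and the entries are either produced one at a time (the sequence shape) or gathered under a single atom:feed root (the Feed shape). The helper names `make_entry`, `entries_from_zip` and `as_feed`, and the `urn:example:` identifiers, are hypothetical, and a real Atom Store will likely expect further elements such as atom:author.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Iterable, Iterator

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom as the default namespace


def _now() -> str:
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


def make_entry(name: str, text: str) -> ET.Element:
    """Wrap one piece of content in a minimal Atom entry (hypothetical helper)."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = f"urn:example:{name}"
    ET.SubElement(entry, f"{{{ATOM}}}title").text = name
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = _now()
    ET.SubElement(entry, f"{{{ATOM}}}content", type="text").text = text
    return entry


def entries_from_zip(zf: zipfile.ZipFile) -> Iterator[ET.Element]:
    """Sequence shape: lazily yield one complete entry document per zip member."""
    for info in zf.infolist():
        if info.is_dir():
            continue
        yield make_entry(info.filename, zf.read(info).decode("utf-8"))


def as_feed(entries: Iterable[ET.Element]) -> ET.Element:
    """Feed shape: gather every entry under a single atom:feed root."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = "urn:example:feed"
    ET.SubElement(feed, f"{{{ATOM}}}title").text = "Load batch"
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = _now()
    for entry in entries:
        feed.append(entry)  # every entry stays referenced by the feed
    return feed


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "first")
        zf.writestr("b.txt", "second")
    with zipfile.ZipFile(buf) as zf:
        for entry in entries_from_zip(zf):  # sequence: one document per item
            print(ET.tostring(entry, encoding="unicode"))
        feed = as_feed(entries_from_zip(zf))  # feed: one document holding all
        print(len(feed.findall(f"{{{ATOM}}}entry")))  # 2
```

The sequence shape lets a consumer finish with each entry before the next one is built; the Feed shape, as written here, keeps every entry in memory until the whole feed is complete.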

Up to now I've been ensuring that whole documents pass between steps, for
simplicity of debugging: you can just comment out the following steps
and you get a well-formed XML document out of the end, with no complaints
about not declaring sequence="true" on your input/output ports.

I have also tried passing a sequence of entries between the steps, but I
haven't noticed any obvious difference in memory usage. For the 6000
documents that I'm sending to the store, memory use climbs steadily
during the 'get from zip' phase to about 180 MB, and once the pipeline
moves on to creating the requests and submitting them it rises to over
300 MB.
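One plausible reading of these numbers is that every entry is held in memory until the requests are made; that is an assumption, not something the figures prove. As an illustration of the pull-style alternative, again in Python rather than XProc and not a description of Calabash internals, each zip member below is read, wrapped, serialized and handed to a `send` callback before the next member is read, so only one entry is live at a time. `send` is a hypothetical stand-in for the HTTP PUT or POST, and `wrap`, `stream_entries` and `load` are illustrative names.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List, Optional

ATOM = "http://www.w3.org/2005/Atom"


def wrap(name: str, text: str) -> bytes:
    """Serialize one minimal Atom entry for a zip member (hypothetical shape)."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = name
    ET.SubElement(entry, f"{{{ATOM}}}content", type="text").text = text
    return ET.tostring(entry, encoding="utf-8")


def stream_entries(zf: zipfile.ZipFile,
                   log: Optional[List[str]] = None) -> Iterator[bytes]:
    """Yield one serialized entry at a time; nothing is read ahead."""
    for info in zf.infolist():
        if info.is_dir():
            continue
        if log is not None:
            log.append(f"read {info.filename}")
        yield wrap(info.filename, zf.read(info).decode("utf-8"))


def load(zf: zipfile.ZipFile, send: Callable[[bytes], None],
         log: Optional[List[str]] = None) -> int:
    """Send each entry as soon as it is produced, then let it be dropped."""
    count = 0
    for body in stream_entries(zf, log):
        send(body)  # hypothetical PUT/POST to the store
        count += 1
    return count


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.xml", "<p>one</p>")
        zf.writestr("b.xml", "<p>two</p>")
    events: List[str] = []
    with zipfile.ZipFile(buf) as zf:
        load(zf, lambda body: events.append(f"send {len(body)} bytes"), events)
    print(events)  # reads and sends alternate
```

Because `stream_entries` is a generator, the read of the next member only happens after `send` has returned for the previous one, which is the behaviour one would hope a streaming pipeline processor could achieve for a sequence of entries.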

There doesn't appear to be any 'streaming' going on here, but should I
expect any difference in the way memory is released when Calabash deals
with whole Feed documents versus sequences of entries?


Philip Fennell

XML Developer (The Forge)
BBC Future Media & Technology
Media Village, 201 Wood Lane London W12 7TP
BC4 C4, Broadcast Centre
T: 0208 0085318

Received on Friday, 27 February 2009 11:58:48 UTC
