- From: James Fuller <james.fuller.2007@gmail.com>
- Date: Wed, 22 Apr 2009 08:36:04 +0200
- To: Philip Fennell <Philip.Fennell@bbc.co.uk>
- Cc: xproc-dev@w3.org
On Fri, Feb 27, 2009 at 1:58 PM, Philip Fennell <Philip.Fennell@bbc.co.uk> wrote:

> I've been building a library of pipelines that add features to, provide
> utilities for, or allow testing of an Atom Store, and one aspect of the
> work is creating pipelines that load content into the store.
>
> It is quite straightforward to construct a pipeline that takes content
> from the file system (or a zip file), wraps it in an Atom Entry and then
> PUTs/POSTs it to the store. However, I was wondering whether creating a
> sequence of Entry documents would be more efficient, internally, within
> the pipeline processor than a single Feed document. When using Saxon for
> XSLT processing of large document collections, I've found the
> saxon:discard-document function very useful in keeping memory usage
> under control.
>
> Up to now I've been ensuring whole documents pass between steps for
> simplicity of debugging; you can just comment out the following steps
> and you get a well-formed XML document out the end, with no complaints
> about not declaring sequence="true" for your input/output ports.
>
> I have also tried passing a sequence of entries between the steps, but I
> haven't noticed any obvious difference in memory usage. For the 6000
> documents that I'm sending to the store, the amount of memory used
> steadily climbs during the 'get from zip' phase to about 180Mb, and once
> it moves on to creating the requests and submitting them it rises to
> over 300Mb.

Interesting application ... any chance of seeing something a bit more
concrete, as there may be a unit test or two I can add to the XProc test
suite from this. Even if it's just the steps, e.g. p:choose -> p:xslt ->
and so on ... email me off-list if you would like.

> There doesn't appear to be any indication of 'streaming' going on here,
> but should I expect any difference in the way memory is released when
> Calabash deals with whole feed documents or sequences of entries?
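For what it's worth, a minimal sketch of the entry-at-a-time shape being described might look like the following. This is only an illustration, not Philip's actual pipeline: the store URI, the step names, and the c:request wrapping details are all assumptions.

    <p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                    xmlns:c="http://www.w3.org/ns/xproc-step"
                    version="1.0" name="load-entries">

      <!-- A sequence of Atom Entry documents, hence sequence="true" -->
      <p:input port="source" sequence="true"/>
      <p:output port="result" sequence="true"/>

      <!-- Submit each entry independently rather than as one large feed -->
      <p:for-each name="each-entry">
        <p:iteration-source>
          <p:pipe step="load-entries" port="source"/>
        </p:iteration-source>

        <!-- Build a c:request around the entry; the href is a placeholder -->
        <p:wrap wrapper="c:body" match="/"/>
        <p:add-attribute match="/c:body" attribute-name="content-type"
                         attribute-value="application/atom+xml"/>
        <p:wrap wrapper="c:request" match="/"/>
        <p:add-attribute match="/c:request" attribute-name="method"
                         attribute-value="post"/>
        <p:add-attribute match="/c:request" attribute-name="href"
                         attribute-value="http://example.org/store/collection"/>

        <p:http-request/>
      </p:for-each>
    </p:declare-step>

Whether a processor releases each entry's memory after its iteration is an implementation detail, not something the spec guarantees.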
Whilst Norman's implementation is far ahead of anyone else's (including my own), I think the focus is on ensuring correctness with respect to the spec.

hth,

Jim Fuller
Received on Wednesday, 22 April 2009 06:36:44 UTC