RE: Memory usage from Toman_Vojtech@emc.com on 2010-05-10 (xproc-dev@w3.org from May 2010)

From: <Toman_Vojtech@emc.com>
Date: Mon, 10 May 2010 05:16:18 -0400
To: <xproc-dev@w3.org>
Message-ID: <997C307BEB90984EBE935699389EC41C01653654@CORPUSMX70C.corp.emc.com>

> That's pretty well as far as I had worked out - whilst the memory problem
> I've been seeing is *irritating*, it can't really be described as simply
> wrong (complexly wrong perhaps) because there is nothing simple that can
> be used to determine if a file loaded using p:load can be discarded.

After the static analysis phase, the XProc processor has a pretty good
picture of what the connections in the pipeline look like and when/where
results of XProc steps are used (if they are used at all). That
knowledge alone can be used for various memory optimizations. For
instance, in a pipeline like this one:

<p:pipeline>
  <step1/>
  <step2/>
</p:pipeline>

the result of step1 can almost certainly be discarded after step2 has
finished because there is no other step that refers to the result of
step1.
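To make the last-use idea concrete, here is a minimal Python sketch. It
assumes a simplified model in which steps run one after another in list
order and each step names the steps whose results it reads. The Step class
and the release_points and simulate functions are hypothetical
illustrations, not part of Calumet or any other XProc processor; a real
processor would derive the same information from the port connections it
resolves during static analysis.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


# Hypothetical pipeline model (not a real processor API): steps run in list
# order, and `reads` names the steps whose results this step consumes.
@dataclass
class Step:
    name: str
    reads: List[str] = field(default_factory=list)


def release_points(steps: List[Step], outputs: Set[str]) -> Dict[str, Optional[int]]:
    """Map each step to the index of the step after which its result can be
    freed, or None if the result feeds a pipeline output and must be kept."""
    index = {s.name: i for i, s in enumerate(steps)}
    # A result nobody reads can be freed as soon as its producer finishes.
    last: Dict[str, int] = dict(index)
    for i, step in enumerate(steps):
        for src in step.reads:
            # Push the release point out to the last step that reads it.
            last[src] = max(last[src], i)
    # Results bound to the pipeline's output ports are never freed early.
    return {name: (None if name in outputs else i) for name, i in last.items()}


def simulate(steps: List[Step], outputs: Set[str]) -> List[str]:
    """Run the steps in order and log every early release."""
    release = release_points(steps, outputs)
    live: Set[str] = set()
    log: List[str] = []
    for i, step in enumerate(steps):
        live.add(step.name)  # the step has produced its result
        # sorted() builds a new list, so removing from `live` below is safe
        for name in sorted(n for n in live if release[n] == i):
            live.remove(name)
            log.append(f"after {step.name}: release {name}")
    return log


if __name__ == "__main__":
    # The two-step pipeline from the message: step2 reads step1, and
    # step2's result is the pipeline's result.
    pipeline = [Step("step1"), Step("step2", reads=["step1"])]
    print(simulate(pipeline, outputs={"step2"}))  # ['after step2: release step1']
```

A real processor may run independent steps in a different order or in
parallel, in which case the release point would have to be computed over
the dependency graph rather than over a fixed sequence.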

Another factor is the scoping of steps. By wrapping steps in, for example,
a p:group, you can very easily restrict the visibility of the results
produced by the steps in the sub-pipeline:

...
<p:group>
  <step1/>
  <step2/>
</p:group>
...

In the above example, the results produced by step1 can be discarded
once p:group has finished because they will be in an inaccessible scope.
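Scope-based release can be sketched in the same hypothetical style: a
store keeps a stack of open scopes, records which scope each result was
produced in, and frees everything produced inside a p:group when the
group finishes, except results bound to the group's output ports.
ScopedStore and its methods are assumptions made for illustration only;
this is not how Calumet or any particular processor is implemented.

```python
from typing import Collection, Dict, List


class ScopedStore:
    """Hypothetical result store that frees documents by scope. Illustrative
    only; not the API of Calumet or any other XProc processor."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}       # step name -> result document
        self.scopes: List[List[str]] = [[]]  # step names produced in each open scope

    def enter(self) -> None:
        """Open a nested scope, e.g. at the start of a p:group."""
        self.scopes.append([])

    def put(self, step: str, doc: str) -> None:
        """Record the result of `step` in the innermost open scope."""
        self.docs[step] = doc
        self.scopes[-1].append(step)

    def exit(self, exported: Collection[str] = ()) -> List[str]:
        """Close the innermost scope and free every result produced in it,
        except those bound to the group's output ports (`exported`), which
        are handed over to the enclosing scope. Returns the freed step names."""
        if len(self.scopes) == 1:
            raise RuntimeError("cannot close the top-level scope")
        inner = self.scopes.pop()
        released: List[str] = []
        for step in inner:
            if step in exported:
                self.scopes[-1].append(step)  # now owned by the outer scope
            else:
                del self.docs[step]
                released.append(step)
        return released


if __name__ == "__main__":
    store = ScopedStore()
    store.enter()                          # <p:group>
    store.put("step1", "<result1/>")
    store.put("step2", "<result2/>")
    # </p:group>, assuming the group's output port is bound to step2
    print(store.exit(exported={"step2"}))  # ['step1']
    print(sorted(store.docs))              # ['step2']
```

The appeal of this approach is that it needs no per-step usage analysis;
the trade-off is that a document nobody reads any more still stays in
memory until its enclosing scope closes.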

And so on. There are many optimizations that XProc processors can do,
but I think the implementers are just entering this stage after having
implemented the standard. For instance, EMC's Calumet (which I am
involved with) does not yet detect when the result of a step is not used
any more, but it does release the documents when they go out of
scope.

Regards,
Vojtech

--
Vojtech Toman
Principal Software Engineer
EMC Corporation
toman_vojtech@emc.com
http://developer.emc.com/xmltech

Received on Monday, 10 May 2010 09:17:10 UTC