Re: Memory usage from Alex Muir on 2010-05-11 (xproc-dev@w3.org from May 2010)

From: Alex Muir <alex.g.muir@gmail.com>
Date: Tue, 11 May 2010 09:31:28 +0000
To: Toman_Vojtech@emc.com
Cc: xproc-dev@w3.org
Message-ID: <AANLkTikT15JAlG7JqBR8p4XJAmrpre_c1T03nAB3r017@mail.gmail.com>

This is really interesting, and the frame of reference is clearly
important: implementers have simply not yet had time to optimize.

Take the case of a for-each loop that reads input files, processes
each one, stores the output and builds a result set describing the
files. Would an implementation be able to determine easily, in all
cases, that the memory is no longer required, given for example that
a developer could be storing data and/or merging it?

Until such optimizations are completed, would it be easy to create a
"reclaim memory" step of some kind, perhaps one that sinks a given
port and reclaims its memory? Something like <p:sinkMem>:

<p:declare-step>
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>

  <p:directory-list/>

  <p:for-each name="forEachFile">

    <p:xslt version="1.0" name="LoadData"/>

    <p:xslt version="1.0" name="Processing"/>

    <p:store name="store"/>

    <p:identity>
      <p:input port="source">
        <p:pipe step="store" port="result"/>
      </p:input>
    </p:identity>

    <!-- p:sinkMem is the proposed (hypothetical) step, not part of XProc -->
    <p:sinkMem/>

  </p:for-each>
  <p:documentation>Wrap result XML </p:documentation>
  <p:wrap-sequence wrapper="forEachFile"/>
  <p:identity/>
</p:declare-step>
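
To make the question more concrete, here is a rough Python sketch of the
"last use" analysis described in the quoted message below. It is only an
illustration under my own assumptions, not how Calumet or any other
processor actually works: the pipeline is modelled as a flat, ordered list
of steps, and the names last_use, release_schedule and keep are
hypothetical.

```python
from typing import AbstractSet, Dict, List, Tuple

# Hypothetical model: an ordered list of (step_name, inputs), where inputs
# names the earlier steps whose results this step reads.
Pipeline = List[Tuple[str, List[str]]]


def last_use(pipeline: Pipeline) -> Dict[str, int]:
    """Map each step to the index of the last step that reads its result.

    A step whose result is never read maps to its own index, so its result
    could be dropped as soon as the step itself has finished.
    """
    last: Dict[str, int] = {}
    for index, (name, inputs) in enumerate(pipeline):
        last[name] = index
        for source in inputs:
            last[source] = index
    return last


def release_schedule(
    pipeline: Pipeline, keep: AbstractSet[str] = frozenset()
) -> Dict[str, List[str]]:
    """For each step, list the results that can be discarded once it finishes.

    Steps named in `keep` (for example, those feeding an output port) are
    never scheduled for release.
    """
    schedule: Dict[str, List[str]] = {name: [] for name, _ in pipeline}
    for producer, index in last_use(pipeline).items():
        if producer not in keep:
            schedule[pipeline[index][0]].append(producer)
    return schedule


# The body of the for-each loop above, with the final p:identity kept
# because its result leaves the loop.
loop_body: Pipeline = [
    ("LoadData", []),
    ("Processing", ["LoadData"]),
    ("store", ["Processing"]),
    ("identity", ["store"]),
]

for step, droppable in release_schedule(loop_body, keep={"identity"}).items():
    print(step, "can release", droppable)
```

In this simplified model every intermediate result in the loop body has
exactly one reader, so each could be released as soon as the next step has
finished. What the sketch does not capture is the harder part of my
question: cases where the developer keeps or merges data across
iterations.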

Regards
Alex

On Mon, May 10, 2010 at 9:16 AM,  <Toman_Vojtech@emc.com> wrote:
>> That's pretty well as far as I had worked out - whilst the memory problem
>> I've been seeing is *irritating*, it can't really be described as simply
>> wrong (complexly wrong perhaps) because there is nothing simple that can
>> be used to determine if a file loaded using p:load can be discarded.
>
> After the static analysis phase, the XProc processor has a pretty good
> picture of what the connections in the pipeline look like and when/where
> results of XProc steps are used (if they are used at all). That
> knowledge alone can be used for various memory optimizations. For
> instance, in a pipeline like this one:
>
> <p:pipeline>
>  <step1/>
>  <step2/>
> </p:pipeline>
>
> the result of step1 can almost certainly be discarded after step2 has
> finished because there is no other step that refers to the result of
> step1.
>
> Another thing is scoping of steps. By wrapping a step in, for example, a
> p:group, you can very easily restrict the visibility of the results
> produced by the steps in the sub-pipeline:
>
> ...
> <p:group>
>  <step1/>
>  <step2/>
> </p:group>
> ...
>
> In the above example, the results produced by step1 can be discarded
> once p:group has finished because they will be in an inaccessible scope.
>
> And so on. There are many optimizations that XProc processors can do,
> but I think the implementers are just entering this stage after having
> implemented the standard. For instance, EMC's Calumet (which I am
> involved with) does not yet detect when the result of a step is not used
> any more, but it does release the documents when they go out of
> scope.
>
> Regards,
> Vojtech
>
> --
> Vojtech Toman
> Principal Software Engineer
> EMC Corporation
> toman_vojtech@emc.com
> http://developer.emc.com/xmltech
>
>
>



-- 
Alex

An informal recording with one mic under a tree leads to some pretty
sweet acoustic sounds.
https://sites.google.com/site/greigconteh/albums/diabarte-and-sons
Received on Tuesday, 11 May 2010 09:32:01 UTC