Re: Constant Inputs during Iteration

Norm,

On 10/12/06, Norman Walsh <Norman.Walsh@sun.com> wrote:
> If we say that a URI must be cached locally (this is independent of
> protocol and any protocol-based caching; I may be confused about what
> sort of caching you were describing above), then it doesn't matter when
> a URI is retrieved.
>
> If we don't say that it's cached, then I think we'll have to say that
> it must not be cached. Having said that, in order to have any sort of
> interoperability, I think we'll have to have a pretty detailed story
> about execution order.
>
> On balance, I'm strongly in favor of the former.

One downside of mandating that URIs must be cached is that it will make
the execution of some pipelines inefficient or even impossible. For
instance, consider a pipeline that extracts information from a large
XML file stored on disk in a two-phase process, using two processors
that can both stream their input:

<p:step name="extract-1" type="p:extract-1">
    <p:input name="document" href="large-file.xml"/>
</p:step>

<p:step name="extract-2" type="p:extract-2">
    <p:input name="document" href="large-file.xml"/>
</p:step>

In this case, it most likely does not make sense for the pipeline
engine to cache large-file.xml between the execution of p:extract-1
and p:extract-2.
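
To make the streaming point concrete, here is a rough sketch in Python
(standing in for a real pipeline processor; count_elements and the
element names are made up for illustration). Each pass reads the
document incrementally and discards elements as it goes, so two
independent passes over the same file never need the whole document
held in memory or in a cache:

```python
import xml.etree.ElementTree as ET


def count_elements(source, tag):
    """Stream through an XML source and count elements named `tag`.

    `source` may be a file name or a binary file object. Hypothetical
    stand-in for a streaming step such as p:extract-1.
    """
    count = 0
    # iterparse yields each element once its end tag has been read.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
        # Drop the element's children and text so memory stays low.
        elem.clear()
    return count
```

Running this twice against large-file.xml, once per extraction step,
re-reads the file from disk each time, which is the behavior a
mandatory cache would rule out.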

For this reason in particular, I am not in favor of forcing the
implementation to always cache URIs. I see two options. The simplest
is to consider that caching is outside of the scope of this
specification. Alternatively, we can say, as Erik suggested, that the
implementation can (but does not have to) do caching if it has in
place a mechanism to guarantee that the document has not changed.
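
As a rough illustration of the second option, here is a minimal sketch
of an optional cache for local files, again in Python. ValidatingCache
is a hypothetical name, and the change check (modification time plus
size) is my assumption: it is a cheap heuristic rather than a strict
guarantee, and an implementation wanting a real guarantee would need
something stronger, such as a content hash or a file lock.

```python
import os


class ValidatingCache:
    """Hypothetical cache that reuses a document only if the file
    looks unchanged since it was last read."""

    def __init__(self):
        # Maps path -> ((mtime_ns, size), content)
        self._entries = {}

    @staticmethod
    def _fingerprint(path):
        st = os.stat(path)
        return (st.st_mtime_ns, st.st_size)

    def read(self, path):
        """Return (content, was_cache_hit) for the file at `path`."""
        fingerprint = self._fingerprint(path)
        entry = self._entries.get(path)
        if entry is not None and entry[0] == fingerprint:
            return entry[1], True
        # Missing or stale entry: read the file again and remember it.
        with open(path, "rb") as f:
            content = f.read()
        self._entries[path] = (fingerprint, content)
        return content, False
```

An engine that does not want to cache at all simply never uses such a
class, which is why I would leave the choice to the implementation.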

Alex
-- 
Blog (XML, Web apps, Open Source):
http://www.orbeon.com/blog/

Received on Friday, 13 October 2006 18:56:54 UTC