RE: Execution order of steps in a pipeline from Toman_Vojtech@emc.com on 2008-01-24 (public-xml-processing-model-comments@w3.org from January 2008)

From: <Toman_Vojtech@emc.com>
Date: Thu, 24 Jan 2008 08:25:32 -0500
To: <public-xml-processing-model-comments@w3.org>
Message-ID: <6E216CCE0679B5489A61125D0EFEC787097F9D02@CORPUSMX10A.corp.emc.com>

> > I am sure this has been asked before (even though I could not find any
> > discussion about this topic), but is this really necessary? Why can't
> > the execution of the contained steps just follow the document order?
> 
> How can you tell?  Are you hoping to depend on side-effects 
> happening in a particular order? That would require not only 
> that step execution _began_ in document order, but that no 
> step began execution before all others 'before' it had 
> finished.  I sure don't want to go there, it rules out streaming.
> 
> I don't mind changing that sentence, but only if we make it _weaker_.
> 
> I persist in believing the spec. ought to support a 
> simplistic implementation which assigns a separate thread to 
> every step, and starts them all running, letting the 
> sequencing of execution at all levels depend entirely on 
> availability of input (and of output buffering).  I think it 
> does so now.

I didn't mean that the steps should be executed sequentially, after the
previous steps have finished. For me a pipeline is a black box and the
contained steps can be executed in any order (for example as separate
threads that start as soon as input data is available).

What I wanted to say is that no matter in which implementation-specific
order the steps are executed, the end result of a pipeline should be the
same as running the steps sequentially, in document order.

So, if you have a pipeline like this:

<p:pipeline name="pip">
  <p:output port="result1">
    <p:pipe step="id1" port="result"/>
  </p:output>
  <p:output port="result2">
    <p:pipe step="id2" port="result"/>
  </p:output>
  <p:output port="result3">
    <p:pipe step="id3" port="result"/>
  </p:output>
  <p:output port="result4">
    <p:pipe step="id4" port="result"/>
  </p:output>

  <p:identity name="id1">
    <p:input port="source">
      <p:pipe step="pip" port="source"/>
    </p:input>
  </p:identity>
  <p:identity name="id2">
    <p:input port="source">
      <p:pipe step="pip" port="source"/>
    </p:input>
  </p:identity>
  <p:identity name="id3">
    <p:input port="source">
      <p:pipe step="pip" port="source"/>
    </p:input>
  </p:identity>
  <p:identity name="id4">
    <p:input port="source">
      <p:pipe step="pip" port="source"/>
    </p:input>
  </p:identity>
</p:pipeline>

I think the XProc processor can still run the identity steps in parallel
if it can detect that there are no dependencies between them.
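To make the dependency check concrete, here is a minimal sketch (not how any particular XProc processor works) that reads the explicit `<p:pipe>` bindings of the four identity steps and groups them into "waves" with Python's `graphlib.TopologicalSorter`. Steps in the same wave have all their inputs available and could run in parallel. Assumptions: the helper names `step_dependencies` and `execution_waves` are hypothetical, the pipeline is taken to be named `pip` (the value its `p:pipe` elements point at), the `p:output` declarations are left out for brevity, and the XProc 1.0 namespace URI is added only so the snippet parses.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# Assumed namespace: the XProc 1.0 URI. The message's examples omit the
# xmlns:p declaration, so it is added here only so the XML parses.
P = "http://www.w3.org/ns/xproc"

# The four identity steps from the pipeline above, outputs left out for brevity.
PIPELINE = f"""
<p:pipeline xmlns:p="{P}" name="pip">
  <p:identity name="id1"><p:input port="source"><p:pipe step="pip" port="source"/></p:input></p:identity>
  <p:identity name="id2"><p:input port="source"><p:pipe step="pip" port="source"/></p:input></p:identity>
  <p:identity name="id3"><p:input port="source"><p:pipe step="pip" port="source"/></p:input></p:identity>
  <p:identity name="id4"><p:input port="source"><p:pipe step="pip" port="source"/></p:input></p:identity>
</p:pipeline>
"""


def step_dependencies(xml_text):
    """Hypothetical helper: map each child step to the sibling steps it reads.

    Only explicit <p:pipe> bindings are inspected; a pipe that points at the
    enclosing pipeline is its own input, not a dependency on a sibling.
    """
    root = ET.fromstring(xml_text)
    container = root.get("name")
    graph = {}
    for step in root:
        if step.tag == f"{{{P}}}output":
            continue  # output declarations are not steps
        sources = {pipe.get("step") for pipe in step.iter(f"{{{P}}}pipe")}
        sources.discard(container)
        graph[step.get("name")] = sources
    return graph


def execution_waves(graph):
    """Group steps into waves; every step in a wave has all its inputs ready,
    so the steps within one wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(step_dependencies(PIPELINE)))
# [['id1', 'id2', 'id3', 'id4']]
```

All four steps land in a single wave because each one reads only the pipeline's own input. A step whose input pipes from, say, `id1` would be pushed into a later wave, which is exactly the kind of dependency the processor has to detect before it may run steps concurrently.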

If you rely on default bindings of inputs and outputs, in theory you can
always execute the steps in document order, which is what I expect when
I see a pipeline like this:

<p:pipeline>
  <p:identity/>
  <p:identity/>
  <p:identity/>
</p:pipeline>
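The reason default bindings line up with document order can be sketched the same way. Assuming the XProc rule that an unbound primary input is connected to the primary output of the preceding step (and, for the first step, to the pipeline's own input), the three unnamed identity steps form a chain, and a chain has exactly one dependency-respecting order. In this sketch the positional names `step1`, `step2`, `step3` and the helper `default_binding_graph` are hypothetical, and the XProc 1.0 namespace URI is added only so the snippet parses.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# Assumed namespace URI (XProc 1.0); added only so the snippet parses.
P = "http://www.w3.org/ns/xproc"

PIPELINE = f"""
<p:pipeline xmlns:p="{P}">
  <p:identity/>
  <p:identity/>
  <p:identity/>
</p:pipeline>
"""


def default_binding_graph(xml_text):
    """Hypothetical helper: give the unnamed steps positional names and wire
    each one's unbound input to the step before it (the default binding)."""
    root = ET.fromstring(xml_text)
    names = [f"step{i}" for i in range(1, len(root) + 1)]
    graph = {}
    for i, name in enumerate(names):
        # The first step reads the pipeline's own input, so it has no
        # dependency on a sibling; every later step reads its predecessor.
        graph[name] = {names[i - 1]} if i > 0 else set()
    return graph


order = list(TopologicalSorter(default_binding_graph(PIPELINE)).static_order())
print(order)  # ['step1', 'step2', 'step3']
```

Because every step waits on the one before it, the only valid schedule here is document order; parallelism only becomes possible once explicit bindings break the chain, as in the previous example.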

But the specification also allows "forward" bindings to steps that
follow in document order:

<p:pipeline>
  <p:identity name="id1">
    <p:input port="source">
      <p:pipe step="id2" port="result"/>
    </p:input>
  </p:identity>
  <p:identity name="id2"/>
</p:pipeline>

This is something I sort of don't like, because I find it confusing and
unnatural. But again, there may be use cases where we need this, and
from that perspective it makes sense to be prepared for it. It's just
that I much prefer the simple, linear nature of pipelines.

Vojtech

--
Vojtech Toman
Principal Software Engineer
EMC Corporation

Aert van Nesstraat 45
3012 CA Rotterdam
The Netherlands

Toman_Vojtech@emc.com

Received on Thursday, 24 January 2008 13:22:00 UTC