RE: How to run unconnected steps in sequence? from Philip Fennell on 2010-11-01 (xproc-dev@w3.org from November 2010)

From: Philip Fennell <Philip.Fennell@marklogic.com>
Date: Mon, 1 Nov 2010 07:12:58 -0700
To: David <dlee@calldei.com>, "vojtech.toman@emc.com" <vojtech.toman@emc.com>
CC: "xproc-dev@w3.org" <xproc-dev@w3.org>
Message-ID: <D20C296D14127D4EBD176AD949D8A75A46D84A7D@EXCHG-BE.marklogic.com>
For my money, I like the first example:

<p:store href="file.xml" name="store"/>
<p:load>
  <p:with-option name="href" select="'file.xml'">
    <p:pipe step="store" port="result"/>
  </p:with-option>
</p:load>

That is quite straightforward and concise. Interestingly though, this is quite a ‘friendly’ example in that there is an implicit dependency: the pipeline is supposed to load the file that the previous step saved, and the URI created by the first step is available to the second via its non-primary output port.
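
As a rough sketch of that last point (not taken from the original example), the href could be read from the c:result document that p:store writes to its result port, instead of repeating the literal. The step name "store-then-load" and the declared ports are only illustrative, and the sketch assumes the processor reports a URI in c:result that p:load can resolve:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                name="store-then-load" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- writes the source document; the result port carries <c:result>URI</c:result> -->
  <p:store href="file.xml" name="store"/>

  <p:load>
    <!-- the p:pipe supplies the context document for the select expression -->
    <p:with-option name="href" select="/c:result">
      <p:pipe step="store" port="result"/>
    </p:with-option>
  </p:load>
</p:declare-step>

Here the option value really does come from the first step, so the dependency is carried by the data rather than by an otherwise unused p:pipe.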

Now, if you were looking for a means of explicitly defining that the two steps were to run sequentially then, as I’ve suggested previously, I believe you could build an XProc processor that supported the use of SMIL Timesheets to indicate whether steps should execute either sequentially or in parallel:

<?xml version="1.0" encoding="UTF-8"?>
<p:pipeline
    xmlns:p="http://www.w3.org/ns/xproc"
    name="load-and-save"
    version="1.0">

  <p:pipeinfo>
    <timesheet xmlns="http://www.w3.org/ns/SMIL30">
      <seq>
        <item select="#save"/>
        <item select="#load"/>
      </seq>
    </timesheet>
  </p:pipeinfo>


  <p:store xml:id="save" href="file.xml"/>

  <p:load xml:id="load" href="file.xml"/>

</p:pipeline>


By using SMIL Timesheets you wouldn’t have to extend the XProc grammar to support explicit step orchestration. It would be a very good example of re-using existing W3C technologies to extend others. I’m currently working on a way to demonstrate how the timesheets could be interpreted without having to go to the trouble of implementing an entire working processor. It’ll either illustrate what is possible or prove I’m a dreamer ;-)


Regards

Philip Fennell
Consultant
MarkLogic Corporation

88 Wood Street, London. EC2V 7RS

Mobile: +44 (0) 7824 830 866

email  Philip.Fennell@marklogic.com
web    www.marklogic.com




From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf Of David
Sent: 01 November, 2010 1:42 PM
To: vojtech.toman@emc.com
Cc: xproc-dev@w3.org
Subject: Re: How to run unconnected steps in sequence?

I don't believe this is strictly true.
A pipe creates a dependency from the start of A's output to the start of B's input.
It does not, to my knowledge, create a dependency that A *completes execution* before B begins.
Nor, in fact, does it prevent them from starting in either order, with B simply waiting when it starts to read data from the pipe.
At least that was my understanding the last time I read the spec (I could well be wrong).
Also, I know that in Calabash (when I last looked at the code) the processing is single-threaded, thus data dependency == completion dependency, but I do not believe it's guaranteed to behave that way.
I believe a conforming processor could run the steps in a pipe asynchronously, as long as the data flowing between them is synchronized.


For an analogy, consider the Unix pipeline (and also xmlsh):
    a | b

The process (or, in xmlsh's case, thread) "b" is (or may be, depending on the implementation) started first.
If b does not consume data from its input, it may actually run to completion before "a" even starts.


But then again ... maybe I've misread the specs.





David A. Lee

dlee@calldei.com

http://www.xmlsh.org


On 11/1/2010 9:04 AM, vojtech.toman@emc.com wrote:
Ad 1. The p:pipe element creates a connection – and a dependency – between steps. If the step A contains a p:pipe that points to the step B, it means that A must be executed *after* B. It does not matter if the p:pipe is in p:input, p:with-option, p:with-param, or p:variable. All p:pipe elements contained in the step contribute edges to the dependency graph.

Ad 2: There is no guarantee; but you don’t care in this case. The p:load does not depend on p:sink being executed first. I used p:sink just to introduce a side effect-free dependency on p:store. (You can also use other steps than p:sink, provided they don’t introduce side effects that would change the result of your pipeline.)
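
To illustrate the remark in Ad 1 that a p:pipe inside p:variable adds an edge just as one inside p:with-option does, here is a hedged sketch rather than a tested pipeline. The names "store-then-load-var" and "stored-uri" are made up for the example, and it assumes the c:result document emitted by p:store contains a URI that p:load can resolve:

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:c="http://www.w3.org/ns/xproc-step"
                name="store-then-load-var" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:store href="file.xml" name="store"/>

  <p:group>
    <!-- this p:pipe makes the group depend on the "store" step -->
    <p:variable name="stored-uri" select="/c:result">
      <p:pipe step="store" port="result"/>
    </p:variable>
    <p:load>
      <!-- p:empty: the expression needs no context document -->
      <p:with-option name="href" select="$stored-uri">
        <p:empty/>
      </p:with-option>
    </p:load>
  </p:group>
</p:declare-step>

The variable is declared in a p:group that follows p:store because a variable's connection may only point at steps outside the container whose subpipeline it opens.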

Regards,
Vojtech


--
Vojtech Toman
Consultant Software Engineer
EMC | Information Intelligence Group
vojtech.toman@emc.com
http://developer.emc.com/xmltech


From: Jostein Austvik Jacobsen [mailto:josteinaj@gmail.com]
Sent: Monday, November 01, 2010 1:46 PM
To: Toman, Vojtech
Cc: xproc-dev@w3.org
Subject: Re: How to run unconnected steps in sequence?

1. Thanks. I didn't know dependencies could be introduced that way. Is this similar to what happens when you have multiple p:pipes in a p:input?

2. and 3.: How am I guaranteed that p:load won't run before the p:sink?

Regards
Jostein

2010/11/1 <vojtech.toman@emc.com>
The solution is to introduce the dependency explicitly. Here are some examples (all are variations on the same theme, but some may be more applicable to your use case):

1.
<p:store href="file.xml" name="store"/>
<p:load>
  <p:with-option name="href" select="'file.xml'">
    <p:pipe step="store" port="result"/>
  </p:with-option>
</p:load>

2.
<p:store href="file.xml" name="store"/>
<p:group>
  <p:sink>
    <p:input port="source">
      <p:pipe step="store" port="result"/>
    </p:input>
  </p:sink>
  <p:load href="file.xml"/>
</p:group>

3.
<p:group>
  <p:store href="file.xml"/>
  <p:identity>
    <p:input port="source">
      <p:empty/>
    </p:input>
  </p:identity>
</p:group>
<p:group>
  <p:sink/>
  <p:load href="file.xml"/>
</p:group>

Some processors also support extension attributes to control dependencies between steps, but I would recommend avoiding this unless absolutely necessary.

Regards,
Vojtech

--
Vojtech Toman
Consultant Software Engineer
EMC | Information Intelligence Group
vojtech.toman@emc.com
http://developer.emc.com/xmltech


From: xproc-dev-request@w3.org [mailto:xproc-dev-request@w3.org] On Behalf Of Jostein Austvik Jacobsen
Sent: Monday, November 01, 2010 1:01 PM
To: xproc-dev@w3.org
Subject: How to run unconnected steps in sequence?

I remember seeing a note on this problem somewhere, but I can't find it. Say I want to run these two steps in sequence:

<p:store href="file.xml"/>
<p:load href="file.xml"/>

p:load would have to run after p:store, or the file wouldn't be there yet. Since p:store has no primary output and p:load has no primary input, the processor may choose the order they are run in.

Is there a standard pattern for solving such issues? Something general, not just for the store/load use-case?


Regards
Jostein Austvik Jacobsen
Received on Monday, 1 November 2010 14:13:30 UTC