RE: How to add more transports/protocols to XProc? from vojtech.toman@emc.com on 2011-03-15 (xproc-dev@w3.org from March 2011)

From: <vojtech.toman@emc.com>
Date: Tue, 15 Mar 2011 05:06:46 -0400
To: <xproc-dev@w3.org>
Message-ID: <3799D0FD120AD940B731A37E36DAF3FE334C19BFF4@MX20A.corp.emc.com>
If you look at the rules for building the dependency graph, the XProc specification actually does not say when the steps must be executed. This is deliberate to let the XProc implementers decide the implementation strategy that they find the most appropriate. XProc intentionally doesn't define how "A depends on B" affects the actual execution order of A and B. The only important thing is that the results are consistent with the connections between the steps. As long as this is true, the steps A and B can be executed in any order:

XProc spec, section 2:
"
The result of evaluating a pipeline (or subpipeline) is the result of evaluating the steps that it contains, in an order consistent with the connections between them. A pipeline must behave as if it evaluated each step each time it is encountered. Unless otherwise indicated, implementations must not assume that steps are functional (that is, that their outputs depend only on their inputs, options, and parameters) or side-effect free.
The pattern of connections between steps will not always completely determine their order of evaluation. The evaluation order of steps not connected to one another is implementation-dependent.
"

Because what does it actually mean when you say that "B depends on A"? In standard XProc, there is only one type of dependency: step B is connected to A (= B reads the result of A). There is no way you can say: "B must be executed after A" or even: "B must not start before A" in standard XProc.

In fact, when "B is connected to A", you cannot make any assumptions about when the steps A and B get executed. In a typical single-threaded XProc implementation, B will most likely be executed *after* A, but in a multi-threaded implementation, B may be executed *at the same time* as A or its execution may even start *before* A! In a multi-threaded implementation, it is just a matter of proper synchronization to make sure that B blocks when it needs the output of A. In fact, any other execution strategy will work, no matter how exotic, as long as it produces consistent results.
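To make the "B may start before A" case concrete, here is a minimal sketch in Python (not XProc, and not how any particular XProc processor is implemented). The names run_pipeline, step_a, step_b and the "<doc/>" payload are made up for illustration; the queue stands in for the connection between A's output port and B's input port.

```python
import queue
import threading

def run_pipeline():
    """Start step B before step A; B blocks until A's output arrives."""
    events = []                       # what happened, in order
    lock = threading.Lock()
    pipe = queue.Queue(maxsize=1)     # hypothetical stand-in for the A -> B connection
    b_started = threading.Event()
    result = []

    def log(message):
        with lock:
            events.append(message)

    def step_a():
        log("A started")
        pipe.put("<doc/>")            # A writes its output

    def step_b():
        log("B started")
        b_started.set()
        doc = pipe.get()              # blocks here until A has produced its output
        log("B read input")
        result.append(doc.upper())    # B's "processing" of A's result

    b = threading.Thread(target=step_b)
    a = threading.Thread(target=step_a)
    b.start()
    b_started.wait()                  # make sure B really is running before A starts
    a.start()
    a.join()
    b.join()
    return events, result

if __name__ == "__main__":
    print(run_pipeline())
```

Even though B starts first, its result is the same as if A had run to completion before B was started, which is all that the specification asks for.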

Then there are also situations when the dependency graph of a pipeline contains multiple branches, or even multiple connected components. In that case, the XProc processor is free to decide when to execute the individual branches (or components) without this decision having any impact on the total result of the pipeline. If you have a pipeline with three steps A, B, C where:

- B is connected to A; and
- C is connected to A

In that case it really does not matter whether B is executed before C, after C, or at the same time as C.

Similarly, if you have a pipeline that contains two steps A and B that don't depend on each other, the two steps can be executed in any order.
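The two cases above can be sketched in a few lines of Python. This is only an illustration for a purely sequential processor (it ignores the overlapping execution a multi-threaded implementation may use), and the function valid_orders and the (reader, source) pair representation are my own assumptions, not anything defined by XProc.

```python
from itertools import permutations

def valid_orders(steps, connections):
    """Return every sequential order in which each step runs after the steps it reads from.

    connections is a set of (reader, source) pairs: reader is connected to source.
    """
    orders = []
    for order in permutations(steps):
        position = {step: index for index, step in enumerate(order)}
        if all(position[source] < position[reader] for reader, source in connections):
            orders.append(order)
    return orders

if __name__ == "__main__":
    # B is connected to A, and C is connected to A.
    print(valid_orders(["A", "B", "C"], {("B", "A"), ("C", "A")}))
    # [('A', 'B', 'C'), ('A', 'C', 'B')]

    # Two steps with no connection between them.
    print(valid_orders(["A", "B"], set()))
    # [('A', 'B'), ('B', 'A')]
```

Every order in the returned lists is consistent with the connections, so a processor is free to pick any of them.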

People tend to think sequentially, and they quite naturally assume that the processor will execute the steps in the order in which they appear in the pipeline. But as long as the XProc implementation produces the correct results, it can do so in any way that is consistent with the rules defined in the XProc specification. It can use any optimization technique to determine the optimal ordering of the dependency graph, the number of threads to use, etc. This is an advantage of XProc, not a drawback.

(The above discussion is also the main reason why I don't like people using XProc as a "workflow" tool. In a workflow process, you typically want to have more control over when the individual workflow tasks are executed. Plus you often also want features like forking and parallel branches, asynchronous events, etc. While you probably can graft this onto XProc with some effort, it does not really align with the "spirit" of XProc, which, in my view, is primarily a data flow processing language.)

Regards,
Vojtech


--
Vojtech Toman
Consultant Software Engineer
EMC | Information Intelligence Group
vojtech.toman@emc.com
http://developer.emc.com/xmltech

From: Alex Muir [mailto:alex.g.muir@gmail.com]
Sent: Monday, March 14, 2011 5:20 PM
To: Toman, Vojtech
Cc: xproc-dev@w3.org
Subject: Re: How to add more transports/protocols to XProc?

"In some situations it doesn't matter, but you typically want to prevent the XProc processor from "randomly" deciding the execution order of the steps in the pipeline."

Interesting..

What is the benefit of allowing the XProc processor to randomly decide the execution order of the steps, and when does that happen in practice?

I read previously http://norman.walsh.name/2009/03/26/xprocWithXProc which states in a section that "But if you look at the store and exec steps in the preceding pipeline, you'll see that there are no connections between them. Effectively, we have a pipeline with two independent sub-pipelines. ... As a result, there's no dependency between the store and exec steps. The pipeline processor can execute them in any order, even in parallel. But the correct result requires that the p:store step be executed before p:exec."

I wonder why one wouldn't generally want the steps to execute one after the other unless the script specifies otherwise in some way, for example to execute in parallel or to execute and not wait?


Regards
Alex
Received on Tuesday, 15 March 2011 09:15:18 UTC