RE: Where's the parallelize step?

Hi Henry,

I think there is a difference between:

 - an XProc processor recognizing that some 
   steps can be run in parallel

   versus

 - a user creating an XML workflow, declaring  
   that "steps A, B, C can be run in parallel 
   with steps D, E, F"


The former is an XProc processor optimization activity. The latter is a user modeling activity.
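
To make the former concrete: in the small (purely illustrative)
XProc 1.0 pipeline below, the two p:xslt steps read only the
pipeline's source port and are not connected to each other. As
Henry explains below, a processor is therefore already free to run
them in parallel; nothing in the dependency graph forces an order.
(The stylesheet URIs are placeholders.)

    <p:pipeline xmlns:p="http://www.w3.org/ns/xproc"
                name="main" version="1.0">

      <!-- Neither transform reads the other's output; both read
           the pipeline's source port, so their evaluation order
           is unconstrained. -->
      <p:xslt name="report-a">
        <p:input port="source">
          <p:pipe step="main" port="source"/>
        </p:input>
        <p:input port="stylesheet">
          <p:document href="report-a.xsl"/>
        </p:input>
      </p:xslt>

      <p:xslt name="report-b">
        <p:input port="source">
          <p:pipe step="main" port="source"/>
        </p:input>
        <p:input port="stylesheet">
          <p:document href="report-b.xsl"/>
        </p:input>
      </p:xslt>

      <!-- The only step that depends on both branches. -->
      <p:wrap-sequence wrapper="reports">
        <p:input port="source">
          <p:pipe step="report-a" port="result"/>
          <p:pipe step="report-b" port="result"/>
        </p:input>
      </p:wrap-sequence>

    </p:pipeline>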

I think that it is important for a user to be able to explicitly state in an XProc document "These two workflow activities (subpipelines) may be run in parallel." (Whether an XProc processor executes the subpipelines in parallel or serially is an implementation issue.)
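
To show what I mean, here is one purely hypothetical way such a
declaration might be written, using an extension attribute. The
"ext" namespace and the attribute name are invented; nothing like
this exists in XProc 1.0 today:

    <p:group name="abc" ext:parallel-group="g1"
             xmlns:ext="http://example.org/ns/xproc-ext">
      <!-- steps A, B, C -->
    </p:group>

    <p:group name="def" ext:parallel-group="g1"
             xmlns:ext="http://example.org/ns/xproc-ext">
      <!-- steps D, E, F -->
    </p:group>

A processor that did not recognize the attribute would just run the
two groups as it otherwise would; one that did recognize it would
have an explicit, user-supplied statement that the two subpipelines
may safely be run in parallel.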

/Roger



> -----Original Message-----
> From: Henry S. Thompson [mailto:ht@inf.ed.ac.uk] 
> Sent: Monday, April 20, 2009 8:01 AM
> To: Costello, Roger L.
> Cc: 'xproc-dev@w3.org'
> Subject: Re: Where's the parallelize step?
> 
> Costello, Roger L. writes:
> 
> > Why doesn't XProc have a parallelize step? That is, a step that
> > enables two subpipelines to proceed in parallel. Is there any
> > discussion of adding a parallelize step to XProc?
> 
> If I've understood you correctly, the answer is "because it doesn't
> need one".  The semantics of XProc do not require the evaluation of
> the steps in a pipeline to be any more serialised than their explicit
> dependencies require.  So it's open to implementations to parallelise
> as much as they like/can.
> 
> I have in the past used the following as a sort of _aide memoire_:
> 
>   It should be possible to implement XProc by starting separate
>   threads for _every_ step in the controlling pipeline, and letting
>   input/output/parameter ports control the actual order of execution.
> 
> I believe it is
> 
>   a) still the case that the above will work;
> 
>   b) implicit in the above that if you have multiple processors, you
>      will get parallel execution where the above story allows for it.
> 
> There is at least one case where a smart implementation can
> parallelise which the above would not immediately uncover.  The
> execution of the sub-pipeline of a p:for-each should in principle be
> parallelisable across the different inputs to the p:for-each (provided
> they have no side-effects).
> 
> I can imagine a few extension attributes:
> 
>  1) (boolean)pext:no-side-effects
>  2) (boolean)pext:output-reorder-ok
> 
> The former for any step, the latter for p:for-each, meaning the order
> of the documents in its output sequence need not match that of their
> corresponding input documents.  I'm not actually sure in practice if
> this would help -- it might turn out to be more efficient, as well as
> easier to implement, to require each (presumed independent) thread of
> a parallelised for-each to buffer/suspend its output until the thread
> for the document 'before' its own has completed.  It follows that the
> benefit of parallelisation for p:for-each would be limited unless the
> ratio of computation to output involved was high. . .
> 
> In any case, I think we have all the room we need to explore this
> space w/o any explicit steps, but maybe James's archive search will
> uncover something I've missed. . .
> 
> ht
> -- 
>        Henry S. Thompson, School of Informatics, University of Edinburgh
>                          Half-time member of W3C Team
>       10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
>                 Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
>                        URL: http://www.ltg.ed.ac.uk/~ht/
> [mail really from me _always_ has this .sig -- mail without it is forged spam]
> 

Received on Monday, 20 April 2009 12:20:23 UTC