Re: Where's the parallelize step? from Henry S. Thompson on 2009-04-20 (xproc-dev@w3.org from April 2009)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Mon, 20 Apr 2009 13:01:07 +0100
To: "Costello, Roger L." <costello@mitre.org>
Cc: "'xproc-dev@w3.org'" <xproc-dev@w3.org>
Message-ID: <f5b8wlvh6v0.fsf@hildegard.inf.ed.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Costello, Roger L. writes:

> Why doesn't XProc have a parallelize step? That is, a step that
> enables two subpipelines to proceed in parallel. Is there any
> discussion of adding a parallelize step to XProc?

If I've understood you correctly, the answer is "because it doesn't
need one".  The semantics of XProc do not require the evaluation of
the steps in a pipeline to be any more serialised than their explicit
dependencies require.  So it's open to implementations to parallelise
as much as they like/can.

I have in the past used the following as a sort of _aide memoire_:

  It should be possible to implement XProc by starting separate
  threads for _every_ step in the controlling pipeline, and letting
  input/output/parameter ports control the actual order of execution.

I believe it is

  a) still the case that the above will work;

  b) implicit in the above that if you have multiple processors, you
     will get parallel execution where the above story allows for it.
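
The thread-per-step picture can be sketched roughly as follows.  This is
only an illustration in Python, not XProc and not any real processor's
design: the names DONE, start_step and run_pipeline are hypothetical,
queues stand in for ports, and the three toy steps are invented for the
example.  The steps are deliberately started in the "wrong" order to
show that only the port connections decide what runs when.

```python
import queue
import threading

DONE = object()  # hypothetical sentinel marking the end of a document sequence


def start_step(func, in_ports, out_ports):
    """Start one step in its own thread; queues play the role of ports."""
    def body():
        while True:
            # Blocks until every upstream step has produced a document.
            docs = [port.get() for port in in_ports]
            if any(doc is DONE for doc in docs):
                for port in out_ports:
                    port.put(DONE)
                return
            result = func(*docs)
            for port in out_ports:
                port.put(result)

    thread = threading.Thread(target=body)
    thread.start()
    return thread


def run_pipeline(documents):
    a, b, c, d, out = (queue.Queue() for _ in range(5))
    threads = [
        # Started first, but it cannot proceed until c and d have data.
        start_step(lambda upper, length: f"{upper}:{length}", [c, d], [out]),
        # These two have no dependency on each other, so they may overlap.
        start_step(len, [b], [d]),
        start_step(str.upper, [a], [c]),
    ]
    for doc in documents:
        a.put(doc)
        b.put(doc)
    a.put(DONE)
    b.put(DONE)

    results = []
    while (item := out.get()) is not DONE:
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

Calling run_pipeline(["a", "bcd"]) returns ["A:1", "BCD:3"] however the
threads happen to be scheduled, because the joining step simply waits on
its input ports.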

There is at least one case where a smart implementation can
parallelise which the above would not immediately uncover.  The
execution of the sub-pipeline of a p:for-each should in principle be
parallelisable across the different inputs to the p:for-each (provided
the sub-pipeline has no side-effects).

I can imagine a few extension attributes:

 1) (boolean)pext:no-side-effects
 2) (boolean)pext:output-reorder-ok

The former would apply to any step, the latter only to p:for-each,
where it would mean that the order of the documents in its output
sequence need not match that of their corresponding input documents.
I'm not actually sure in practice if
this would help -- it might turn out to be more efficient, as well as
easier to implement, to require each (presumed independent) thread of
a parallelised for-each to buffer/suspend its output until the thread
for the document 'before' its own has completed.  It follows that the
benefit of parallelisation for p:for-each would be limited unless the
ratio of computation to output involved was high. . .
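
To make the ordering trade-off concrete, here is a small Python sketch
(again not XProc).  The function for_each and its output_reorder_ok
parameter are hypothetical, the latter named after the imagined
pext:output-reorder-ok attribute, and the sketch assumes the
sub-pipeline is a side-effect-free function of one document.  In the
ordered case each result is held back until all earlier ones are ready;
in the reordered case results are released as they complete.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def for_each(subpipeline, documents, output_reorder_ok=False, max_workers=4):
    """Apply subpipeline to every document, in parallel where possible.

    Assumes subpipeline has no side effects, so the runs are independent.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        if output_reorder_ok:
            # Results are collected in completion order, which may differ
            # from the order of the input documents.
            futures = [pool.submit(subpipeline, doc) for doc in documents]
            return [future.result() for future in as_completed(futures)]
        # Executor.map yields results in input order, buffering any result
        # that finishes before its predecessors.
        return list(pool.map(subpipeline, documents))
```

With the default setting, for_each(str.upper, ["a", "b", "c"]) always
gives the results in input order; with output_reorder_ok=True the same
three results come back in whatever order the workers finish.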

In any case, I think we have all the room we need to explore this
space without any explicit steps, but maybe James's archive search will
uncover something I've missed. . .

ht
- -- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
                         Half-time member of W3C Team
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 651-1426, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFJ7GQDkjnJixAXWBoRAsiAAJ9wVVKgs11t4xIJaEiATrKgzmdUNQCeLPKt
/uynW6TqjWtJgFKi+C4AVJk=
=8+Xe
-----END PGP SIGNATURE-----

Received on Monday, 20 April 2009 12:01:46 UTC