MSFT presentation, feedback

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Mon, 01 Oct 2007 15:42:34 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <f5b1wcerj51.fsf@hildegard.inf.ed.ac.uk>


I gave a presentation on the Last Call WD to some of the XML people at
MSFT on Friday, and got a pleasantly positive reception.

One specific question they raised, with an eye to very-large-scale data
processing on parallel hardware, was whether we supported Google-style
'map-reduce' decomposition.  I mentioned the inherent parallelisability
of the overall architecture, but realised we have nothing that directly
supports such decomposition.  Maybe we should consider it. . .

We already have 'map' -- it's just for-each with a select expression on
its input.
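As a rough illustration only (Python rather than XProc, and the names `for_each` and `subpipeline` are hypothetical), 'map' in this sense amounts to selecting nodes and running the same subpipeline on each one. Each call sees only its own node, which is what makes the iterations independent and therefore parallelisable. ElementTree supports only a subset of XPath, so `.//chapter` stands in for `//chapter` here.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List


def for_each(doc: ET.Element, select: str,
             subpipeline: Callable[[ET.Element], ET.Element]) -> List[ET.Element]:
    """Hypothetical 'map': run subpipeline once per node matched by select.

    Each call depends only on its own node, so the calls could be
    dispatched in parallel; here they simply run in document order.
    """
    return [subpipeline(node) for node in doc.findall(select)]


# Usage: a stand-in "compute index" step that just records the chapter number.
book = ET.fromstring("<book><chapter n='1'/><part><chapter n='2'/></part></book>")
results = for_each(book, ".//chapter", lambda ch: ET.Element("seen", n=ch.get("n")))
print([r.get("n") for r in results])  # ['1', '2']
```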

Here's an example of how it could be used along with a new 'reduce'
construct:

Stipulate we have a pipeline which can construct an index for a book
chapter.  Here's how we index the whole book:

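 <for-each select='//chapter'>
  <!-- compute the index for one chapter -->
 </for-each>

 <!-- primary input: the sequence of chapter indices from for-each -->
 <reduce name='r'>
  <input port='seed'>
   <inline>
    <bookIndex/>
   </inline>
  </input>

  <!-- primary input: the current chapter index;
       'book' input: the seed, or the previous round's output -->
  <merge-two-indices>
   <input port='book'>
    <pipe port='seed' step='r'/>
   </input>
  </merge-two-indices>

 </reduce>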

where merge-two-indices has two inputs (the primary, a chapter index,
and the secondary 'book' port, a book index) and one output, a new book
index with the chapter index merged in.
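A minimal sketch of the contract described for merge-two-indices, assuming (hypothetically) that an index is a Python dict mapping each term to a sorted list of locator strings; the real step would consume and produce XML documents. As described, it returns a new book index rather than modifying its inputs.

```python
from typing import Dict, List

# Hypothetical representation: term -> sorted list of locators (plain string sort).
Index = Dict[str, List[str]]


def merge_two_indices(chapter_index: Index, book_index: Index) -> Index:
    """Return a new book index with chapter_index merged in; inputs are unchanged."""
    # Copy the book index so the caller's lists are never mutated.
    merged: Index = {term: list(locs) for term, locs in book_index.items()}
    for term, locs in chapter_index.items():
        # Union the locators for this term, dropping duplicates.
        merged[term] = sorted(set(merged.get(term, [])) | set(locs))
    return merged


print(merge_two_indices({"XProc": ["2.1"], "reduce": ["2.3"]}, {"XProc": ["1.2"]}))
# {'XProc': ['1.2', '2.1'], 'reduce': ['2.3']}
```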

reduce takes a primary sequence input, a secondary single-document input
(the seed), and a subpipeline.  It runs the subpipeline once for each
member of the sequence in turn, supplying that member as the default
input and, as the 'seed' input, the seed itself on the first round and
the output of the previous round thereafter.  Its output is the output
of the subpipeline from the last iteration.
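The iteration just described can be sketched as a left fold. This is Python rather than XProc, `reduce_step` and `subpipeline` are hypothetical names, and the subpipeline takes the current member first and the seed second, mirroring merge-two-indices (primary chapter index, secondary book index).

```python
from typing import Callable, Iterable, TypeVar

Doc = TypeVar("Doc")
Seed = TypeVar("Seed")


def reduce_step(sequence: Iterable[Doc], seed: Seed,
                subpipeline: Callable[[Doc, Seed], Seed]) -> Seed:
    """Run subpipeline once per member of sequence, threading the 'seed' through."""
    current = seed                            # first round: the seed itself
    for doc in sequence:                      # each member in turn, as the default input
        current = subpipeline(doc, current)   # later rounds: previous round's output
    return current                            # output of the last iteration


# Usage: each "chapter index" is a set of terms; merging is set union.
chapter_indices = [{"map"}, {"reduce"}, {"map", "seed"}]
print(sorted(reduce_step(chapter_indices, set(), lambda chap, book: book | chap)))
# ['map', 'reduce', 'seed']
```

Read this way, reduce is an ordinary left fold (compare Python's `functools.reduce`, whose function takes the accumulator first). The rounds are necessarily sequential, since each consumes the previous round's output; in the book example the parallelism would come from the preceding for-each. Returning the seed unchanged for an empty sequence is an assumption of this sketch; the proposal defines the output only in terms of the last iteration.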

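Such a construct would give us a way of addressing our current lack of
open-ended input/output cardinality, i.e. cardinality known only at
runtime: a step with a fixed number of inputs, such as the two-input
merge-two-indices, could be applied across however many documents the
for-each produces.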

ht
-- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
Received on Monday, 1 October 2007 14:42:53 GMT
