Taking good care of the environment . . . from Henry S. Thompson on 2007-03-21 (public-xml-processing-model-wg@w3.org from March 2007)

From: Henry S. Thompson <ht@inf.ed.ac.uk>
Date: Wed, 21 Mar 2007 15:54:42 +0000
To: public-xml-processing-model-wg <public-xml-processing-model-wg@w3.org>
Message-ID: <f5bodmmegn1.fsf@hildegard.inf.ed.ac.uk>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've been building a sort of abstract version of phase 1 of an XProc
implementation, in the form of a pipeline (:-) of XSLT steps which
transform an XProc document into an abstract data model instance
(notated in RDF).  In doing so, I've realised that the environment
construction rules I've followed are subtly different from those in
the spec.

Here's a picture of the relevant part of my take on Example 1
(repeated here for convenience, with one error corrected: the
pipeline now has a name):

<p:pipeline xmlns:p="http://www.w3.org/2007/03/xproc" name="fig1">
  <p:input port="source" sequence="no"/>
  <p:input port="schemaDoc" sequence="yes"/>
  <p:output port="result" sequence="no"/>

  <p:xinclude name="s1">
    <p:input port="source">
      <p:pipe step="fig1" port="source"/>
    </p:input>
  </p:xinclude>

  <p:validate name="s2">
    <p:input port="schema">
      <p:pipe step="fig1" port="schemaDoc"/>
    </p:input>
  </p:validate>
</p:pipeline>

There are five actual input ports and three actual output ports in
this pipeline, which I'll name as follows:

  i.fig1.doc*
  i.fig1.schemaDoc*
  i.s1.source
  i.s2.source
  i.s2.schema

  o.fig1.result*
  o.s1.result
  o.s2.result

  (the ports marked '*' are schizophrenic -- both input and output, in
   a sense)

Here's what the environments should be, in my opinion
(defaultReadablePort is the *bold* entry in readablePorts, if any):

  Pipeline              readablePorts: {}

   Subpipeline          readablePorts: {i.fig1.doc, i.fig1.schemaDoc,
                                        o.s1.result, o.s2.result}

    s1                  readablePorts: {i.fig1.doc, i.fig1.schemaDoc,
                                        o.s1.result, o.s2.result}

    s2                  readablePorts: {i.fig1.doc, i.fig1.schemaDoc,
                                        *o.s1.result*, o.s2.result}

This feels better to me than what we have in the spec. today:

  Pipeline              readablePorts: {i.fig1.doc, i.fig1.schemaDoc,
                                        o.s1.result, o.s2.result}

   Subpipeline          readablePorts: {i.fig1.doc, i.fig1.schemaDoc,
                                        o.s1.result, o.s2.result}

    s1                  readablePorts: {i.fig1.doc, i.fig1.schemaDoc,
                                        o.s1.result, o.s2.result}

    s2                  readablePorts: {i.fig1.doc, i.fig1.schemaDoc,
                                        *o.s1.result*, o.s2.result}

Connections:

   i.fig1.doc -> i.s1.source
   o.s1.result -> i.s2.source
   i.fig1.schemaDoc -> i.s2.schema
   o.s2.result -> o.fig1.result

Of these, the first and third are explicit; the second and fourth are
defaults.
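
The defaulting behind the second and fourth connections can be
sketched roughly as follows.  This is only an illustration, not spec
text: it assumes a flat, linear pipeline, treats 'source' and 'result'
as each step's primary ports, and assumes the first step would default
to the pipeline's own input.  The function name and data layout are
made up for the example.

```python
def all_connections(pipeline_input, pipeline_output, steps, explicit):
    """Map each destination port to the port it reads from.

    explicit maps destination -> origin for bindings written out in the
    pipeline document.  Every unbound 'source' port, and the pipeline's
    output, falls back to the current default readable port.
    """
    connections = dict(explicit)
    default = pipeline_input            # assumed default for the first step
    for step in steps:
        connections.setdefault(f"i.{step}.source", default)
        default = f"o.{step}.result"    # default for the next step
    connections.setdefault(pipeline_output, default)
    return connections


# The two bindings that are spelled out with p:pipe in the fig1 pipeline.
explicit = {
    "i.s1.source": "i.fig1.doc",
    "i.s2.schema": "i.fig1.schemaDoc",
}
connections = all_connections(
    "i.fig1.doc", "o.fig1.result", ["s1", "s2"], explicit
)
for destination, origin in connections.items():
    print(f"{origin} -> {destination}")
```

Run on the fig1 data, this yields the same four connections as the
list above, with the two defaulted ones filled in by the loop.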

To see why the environment story from the spec. is a bit odd, consider
the situation for Example 5, put in a pipeline:

<p:pipeline xmlns:p="http://www.w3.org/2007/03/xproc"
            xmlns:h="http://www.w3.org/1999/xhtml" name="ex5">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:viewport name="v1" match="h:div[@class='chapter']">
    <p:output port="result"/>
    <p:insert name="i1">
      <p:input port="insertion">
        <p:inline>
          <hr xmlns="http://www.w3.org/1999/xhtml"/>
        </p:inline>
      </p:input>
      <p:option name="at-start" value="true"/>
    </p:insert>
  </p:viewport>
</p:pipeline>

Input ports:

 i.ex5.source*
 i.v1.viewportSource
 i.i1.source
 i.i1.insertion

Output ports:

 o.ex5.result*
 o.v1.current
 o.v1.result*
 o.i1.result

Connections:

 i.ex5.source -> i.v1.viewportSource
 o.v1.current -> i.i1.source
 o.i1.result -> o.v1.result
 o.v1.result -> o.ex5.result

All these connections are defaulted.

Here's what I think the environment story should be
(defaultReadablePort in *bold*, as before):

  Pipeline              readablePorts: {}

   Subpipeline          readablePorts: {*i.ex5.source*, o.v1.result}

     v1                 readablePorts: {*i.ex5.source*, o.v1.result}

       Subpipeline      readablePorts: {i.ex5.source, o.v1.result,
                                        *o.v1.current*, o.i1.result}

         i1             readablePorts: {i.ex5.source, o.v1.result,
                                        *o.v1.current*, o.i1.result}
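
For what it's worth, the table above can be reproduced by a small
recursive walk.  The rules in this sketch are inferred from the table
alone, not taken from the spec or from any concrete proposal, and the
Step class and its field names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    output: str                                   # port visible to later siblings
    inner: list = field(default_factory=list)     # ports exposed to its own subpipeline
    children: list = field(default_factory=list)  # contained steps, in document order


def environments(step, inherited=frozenset(), default=None, rows=None):
    """Collect (label, readablePorts, defaultReadablePort) rows, outermost first."""
    rows = [] if rows is None else rows
    rows.append((step.name, set(inherited), default))
    if step.children:
        # Contained steps' outputs enter the environment only here, inside
        # the container, never in the container's own environment.
        ports = set(inherited) | set(step.inner) | {c.output for c in step.children}
        current = step.inner[0] if step.inner else default
        rows.append((step.name + " subpipeline", ports, current))
        for child in step.children:
            environments(child, ports, current, rows)
            current = child.output    # the next sibling reads this by default
    return rows


i1 = Step("i1", "o.i1.result")
v1 = Step("v1", "o.v1.result", inner=["o.v1.current"], children=[i1])
ex5 = Step("ex5", "o.ex5.result", inner=["i.ex5.source"], children=[v1])

for label, ports, default in environments(ex5):
    print(f"{label:18} default={default} readablePorts={sorted(ports)}")
```

The point of the sketch is the ordering: a container's own row is
recorded before its children's outputs are added, and the first child
inherits a default rather than losing it.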

Here's what I read the spec. as currently saying it is
(defaultReadablePort in *bold*, as before):

  Pipeline              readablePorts: {*i.ex5.source*, o.v1.result}

   Subpipeline          readablePorts: {*i.ex5.source*, o.v1.result}

     v1                 readablePorts: {i.ex5.source, o.v1.result,
                                        o.i1.result}

       Subpipeline      readablePorts: {i.ex5.source, o.v1.result,
                                        *o.v1.current*, o.i1.result}

         i1             readablePorts: {i.ex5.source, o.v1.result,
                                        o.v1.current, o.i1.result}

There are two bugs visible here -- the 'standard modifications' are
deleting the defaultReadablePort too eagerly for the first step in
each subpipeline, and the outputs of the contained steps are being
introduced into the environment too soon.

The first bug can be corrected by treating first position specially in
the 'standard modifications'.  The second may not seem to matter much,
but it means we can't just say that a step's inputs can be bound to
any of its (non-self-or-sibling) readablePorts, because that would
appear to allow the viewport to bind its input to its own contained
step's output (because o.i1.result is a readablePort for it).
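
A quick way to see the second problem is to apply that naive binding
rule to v1's two candidate environments.  This is a sketch only; the
function name is hypothetical and the port sets are copied from the
v1 rows of the two Example 5 tables:

```python
def bindable_ports(readable_ports, own_outputs, sibling_outputs):
    """Ports a step could bind an input to under the naive rule:
    any readable port that is not its own or a sibling's output."""
    return set(readable_ports) - set(own_outputs) - set(sibling_outputs)


# v1's environment as I read the current spec, and as I think it should be.
spec_env = {"i.ex5.source", "o.v1.result", "o.i1.result"}
proposed_env = {"i.ex5.source", "o.v1.result"}

v1_outputs = {"o.v1.result", "o.v1.current"}
v1_siblings = set()    # v1 is the only step in its subpipeline

under_spec = bindable_ports(spec_env, v1_outputs, v1_siblings)
under_proposal = bindable_ports(proposed_env, v1_outputs, v1_siblings)

print(sorted(under_spec))
print(sorted(under_proposal))
```

With the spec's environment, o.i1.result survives the filter, so the
viewport could apparently read from the step it contains; with the
other environment only i.ex5.source is left.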

I think the updating of the environment needs to be adjusted just a
bit to fix this.  I'll send a proposal in a subsequent email.

ht
- -- 
 Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
                     Half-time member of W3C Team
    2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
            Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                   URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)

iD8DBQFGAVVDkjnJixAXWBoRAg+uAJ0eDC8XIgjZPKM9s4sa10YtKRpn1wCdGRd8
EfdWKnAe0vnzm4IisOPv12U=
=R8n9
-----END PGP SIGNATURE-----
Received on Wednesday, 21 March 2007 15:54:47 UTC