Inputs and outputs from Norman Walsh on 2006-07-20 (public-xml-processing-model-wg@w3.org from July 2006)

From: Norman Walsh <Norman.Walsh@Sun.COM>
Date: Thu, 20 Jul 2006 13:11:36 -0400
To: public-xml-processing-model-wg@w3.org
Message-ID: <87bqrkp5fb.fsf@nwalsh.com>

Here's what I think we've agreed about inputs and outputs.

An input is a binding between some connection point on a component
(let's call it a 'port' to give it a short name) and a document
flowing through the pipeline "towards it" (i.e., one that can be read
from).

For example, an XSLT component needs a stylesheet so there has to be a
binding between the port on the component that expects to
read a stylesheet and some document that will be used as the
stylesheet.

An output is a binding between some port on a component and
a document flowing through the pipeline "away from it" (i.e., one that
can be written to).

For example, an XInclude component needs to write the result of the
inclusion process somewhere so there has to be a binding between the
port where it's going to write and some pipe in the
pipeline where that data can go.

Most components have a fixed set of ports but some, like
pipeline, have an arbitrary number. Nevertheless, an input is a
binding between one of those ports and some pipe where a
document can be read and an output is a binding between one of those
ports and some pipe where a document can be written.

In the body of a pipeline, where documents are flowing between steps,
this is all perfectly straightforward. All the inputs of each step
are bound to outputs of some earlier step and all the outputs are
bound to the inputs of some later step.

The issue becomes a little bit confusing at the edges, however. The
input on a step reads from somewhere, but the input on a pipeline
doesn't; it provides a port from which some step can read.
Similarly, the pipeline output isn't something that some other step is
expected to read; it's a sink into which data can be poured. I think
the same sort of boundary occurs on the choose element, where we want
the choose to say something about what it produces, but exactly one of
the when elements has to produce it.

Let's try to describe it that way.

  <p:pipeline>
    <!-- accept a document, a schema, and a stylesheet. -->
    <!-- validate, transform, and return the result -->
    <p:declare-input-port port="document"/>
    <p:declare-input-port port="schema"/>
    <p:declare-input-port port="stylesheet"/>
    <p:declare-output-port port="result"/>

    <p:step kind="validate">
      <p:input port="document"/>
      <p:input port="schema"/>
      <p:output port="result"/>
    </p:step>

    <p:step kind="xslt">
      <p:input port="document"/>
      <p:input port="stylesheet"/>
      <p:output port="result"/>
    </p:step>
  </p:pipeline>

Those are all the fittings; now the question is how we connect the
pipes together. In principle, as long as all the inputs and outputs
are joined together, it doesn't matter how. On the one hand, I'd be
inclined to try to make it uniform (all inputs read from outputs or
all outputs write to inputs), but since we have this problem at the
edges, let's try to relax a little.

Let's say that inputs can declare where they read from and outputs can
declare where they write to. Since pipes only have two ends, it's
clearly unnecessary to specify both, but let's say you can if you want
to. The only constraint is that you can't provide conflicting
junctions.

With that in mind, we could do it like this:

  <p:pipeline>
    <!-- accept a document, a schema, and a stylesheet. -->
    <!-- validate, transform, and return the result -->
    <p:declare-input-port port="document" name="xmlfile"/>
    <p:declare-input-port port="schema" name="xsdfile"/>
    <p:declare-input-port port="stylesheet" name="xslfile"/>
    <p:declare-output-port port="result" name="output"/>

    <p:step kind="validate">
      <p:input port="document" from="xmlfile"/>
      <p:input port="schema" from="xsdfile"/>
      <p:output port="result" name="validxml"/>
    </p:step>

    <p:step kind="xslt">
      <p:input port="document" from="validxml"/>
      <p:input port="stylesheet" from="xslfile"/>
      <p:output port="result" to="output"/>
    </p:step>
  </p:pipeline>

Or we could say:

  <p:pipeline>
    <!-- accept a document, a schema, and a stylesheet. -->
    <!-- validate, transform, and return the result -->
    <p:declare-input-port port="document" name="xmlfile"/>
    <p:declare-input-port port="schema" name="xsdfile"/>
    <p:declare-input-port port="stylesheet" name="xslfile"/>
    <p:declare-output-port port="result" name="output"/>

    <p:step kind="validate">
      <p:input port="document" from="xmlfile"/>
      <p:input port="schema" from="xsdfile"/>
      <p:output port="result" to="styler"/>
    </p:step>

    <p:step kind="xslt">
      <p:input port="document" name="styler"/>
      <p:input port="stylesheet" from="xslfile"/>
      <p:output port="result" to="output"/>
    </p:step>
  </p:pipeline>

Or we could do the naming "Richard's way":

  <p:pipeline name="pipe">
    <!-- accept a document, a schema, and a stylesheet. -->
    <!-- validate, transform, and return the result -->
    <p:declare-input-port port="document"/>
    <p:declare-input-port port="schema"/>
    <p:declare-input-port port="stylesheet"/>
    <p:declare-output-port port="result"/>

    <p:step kind="validate" name="validate">
      <p:input port="document" from="pipe.document"/>
      <p:input port="schema" from="pipe.schema"/>
      <p:output port="result"/>
    </p:step>

    <p:step kind="xslt" name="transform">
      <p:input port="document" from="validate.result"/>
      <p:input port="stylesheet" from="pipe.stylesheet"/>
      <p:output port="result" to="pipe.result"/>
    </p:step>
  </p:pipeline>

They all amount to the same thing. We could even allow them to be
mixed. Whether we mandate one of these forms or allow all of them
is a separable question.
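
To make the "no conflicting junctions" constraint concrete, here is a sketch that specifies both ends of the pipe between the two steps. It only combines attributes already used above (name, from, to). The xmlns:p URI is a placeholder added so the fragment parses, and reading this as the intended rule is an assumption rather than something the group has agreed.

```xml
<p:pipeline xmlns:p="http://example.org/ns/pipeline"><!-- placeholder namespace URI, assumed -->
  <p:declare-input-port port="document" name="xmlfile"/>
  <p:declare-input-port port="schema" name="xsdfile"/>
  <p:declare-input-port port="stylesheet" name="xslfile"/>
  <p:declare-output-port port="result" name="output"/>

  <p:step kind="validate">
    <p:input port="document" from="xmlfile"/>
    <p:input port="schema" from="xsdfile"/>
    <!-- this end of the pipe says: write to "styler" -->
    <p:output port="result" name="validxml" to="styler"/>
  </p:step>

  <p:step kind="xslt">
    <!-- the other end says: read from "validxml"; redundant but consistent -->
    <p:input port="document" name="styler" from="validxml"/>
    <p:input port="stylesheet" from="xslfile"/>
    <p:output port="result" to="output"/>
  </p:step>
</p:pipeline>
```

If the xslt step's document input said from="xmlfile" instead, the two ends would disagree about what feeds "styler", which is presumably the kind of conflicting junction that should be rejected.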

I think this also helps us with the choose statement:

  <p:pipeline>
    <!-- accept a document, a schema, and a stylesheet. -->
    <!-- validate, transform, and return the result -->
    <p:declare-input-port port="document" name="xmlfile"/>
    <p:declare-input-port port="schema" name="xsdfile"/>
    <p:declare-input-port port="stylesheet" name="xslfile"/>
    <p:declare-output-port port="result" name="output"/>

    <p:step kind="validate">
      <p:input port="document" from="xmlfile"/>
      <p:input port="schema" from="xsdfile"/>
      <p:output port="result" name="validxml"/>
    </p:step>

    <p:choose>
      <p:declare-input-port port="testdocument"/>
      <p:declare-output-port port="result" name="xformed"/>

      <p:input port="testdocument" from="validxml"/>

      <p:when test="/book">
        <p:step kind="xslt">
          <p:input port="document" from="validxml"/>
          <p:input port="stylesheet" href="docbook.xsl"/>
          <p:output port="result" to="xformed"/>
        </p:step>
      </p:when>

      <p:when test="/html">
        <p:step kind="xslt">
          <p:input port="document" from="validxml"/>
          <p:input port="stylesheet" href="html.xsl"/>
          <p:output port="result" to="xformed"/>
        </p:step>
      </p:when>
    </p:choose>

    <p:step kind="identity">
      <p:input port="document" from="xformed"/>
      <p:output port="result" to="output"/>
    </p:step>
  </p:pipeline>

The choose begins by declaring its input and output ports, then it
declares the inputs bound to its input ports. This looks a little odd
and it might make more sense to allow 'from' on p:declare-input-port
as a sort of "declare-and-bind-in-one-step".

    <p:choose>
      <p:declare-input-port port="testdocument" from="validxml"/>
      <p:declare-output-port port="result" name="xformed"/>

In any event, the semantics of choose are that the test expressions on
each when statement are evaluated against the input document supplied
on the "testdocument" port. In fact, we could generalize this a little
bit and allow each when to operate over a different document, I
suppose.

Anyway, the important bits are:

 1. Inside the p:choose, the only input ports available are the ones
    locally declared. This makes p:choose a wholly self-contained
    element, which I really like.

 2. If there's any p:when that does not have a binding to all of the
    declared output-ports, that's a static error.
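
A sketch of what point 2 would reject, assuming the syntax used in this message. The name "tested" on the choose's input port and the xmlns:p URI are illustrative additions: the first lets the inner steps read only from a locally declared port, as point 1 requires, and the second is a placeholder so the fragment parses. The from attribute on p:declare-input-port is the "declare-and-bind-in-one-step" form floated above, not a settled feature.

```xml
<p:choose xmlns:p="http://example.org/ns/pipeline"><!-- placeholder namespace URI, assumed -->
  <!-- "tested" is a hypothetical name for the locally declared input -->
  <p:declare-input-port port="testdocument" name="tested" from="validxml"/>
  <p:declare-output-port port="result" name="xformed"/>

  <p:when test="/book">
    <p:step kind="xslt">
      <p:input port="document" from="tested"/>
      <p:input port="stylesheet" href="docbook.xsl"/>
      <!-- this branch binds the declared output port -->
      <p:output port="result" to="xformed"/>
    </p:step>
  </p:when>

  <p:when test="/html">
    <p:step kind="xslt">
      <p:input port="document" from="tested"/>
      <p:input port="stylesheet" href="html.xsl"/>
      <!-- no to="xformed" here: this branch leaves the declared output
           unbound, which point 2 would flag as a static error -->
      <p:output port="result"/>
    </p:step>
  </p:when>
</p:choose>
```

Because the check only compares each p:when against the ports declared on its own p:choose, it can be made without running the pipeline or looking outside the choose.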

I don't think I've said very much that's new, but just changing the
names to "declare-input-port" and "declare-output-port" has really
made it a lot clearer in my mind. (Much clearer than input/with-input,
though I think the concept was the same.)

Also, while I'm not a huge fan of the name "port", by using it I have
been able to reserve the attribute "name" exclusively for names that
authors invent in order to point at them, which I think will make
Murray and Alex happy.

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
XML Standards Architect
Sun Microsystems, Inc.
Received on Thursday, 20 July 2006 17:12:03 UTC