the nested view and the flat / subgraph view (re-send, with standoff images) from C. M. Sperberg-McQueen on 2006-09-14 (public-xml-processing-model-wg@w3.org from September 2006)

From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
Date: Thu, 14 Sep 2006 16:15:16 -0600
To: public-xml-processing-model-wg@w3.org
Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
Message-Id: <29E5A5E1-F943-4197-806F-0E2F6972F8C2@acm.org>
[The first attempt to send this mail to the list failed because
the size exceeded the limit imposed by the list configuration.
So I've removed the embedded images and replaced them with URLs
to the images.

Those with member access to the W3C site can if they prefer read
a version of this email with the images embedded in it, at
http://lists.w3.org/Archives/Member/w3c-archive/2006Sep/0071.html
but I discover on inspection that it doesn't embed the images
at the point of reference.]



During the call today, Norm and Henry expressed the hope that
Alex Milowski and I could send some email trying to make clearer
the distinction we were drawing between two views of pipelines,
which I'll here call the 'nested' and the 'flat' views.

In the nested view, a pipeline is a digraph whose nodes are
components.  Period.  All components are nodes, all nodes are
components.  From the point of view of the pipeline, each node is
atomic.  The internal structure of compound components is NOT
visible in the graph which constitutes the pipeline.  Instead,
such a compound component contains, or is, a second graph, quite
distinct from the other.

Drawn in this way, the pipeline in Figure 2 of the draft spec
looks as shown in the diagram

   http://www.w3.org/XML/2006/09/nesting-01a.png

That is, the main pipeline has two components, C1 and an XSLT
component.  The Validate components (here mislabeled 'Process',
sorry about that) are not nodes in the main pipeline, they are
nodes in a different pipeline named C1.

One advantage of this view is that it makes the recursive nature
of our pipeline specification language easy to talk about.  One
disadvantage is that it provides no concept in which both the
XSLT component and the two validation components in this pipeline
appear in the same picture.  Any discussion that needs to talk
about all three of the atomic components here may spend a lot of
time shifting gears and switching from one level to the other.

In the flat view, a pipeline is a digraph whose nodes are
components.  Period.  All nodes are components, all components
are nodes.  From the point of view of the pipeline, and indeed
from every point of view, each node in the graph and each
component is atomic.  Step containers like choose are not atomic,
and hence they are not nodes in the graph, and (thus) they are
not components.  That doesn't mean they have disappeared, but
only that they correspond to subgraphs in the pipeline.

The pipeline in Figure 2 looks like

   http://www.w3.org/XML/2006/09/subgraph-01.png

when drawn in this style.  Unlike the nested style, this
represents the pipeline completely in a single graph, not a set
of graphs whose interrelations are not captured by any facts
about the graph(s).

Note that the internal structure of compound components is
visible in the main pipeline graph.  The compound component C1 is
also visible, but as a subgraph (surrounded here by a dotted
line) and not as a node.

The pipeline shown in section 4.1.3 of the draft we discussed
today may provide a useful second illustration.  For the benefit
of readers without the draft in front of them, here is one
version of that pipeline, without explicit declarations of ports
on the choose:

<p:pipeline xmlns:p="http://www.w3.org/2006/08/pipeline">
<p:declare-input port="document"/>
<p:declare-parameter name="makeHTML" required="yes"/>

<!-- for the sake of convenience, we assume these steps take no
      inputs and produce a single output on a port named "result" -->
<p:step name="gen-fo" component="ex:generate-fo-stylesheet"/>
<p:step name="gen-html" component="ex:generate-html-stylesheet"/>

<p:choose name="choose-result">
   <p:when test="$makeHTML = '1'">
     <p:step name="makeHTML" component="p:xslt">
       <p:input port="document" source="!document"/>
       <p:input port="stylesheet" source="gen-html!result"/>
     </p:step>
     <p:step name="writeHTML" component="p:serialize">
       <p:input port="document" source="makeHTML!result"/>
     </p:step>
   </p:when>

   <p:otherwise>
     <p:step name="makeFO" component="p:xslt">
       <p:input port="document" source="!document"/>
       <p:input port="stylesheet" source="gen-fo!result"/>
     </p:step>
     <p:step name="writePDF" component="p:fo-to-pdf">
       <p:input port="document" source="makeFO!result"/>
     </p:step>
   </p:otherwise>
</p:choose>

</p:pipeline>

Here is the nested view: the outer pipeline is a collection of
four graphs:

   http://www.w3.org/XML/2006/09/nest.413.png

Here I've drawn the inputs and outputs of each pipeline as
document shapes (or ovals, for parameters) straddling the box
which outlines the pipeline's graph.  None of these subpipelines
actually seems to have any output, though.  (And the port names
declared aren't visible in the version of the pipeline reproduced
above.)
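
For readers who think more easily in data structures than in
diagrams, the nested view of this pipeline can be sketched in Python
roughly as follows.  The Graph class, its field names, and the edges
drawn into the choose node are illustrative assumptions, not anything
defined by the draft; the point is only that each compound node owns
a separate graph and stays opaque in its parent.

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    """One pipeline graph in the nested view (hypothetical structure)."""
    name: str
    nodes: list = field(default_factory=list)     # node names, all atomic at this level
    edges: list = field(default_factory=list)     # (from_node, to_node) pairs
    children: dict = field(default_factory=dict)  # compound node name -> its own Graph

# The two branches of the choose, each a graph of two steps.
when = Graph("when", ["makeHTML", "writeHTML"], [("makeHTML", "writeHTML")])
otherwise = Graph("otherwise", ["makeFO", "writePDF"], [("makeFO", "writePDF")])

# The choose is itself a graph whose nodes are the two branches.
choose = Graph("choose-result", ["when", "otherwise"], [],
               {"when": when, "otherwise": otherwise})

# In the outer graph the choose is a single opaque node.
outer = Graph("pipeline", ["gen-fo", "gen-html", "choose-result"],
              [("gen-fo", "choose-result"), ("gen-html", "choose-result")],
              {"choose-result": choose})

def count_graphs(g):
    """Count this graph plus every graph nested inside it."""
    return 1 + sum(count_graphs(child) for child in g.children.values())

print(count_graphs(outer))  # 4
```

Note that nothing in the outer graph mentions makeHTML or writePDF;
to find them one has to descend through two levels of children.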

In the flat view, each subpipeline becomes a subgraph (here,
again, outlined by dotted lines).  I've put document shapes on
the data flow lines here, in an attempt to make it easier to
associate data-flow lines in the diagram with named ports in the
written form of the pipeline -- I'm not sure of the best way to
draw these things, so bear with me if the representation is
clunky in some ways.

   http://www.w3.org/XML/2006/09/flat.413.png

It should be clear, on reasonably careful examination, that
either of these ways of structuring the graph(s) can be built
from the other, by processes that cry out for names like
'encapsulation' and 'in-lining'.
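
A minimal Python sketch of those two processes, using the pipeline
from section 4.1.3, might look like this.  The dictionary encoding
and the function names inline and encapsulate are assumptions made
for illustration, and data-flow edges are left out for brevity; only
the membership of nodes in graphs or subgraphs is converted.

```python
# Nested view: each graph maps a node name to None (atomic step)
# or to another graph (compound node).
nested = {
    "gen-fo": None,
    "gen-html": None,
    "choose-result": {
        "when": {"makeHTML": None, "writeHTML": None},
        "otherwise": {"makeFO": None, "writePDF": None},
    },
}

def inline(graph, path=()):
    """In-lining: return (atomic_nodes, subgraphs) for the flat view.

    subgraphs maps each container's path to the set of atomic
    nodes that fall inside it.
    """
    nodes, subgraphs = [], {}
    for name, child in graph.items():
        if child is None:
            nodes.append(name)
        else:
            inner_nodes, inner_subs = inline(child, path + (name,))
            nodes.extend(inner_nodes)
            subgraphs[path + (name,)] = set(inner_nodes)
            subgraphs.update(inner_subs)
    return nodes, subgraphs

def encapsulate(nodes, subgraphs):
    """Encapsulation: rebuild the nested view from the flat one."""
    root = {}
    # Create containers, shortest paths first, so parents exist first.
    for path in sorted(subgraphs, key=len):
        level = root
        for part in path:
            level = level.setdefault(part, {})
    # Place each atomic node in the deepest container that holds it.
    for node in nodes:
        owners = [p for p, members in subgraphs.items() if node in members]
        path = max(owners, key=len, default=())
        level = root
        for part in path:
            level = level[part]
        level[node] = None
    return root

flat_nodes, flat_subgraphs = inline(nested)
print(flat_nodes)
print(encapsulate(flat_nodes, flat_subgraphs) == nested)
```

In the flat result the six atomic steps sit side by side in one
list, and choose-result survives only as a named set of nodes.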

I believe Alex's experience is that when actually constructing
the executable pipeline, one often starts by constructing, from
the XML, something like the nested view, but then in order to
manage the execution one needs to build from that something more
like the flat view, because it is in the flat view that you see
more conveniently how many threads are in any particular flow.
"The longest chain is this one, so that's the one I need to
optimize."
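
To illustrate why the flat view suits that kind of question, here is
a small Python sketch that finds a longest chain of steps in the
section 4.1.3 pipeline.  The edge table is read off the source
attributes in the XML; the function name longest_from and the
dictionary encoding are assumptions, and this is not a description of
how any actual implementation schedules its work.

```python
from functools import lru_cache

# Flat view: atomic steps only, with data-flow edges taken from the
# source="..." attributes (step -> steps that consume its output).
edges = {
    "gen-html": ["makeHTML"],
    "makeHTML": ["writeHTML"],
    "gen-fo": ["makeFO"],
    "makeFO": ["writePDF"],
    "writeHTML": [],
    "writePDF": [],
}

@lru_cache(maxsize=None)
def longest_from(node):
    """Longest chain of steps starting at node, as a tuple of names."""
    tails = [longest_from(nxt) for nxt in edges[node]]
    return (node,) + max(tails, key=len, default=())

longest = max((longest_from(n) for n in edges), key=len)
print(len(longest))  # 3
```

Here the two branches tie at three steps each.  In the nested view
the same computation would have to cross graph boundaries, since no
single graph contains both a stylesheet generator and the step that
consumes its output.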

Myself, I don't believe I have a dog in this fight.  All I want
is for the spec, and our discussions, to be clear.  And the
definition of pipeline in the current spec as a digraph whose
nodes are components makes me think very strongly of the flat
view, not the nested view.  (This is so even though I agree that
there is some sense in which the nested view satisfies the
definition.  My observation is not that one is necessarily
inaccurate -- it rather depends on how we end up defining
'component' -- but that if the nested view is the one we mean,
then just saying a pipeline is a digraph of components is a
really effective head-fake which will seriously mislead some
readers.)

I hope this helps make my remarks of this morning clearer.

-CMSMcQ
Received on Thursday, 14 September 2006 22:15:40 UTC