- From: C. M. Sperberg-McQueen <cmsmcq@acm.org>
- Date: Thu, 14 Sep 2006 16:15:16 -0600
- To: public-xml-processing-model-wg@w3.org
- Cc: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
[The first attempt to send this mail to the list failed because the size exceeded the limit imposed by the list configuration. So I've removed the embedded images and replaced them with URLs to the images. Those with member access to the W3C site can if they prefer read a version of this email with the images embedded in it, at http://lists.w3.org/Archives/Member/w3c-archive/2006Sep/0071.html but I discover on inspection that it doesn't embed the images at the point of reference.]

During the call today, Norm and Henry expressed the hope that Alex Milowski and I could send some email trying to make clearer the distinction we were drawing between two views of pipelines which I'll here call the 'nested' and the 'flat' views.

In the nested view, a pipeline is a digraph whose nodes are components. Period. All components are nodes, all nodes are components. From the point of view of the pipeline, each node is atomic. The internal structure of compound components is NOT visible in the graph which constitutes the pipeline. Instead, such a compound component contains, or is, a second graph, quite distinct from the other.

Drawn in this way, the pipeline in Figure 2 of the draft spec looks as shown in the diagram

http://www.w3.org/XML/2006/09/nesting-01a.png

That is, the main pipeline has two components, C1 and an XSLT component. The Validate components (here mislabeled 'Process', sorry about that) are not nodes in the main pipeline, they are nodes in a different pipeline named C1.

One advantage of this view is that it makes the recursive nature of our pipeline specification language easy to talk about. One disadvantage is that it provides no concept in which both the XSLT component and the two validation components in this pipeline appear in the same picture. Any discussion that needs to talk about all three of the atomic components here may spend a lot of time shifting gears and switching from one level to the other.
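
A minimal sketch of the nested view as a data structure, in Python, may make the point about switching levels concrete. Everything named here is an assumption for illustration and not taken from the draft spec: the Pipeline class, the atomic_by_level helper, the node names, and the single edge from C1 to the XSLT component.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """A digraph of components; every node is atomic as seen from this graph."""
    name: str
    nodes: list = field(default_factory=list)   # component names
    edges: list = field(default_factory=list)   # (source, target) pairs
    inner: dict = field(default_factory=dict)   # compound node name -> its own Pipeline


def atomic_by_level(pipeline, depth=0):
    """Yield (depth, name) for each atomic component, descending into compound ones."""
    for node in pipeline.nodes:
        if node in pipeline.inner:
            yield from atomic_by_level(pipeline.inner[node], depth + 1)
        else:
            yield (depth, node)


# Hypothetical rendering of Figure 2: C1 is a separate graph holding the
# two validation components; the main graph sees only C1 and the XSLT step.
c1 = Pipeline("C1", nodes=["validate-a", "validate-b"])
main = Pipeline("main", nodes=["C1", "xslt"], edges=[("C1", "xslt")], inner={"C1": c1})

levels = list(atomic_by_level(main))
```

In this sketch the main graph's node list never mentions the validation components; they can only be reached by descending into C1, which is the gear-shifting described above.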

In the flat view, a pipeline is a digraph whose nodes are components. Period. All nodes are components, all components are nodes. From the point of view of the pipeline, and indeed from every point of view, each node in the graph and each component is atomic.

Step containers like choose are not atomic, and hence they are not nodes in the graph, and (thus) they are not components. That doesn't mean they have disappeared, but only that they correspond to subgraphs in the pipeline.

The pipeline in Figure 2 looks like

http://www.w3.org/XML/2006/09/subgraph-01.png

when drawn in this style. Unlike the nested style, this represents the pipeline completely in a single graph, not a set of graphs whose interrelations are not captured by any facts about the graph(s). Note that the internal structure of compound components is visible in the main pipeline graph. The compound component C1 is also visible, but as a subgraph (surrounded here by a dotted line) and not as a node.

The pipeline shown in section 4.1.3 of the draft we discussed today may provide a useful second illustration.
For the benefit of readers without the draft in front of them, here is one version of that pipeline, without explicit declarations of ports on the choose:

  <p:pipeline xmlns:p="http://www.w3.org/2006/08/pipeline">
    <p:declare-input port="document"/>
    <p:declare-parameter name="makeHTML" required="yes"/>

    <!-- for the sake of convenience, we assume these steps take no
         inputs and produce a single output on a port named "result" -->
    <p:step name="gen-fo" component="ex:generate-fo-stylesheet"/>
    <p:step name="gen-html" component="ex:generate-html-stylesheet"/>

    <p:choose name="choose-result">
      <p:when test="$makeHTML = '1'">
        <p:step name="makeHTML" component="p:xslt">
          <p:input port="document" source="!document"/>
          <p:input port="stylesheet" source="gen-html!result"/>
        </p:step>
        <p:step name="writeHTML" component="p:serialize">
          <p:input port="document" source="makeHTML!result"/>
        </p:step>
      </p:when>
      <p:otherwise>
        <p:step name="makeFO" component="p:xslt">
          <p:input port="document" source="!document"/>
          <p:input port="stylesheet" source="gen-fo!result"/>
        </p:step>
        <p:step name="writePDF" component="p:fo-to-pdf">
          <p:input port="document" source="makeFO!result"/>
        </p:step>
      </p:otherwise>
    </p:choose>
  </p:pipeline>

Here is the nested view: the outer pipeline is a collection of four graphs:

http://www.w3.org/XML/2006/09/nest.413.png

Here I've drawn the inputs and outputs of each pipeline as document shapes (or ovals, for parameters) straddling the box which outlines the pipeline's graph. None of these subpipelines actually seems to have any output, though. (And the port names declared aren't visible in the version of the pipeline reproduced above.)

In the flat view, each subpipeline becomes a subgraph (here, again, outlined by dotted lines).
I've put document shapes on the data flow lines here, in an attempt to make it easier to associate data-flow lines in the diagram with named ports in the written form of the pipeline -- I'm not sure of the best way to draw these things, so bear with me if the representation is clunky in some ways.

http://www.w3.org/XML/2006/09/flat.413.png

It should be clear, on reasonably careful examination, that either of these ways of structuring the graph(s) can be built from the other, by processes that cry out for names like 'encapsulation' and 'in-lining'.

I believe Alex's experience is that when actually constructing the executable pipeline, one often starts by constructing, from the XML, something like the nested view, but then in order to manage the execution one needs to build from that something more like the flat view, because it is in the flat view that you see more conveniently how many threads are in any particular flow. "The longest chain is this one, so that's the one I need to optimize."

Myself, I don't believe I have a dog in this fight. All I want is for the spec, and our discussions, to be clear. And the definition of pipeline in the current spec as a digraph whose nodes are components makes me think very strongly of the flat view, not the nested view. (This is so even though I agree that there is some sense in which the nested view satisfies the definition. My observation is not that one is necessarily inaccurate -- it rather depends on how we end up defining 'component' -- but that if the nested view is the one we mean, then just saying a pipeline is a digraph of components is a really effective head-fake which will seriously mislead some readers.)

I hope this helps make my remarks of this morning clearer.

-CMSMcQ
Received on Thursday, 14 September 2006 22:15:40 UTC