- From: Richard Tobin <richard@inf.ed.ac.uk>
- Date: Thu, 16 Feb 2006 17:47:12 +0000 (GMT)
- To: Norman Walsh <Norman.Walsh@Sun.COM>, public-xml-processing-model-wg@w3.org
Here is my "diff" use case that I talked about.

Suppose we have two XML documents that we want to compare. But these documents have some irrelevant features (automatically assigned id attributes, say) that we don't want to count as differences. So we run each file through an XSLT stylesheet to strip out those features before running diff.

Now with a graphical interface it would be easy to draw the pipeline we want. There would be two lines coming in at the top. Each would go to a box (a step) consisting of an XSLT transform with a parameter specifying the stylesheet. Each XSLT box would have a line coming out, and the two lines would go down to a diff box, which would have one line coming out at the bottom.

Writing this pipeline as a Unix shell script is straightforward but ugly, because we have to use temporary files, as the shell doesn't let us write a command with two inputs from other programs:

    #!/bin/sh
    lxt -s strip-ids.xsl <"$1" >/tmp/t1
    lxt -s strip-ids.xsl <"$2" >/tmp/t2
    lxdiff /tmp/t1 /tmp/t2

(lxt is my XSLT processor, lxdiff is my diff program.)

I assumed that the inputs were specified by filenames, again because the shell doesn't have a way to let me hook up two general inputs, but I could have used numbered file descriptors instead. This version uses whatever is connected to file descriptors 5 and 6 of the script:

    #!/bin/sh
    lxt -s strip-ids.xsl <&5 >/tmp/t1
    lxt -s strip-ids.xsl <&6 >/tmp/t2
    lxdiff /tmp/t1 /tmp/t2

In fact bash does provide a syntax for hooking up multiple arguments to pipes, so we could avoid the temporary files (note that this one has to be run by bash, not a plain sh):

    #!/bin/bash
    lxdiff <(lxt -s strip-ids.xsl <&5) <(lxt -s strip-ids.xsl <&6)

How could we do this in a pipeline language? The obvious solution is to name the inputs and outputs, with some simplifying convention for the usual case where there is only one input and output.
But these names are entirely local to the pipeline: they don't have to be globally unique like the temporary files in the shell script example, which will go wrong if two instances are run at once. If we compare it with the graphical representation, we effectively have to label the lines.

A possible syntax would be:

    <pipeline inputs="i1 i2">
      <step type="xslt">
        <input name="i1"/>
        <param name="stylesheet" value="strip-ids.xsl"/>
        <output name="o1"/>
      </step>
      <step type="xslt">
        <input name="i2"/>
        <param name="stylesheet" value="strip-ids.xsl"/>
        <output name="o2"/>
      </step>
      <step type="diff">
        <input name="o1"/>
        <input name="o2"/>
      </step>
    </pipeline>

And the simplifying convention would be that if a <step> has no <input> child then its input is the first output of the lexically preceding step, and that inputs and outputs need not be named unless the names are needed (the pipeline and the diff step use this simplification for their output).

-- Richard
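
The temporary-file version of the shell script uses the fixed names /tmp/t1 and /tmp/t2, which is what goes wrong when two instances run at once. One way to keep that version but give each run its own files is mktemp. What follows is only a sketch under assumptions: strip_and_diff is a made-up function name, the templates under ${TMPDIR:-/tmp} are arbitrary, and lxt and lxdiff are assumed to behave as in the message's own scripts.

```shell
#!/bin/sh
# Sketch only: the same three commands as the temporary-file script,
# but with per-run temporary names from mktemp instead of /tmp/t1, /tmp/t2.
# lxt and lxdiff are the tools named in the message; strip_and_diff and
# the file name templates are hypothetical.

strip_and_diff() {
    # $1 and $2 are the two XML files to compare.
    t1=$(mktemp "${TMPDIR:-/tmp}/strip1.XXXXXX") || return 2
    t2=$(mktemp "${TMPDIR:-/tmp}/strip2.XXXXXX") || { rm -f "$t1"; return 2; }

    # Strip the irrelevant features from each input, then compare.
    lxt -s strip-ids.xsl <"$1" >"$t1" &&
    lxt -s strip-ids.xsl <"$2" >"$t2" &&
    lxdiff "$t1" "$t2"
    status=$?

    # Remove the temporary files and pass on the status of the commands above.
    rm -f "$t1" "$t2"
    return $status
}

# Run only when two arguments are given, so the file can also be sourced.
if [ $# -eq 2 ]; then
    strip_and_diff "$1" "$2"
fi
```

Saved under some name such as strip-diff.sh (hypothetical), it would be run as `sh strip-diff.sh a.xml b.xml`. The temporary files still live in a shared directory and have to be cleaned up by hand, whereas names like o1 and o2 in the pipeline syntax never leave the pipeline.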
Received on Thursday, 16 February 2006 17:47:24 UTC