- From: Alessandro Vernet <avernet@orbeon.com>
- Date: Thu, 13 Jul 2006 11:21:01 -0700
- To: public-xml-processing-model-wg <public-xml-processing-model-wg@w3.org>
- Message-ID: <4828ceec0607131121r26ead2f4o134f8beaf68c04b1@mail.gmail.com>
First I have to say that I like what I see in Alex's proposal. In particular, I like the fact that:

a) We use a "generic syntax" (p:step) rather than a "directed syntax" (p:xslt).

b) We don't have defaults: everything is named, and connections between steps are always explicit.

c) The pipeline author assigns labels to step outputs and references outputs using those labels. This is instead of labeling steps and using a special syntax with a dot or another separator to reference the output of a step (as in: my-transformation.result).

Since Alex called for more real-life examples during our last call, I will consider an example based on a pipeline that I wrote recently for a client of ours, which happens to be a well-known supplier of network equipment ;).

The pipeline takes as input the name of a large file, uses a custom component that parses the file, and from its content generates a sequence of small XML documents. Each document is validated. If valid, it is imported into an XML database. Otherwise an error document is created. The pipeline returns a document with all the validation errors.

This is not a completely different proposal I am making here, but rather a variation on Alex's proposal. We have 4 main constructs: the definition of the pipeline with its inputs and outputs, the p:step to call a component, the p:for-each to iterate over a sequence of documents, and the p:choose/p:when/p:otherwise for conditionals.

The full example is attached as import-use-case-v1.xml. Let's walk through this pipeline and look at the 4 main constructs highlighted above:

1) Pipeline inputs and outputs

  <p:pipeline xmlns:p="...">
    <p:input name="dump-filename" label="file"/>
    <p:output name="errors" from="aggregated-errors"/>

The pipeline has 1 input and 1 output. The names seen from the outside are 'dump-filename' and 'errors'. A label 'file' is assigned to the input. This label is then used to reference that input inside the pipeline.
The output comes from 'aggregated-errors', which is a label defined in the pipeline.

2) Step

  <p:step kind="vendor:parse-dump">
    <p:with-input name="filename" from="file"/>
    <p:with-output name="documents" label="documents-to-import"/>
  </p:step>

We use <p:with-input>/<p:with-output> in a step, but <p:input>/<p:output> to define inputs/outputs on a pipeline. Different names reflect different semantics. This is not unlike the <xsl:param>/<xsl:with-param> of XSLT.

The 'name' attribute is always used for names of inputs/outputs defined by components. To indicate where the data comes from, we reference the label 'file' with from="file". Alternatively, to get the data from a URI, we can use href="filename.xml".

label="documents-to-import" assigns a label to the output. This label is referenced later in the pipeline.

3) For-each

  <p:for-each>
    <p:for-each-input from="documents-to-import" label="source-document"/>
    <p:for-each-output label="sequence-of-errors" from="error"/>

<p:for-each-input from="documents-to-import">: we iterate over the sequence of documents 'documents-to-import'. Alternatively we could have an 'href' instead of the 'from'. On this element we can also have an optional 'select' attribute. If present, it has the semantics described by Alex.

<p:for-each-input label="source-document">: the label is visible inside the <p:for-each> and is used to reference the document for the current iteration.

<p:for-each-output label="sequence-of-errors">: the label for the output of the <p:for-each>, which will be referenced outside of the <p:for-each>.

<p:for-each-output from="error">: a reference to a label defined inside the <p:for-each> that corresponds to the output of one iteration.

Using <p:for-each-input>/<p:for-each-output> makes it clear that those are different from the inputs/outputs we have for steps. We also avoid having too many attributes on the <p:for-each> element itself.
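To make the scoping of labels concrete, here is a minimal sketch of a <p:for-each> with a single step inside it. It is not taken from the attached file: the component kind 'vendor:validate' and its port names 'document' and 'report' are placeholders invented for illustration, and the body is simplified so that every iteration produces the 'error' label.

```xml
<p:for-each>
  <!-- 'source-document' is visible only inside this p:for-each -->
  <p:for-each-input from="documents-to-import" label="source-document"/>
  <!-- 'sequence-of-errors' is visible outside; it collects 'error' from each iteration -->
  <p:for-each-output label="sequence-of-errors" from="error"/>

  <!-- Hypothetical step: kind and port names are assumptions, not part of the proposal -->
  <p:step kind="vendor:validate">
    <!-- Reads the document for the current iteration -->
    <p:with-input name="document" from="source-document"/>
    <!-- Labels the per-iteration result referenced by p:for-each-output -->
    <p:with-output name="report" label="error"/>
  </p:step>
</p:for-each>
```

A later step outside the loop would then read the collected results with from="sequence-of-errors", in the same way that the step in 2) reads from="file".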
4) Choose/when/otherwise

  <p:choose from="is-valid">
    <p:when test="/validity != 'true'">...</p:when>
    <!-- Other p:when -->
    <!-- Optional p:otherwise -->
  </p:choose>

The <p:choose> evaluates the XPath expressions on the sequence of documents labeled 'is-valid'. Here again you could have href="..." instead of from="...". The first <p:when> whose test returns true() is executed. If none returns true() and there is a <p:otherwise>, then the <p:otherwise> is executed.

I am still on the fence about a couple of issues:

a) Should we have a 'label' on the pipeline <p:input>? Or can we just have a 'name' attribute and later reference that name in the pipeline?

b) Should a particular iteration of a step be allowed not to return anything? This is what happens in this example with the label 'error' declared inside a <p:when>. This makes sense in this example, but I can see reasons why we would not want to allow this.

Alex
--
Blog (XML, Web apps, Open Source): http://www.orbeon.com/blog/
Attachments
- text/xml attachment: import-use-case-v1.xml
Received on Thursday, 13 July 2006 18:21:08 UTC