Concrete syntax with use case

From: Alessandro Vernet <avernet@orbeon.com>
Date: Thu, 13 Jul 2006 11:21:01 -0700
Message-ID: <4828ceec0607131121r26ead2f4o134f8beaf68c04b1@mail.gmail.com>
To: public-xml-processing-model-wg <public-xml-processing-model-wg@w3.org>

First, I have to say that I like what I see in Alex's proposal. In
particular, I like the fact that:

a) We use a "generic syntax" (p:step) rather than a "directed syntax" (p:xslt).
b) We don't have defaults, everything is named, and connections
between steps are always explicit.
c) The pipeline author assigns labels to step outputs and references
outputs using those labels. This is instead of labeling the step and
using a special syntax with a dot or another separator to reference
the output of a step (as in: my-transformation.result).

Since Alex called for more real-life examples during our last call, I
will consider an example based on a pipeline that I wrote recently for
a client of ours, which happens to be a well-known supplier of network
equipment ;).

The pipeline takes as input the name of a large file, uses a custom
component that parses the file, and from its content generates a
sequence of small XML documents. Each document is validated. If valid,
it is imported into an XML database. Otherwise, an error document is
created. The pipeline returns a document with all the validation
errors.

This is not a completely different proposal I am making here, but
rather a variation on Alex's proposal. We have 4 main constructs: the
definition of the pipeline with its inputs and outputs, the step to
call a component, the p:for-each to iterate over a sequence of
documents, and the p:choose/p:when/p:otherwise for conditionals.

The full example is attached as import-use-case-v1.xml. Let's walk
through this pipeline and look at the 4 main constructs highlighted
above:

1) Pipeline inputs and outputs

<p:pipeline xmlns:p="...">
    <p:input name="dump-filename" label="file"/>
    <p:output name="errors" from="aggregated-errors"/>

The pipeline has 1 input and 1 output. The names seen from the outside
are 'dump-filename' and 'errors'. A label 'file' is assigned to the
input. This label is then used to reference that input inside the
pipeline. The output comes from 'aggregated-errors' which is a label
defined in the pipeline.
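
To make the scoping of the two labels concrete, here is a minimal
sketch of a complete pipeline. The step kind 'vendor:report-errors'
and its input/output names are made up for illustration; only the
<p:input> and <p:output> lines come from the example above.

<p:pipeline xmlns:p="...">
    <p:input name="dump-filename" label="file"/>
    <p:output name="errors" from="aggregated-errors"/>

    <!-- Hypothetical step: consumes the label 'file' and defines
         the label 'aggregated-errors' that the p:output refers to -->
    <p:step kind="vendor:report-errors">
        <p:with-input name="filename" from="file"/>
        <p:with-output name="report" label="aggregated-errors"/>
    </p:step>
</p:pipeline>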

2) Step

<p:step kind="vendor:parse-dump">
    <p:with-input name="filename" from="file"/>
    <p:with-output name="documents" label="documents-to-import"/>
</p:step>

We use <p:with-input>/<p:with-output> in a step, but
<p:input>/<p:output> to define inputs/outputs on a pipeline. Different
names reflect different semantics. This is not unlike the
<xsl:param>/<xsl:with-param> of XSLT.

The 'name' attribute is always used for names of inputs/outputs
defined by components. To indicate where the data comes from, we
reference the label 'file' with from="file". Alternatively, to get the
data from a URI, we can use href="filename.xml".

label="documents-to-import" assigns a label to the output. This label
is referenced later in the pipeline.
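
As a sketch of the href alternative, the same step could read its
input from a URI rather than from a label ('filename.xml' is just the
placeholder URI used above):

<p:step kind="vendor:parse-dump">
    <!-- href replaces from: the data comes from a URI, not a label -->
    <p:with-input name="filename" href="filename.xml"/>
    <p:with-output name="documents" label="documents-to-import"/>
</p:step>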

3) For-each

<p:for-each>
    <p:for-each-input from="documents-to-import"
            label="source-document"/>
    <p:for-each-output label="sequence-of-errors"
            from="error"/>

<p:for-each-input from="documents-to-import">: we iterate over the
sequence of documents 'documents-to-import'. Alternatively we could
have an 'href' instead of the 'from'. On this element we can also have
an optional 'select' attribute. If present, it has the semantics
described by Alex.

<p:for-each-input label="source-document">: the label is visible
inside the <p:for-each> and is used to reference the document for the
current iteration.

<p:for-each-output label="sequence-of-errors">: the label for the
output of the <p:for-each> that will be referenced outside of the
<p:for-each>.

<p:for-each-output from="error">: a reference to a label defined
inside the <p:for-each> that corresponds to the output of one
iteration.

Using <p:for-each-input>/<p:for-each-output> makes it clear that those
are different from the inputs/outputs we have for steps. We also avoid
having too many attributes on the <p:for-each> element itself.
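
For illustration, a complete <p:for-each> could look like the sketch
below. The body step, its kind 'vendor:check-document', and its
input/output names are assumptions, not part of the attached example.

<p:for-each>
    <p:for-each-input from="documents-to-import"
            label="source-document"/>
    <p:for-each-output label="sequence-of-errors"
            from="error"/>

    <!-- Hypothetical body: runs once per document in the sequence -->
    <p:step kind="vendor:check-document">
        <p:with-input name="document" from="source-document"/>
        <p:with-output name="result" label="error"/>
    </p:step>
</p:for-each>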

4) Choose/when/otherwise

<p:choose from="is-valid">
    <p:when test="/validity != 'true'">...</p:when>
    <!-- other p:when elements -->
    <!-- optional p:otherwise -->
</p:choose>

The XPath expression in each 'test' attribute is evaluated against the
sequence of documents labeled 'is-valid'. Here again, you could have
href="..." instead of from="...". The first <p:when> whose test
returns true() is executed. If none returns true() and there is a
<p:otherwise>, then the <p:otherwise> is executed.
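
A filled-in <p:choose> might look like this sketch, in which the step
kinds 'vendor:create-error' and 'vendor:import', and their
input/output names, are hypothetical:

<p:choose from="is-valid">
    <p:when test="/validity != 'true'">
        <!-- Invalid document: produce an error document -->
        <p:step kind="vendor:create-error">
            <p:with-input name="document" from="source-document"/>
            <p:with-output name="error" label="error"/>
        </p:step>
    </p:when>
    <p:otherwise>
        <!-- Valid document: import it into the database -->
        <p:step kind="vendor:import">
            <p:with-input name="document" from="source-document"/>
        </p:step>
    </p:otherwise>
</p:choose>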


I am still on the fence on a couple of issues:

a) Should we have a 'label' on the pipeline <p:input>? Or can we just
have a 'name' attribute and later reference that name in the pipeline?

b) Should a particular iteration of a step be allowed not to return
anything? This is what happens in this example with the label 'error'
declared inside a <p:when>. This makes sense in this example, but I
can see reasons why we would not want to allow this.
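
To illustrate issue b), here is a rough reconstruction of the whole
pipeline, assembled from the fragments above. It is not the attached
import-use-case-v1.xml: every step kind other than 'vendor:parse-dump'
and all of their input/output names are made up. Note that the label
'error' only gets a value on iterations that go through the <p:when>.

<p:pipeline xmlns:p="...">
    <p:input name="dump-filename" label="file"/>
    <p:output name="errors" from="aggregated-errors"/>

    <p:step kind="vendor:parse-dump">
        <p:with-input name="filename" from="file"/>
        <p:with-output name="documents" label="documents-to-import"/>
    </p:step>

    <p:for-each>
        <p:for-each-input from="documents-to-import"
                label="source-document"/>
        <p:for-each-output label="sequence-of-errors"
                from="error"/>

        <!-- Hypothetical validation step producing 'is-valid' -->
        <p:step kind="vendor:validate">
            <p:with-input name="document" from="source-document"/>
            <p:with-output name="validity" label="is-valid"/>
        </p:step>

        <p:choose from="is-valid">
            <p:when test="/validity != 'true'">
                <!-- 'error' is only defined on this branch -->
                <p:step kind="vendor:create-error">
                    <p:with-input name="document" from="source-document"/>
                    <p:with-output name="error" label="error"/>
                </p:step>
            </p:when>
            <p:otherwise>
                <!-- Valid document: imported, no 'error' produced -->
                <p:step kind="vendor:import">
                    <p:with-input name="document" from="source-document"/>
                </p:step>
            </p:otherwise>
        </p:choose>
    </p:for-each>

    <!-- Hypothetical step merging all errors into one document -->
    <p:step kind="vendor:aggregate">
        <p:with-input name="documents" from="sequence-of-errors"/>
        <p:with-output name="aggregate" label="aggregated-errors"/>
    </p:step>
</p:pipeline>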

Alex
-- 
Blog (XML, Web apps, Open Source):
http://www.orbeon.com/blog/


Received on Thursday, 13 July 2006 18:21:08 GMT
