Re: Component interfaces from Erik Bruchez on 2006-01-16 (public-xml-processing-model-wg@w3.org from January 2006)

From: Erik Bruchez <ebruchez@orbeon.com>
Date: Mon, 16 Jan 2006 12:07:41 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <43CB7E7D.5070100@orbeon.com>
Rui Lopes wrote:

 > I've thought a bit more about this issue. I agree with you,
 > regarding the XSLT processor. However, requiring infosets as inputs
 > is a problem: if you have an XQuery processor, your approach would
 > require queries to be written in XQueryX [1]; if you have a Relax NG
 > schema, compact syntax would not be allowed; a hypothetic SQL
 > processor would require queries to be wrapped into an XML envelope.
 >
 > While all these issues can be handled with infosets, I'm sure that
 > we'll get a lot of pushback from the community. I believe it's
 > another issue we'll have to take into account when defining XProc's
 > data model.

The questions of XQuery and Relax NG compact are interesting. A few
points:

1. If you want a pipeline step to *generate* an XQuery or RNG Compact
    document, then you have two possibilities to consider:

    a. You allow your processing model to natively produce non-XML
       documents. I can only reiterate my fears here, as I think we
       should concentrate on processing XML, not general-purpose text
       and binary documents (the name of the working group is "XML
       Processing Model", not "Document Processing Model" ;-). We
       should have the courage to out-of-scope use cases that deviate
       too much from the original goal.

    b. You find a way of embedding your non-XML documents within
       XML. More on this right below.

2. You can define standard ways of embedding text and binary documents
    within XML. This is what our XPL implementation does [2]. However,
    we do not actually specify how this is done in the XPL spec [3].
    Note this comment:

    "Limiting inputs and outputs to XML Infosets makes the language
     simpler, while still not prohibiting passing non-XML Infoset data
     by encapsulating it within an XML Infoset, be it a simple root
     element containing character data."

3. In the particular case of XQuery, a standard way of embedding it
    within XML should be defined (not by this WG, however). IMO XQueryX
    is utterly useless for practical purposes, and we discussed in the
    past [1] the need for decent, standardized XQuery embedding, but I
    don't think that has been done yet. However, in OPS we used our own
    "natural" XQuery embedding syntax. For example:

       <html>
           <body>
               <table>
               {
                 for $d in //td[contains(a/small/text(), "New York, NY")]
                 return for $row in $d/parent::tr/parent::table/tr
                 where contains($d/a/small/text()[1], "New York")
                 return <tr><td>{data($row/td[1])}</td>
                             <td>{data($row/td[2])}</td>
                             <td>{$row/td[3]//img}</td></tr>
               }
               </table>
           </body>
       </html>

    By using such embedding, you can consume and produce XQuery in a
    pure XML infoset pipeline.

4. If you just want to provide an *external* XQuery or Relax NG
    compact document as input to a component, then you could, for
    example, pass a URL to the component:

    <p:processor name="xpl:xquery">
      <p:input name="xquery" href="my-xquery.xquery"/>
      <p:input name="data" infosetref="my-document.xml"/>
      <p:output name="data" infoset="my-result"/>
    </p:processor>

    This approach limits the passing of non-XML documents to component
    *inputs*. As far as XPL is concerned, we would be bending the
    rule of "everything is an XML infoset", which is why technically
    the distinction made in the example above between infoset and
    non-infoset inputs actually does not exist in XPL.

5. Another comment regarding Relax NG compact: as mentioned above in
    #1, producing Relax NG compact for further use in the pipeline seems
    out of scope to us. But this does not preclude out-of-band
    validation as a built-in feature of the pipeline language, as is
    the case with XPL. You can validate the inputs and outputs of
    components and pipelines with a single attribute, for example:

    <p:input name="data" infosetref="doc.xml" schema-href="my-schema.rng"/>

    In this case, there would be no particular problem with using the
    RNG compact syntax, as it is up to the pipeline engine to read and
    process the schema. In other words, schemas here are "out of band",
    i.e. outside the flow of XML infosets usually exchanged by
    components.
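A rough illustration of the "simple root element containing character data" idea from point 2, written in Python with the standard ElementTree module rather than in XPL. The wrapper element names ("text", "binary") and the "content-type" attribute are made up for this sketch; they are not the formats OPS documents in [2]. Binary content is base64-encoded so that it can travel as ordinary character data.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical wrapper formats for this sketch only:
#   <text content-type="...">character data</text>
#   <binary content-type="...">base64 data</binary>

def wrap_text(content: str, content_type: str = "text/plain") -> ET.Element:
    """Encapsulate a text document as a root element holding character data."""
    root = ET.Element("text", {"content-type": content_type})
    root.text = content
    return root

def wrap_binary(data: bytes, content_type: str = "application/octet-stream") -> ET.Element:
    """Encapsulate binary data as base64-encoded character data."""
    root = ET.Element("binary", {"content-type": content_type})
    root.text = base64.b64encode(data).decode("ascii")
    return root

def unwrap(root: ET.Element):
    """Recover the original document (str or bytes) from either wrapper."""
    if root.tag == "text":
        return root.text or ""
    if root.tag == "binary":
        return base64.b64decode(root.text or "")
    raise ValueError(f"not a wrapped document: {root.tag}")

# A Relax NG compact schema carried through an infoset-only pipeline:
rnc = "element greeting { text }"
serialized = ET.tostring(wrap_text(rnc), encoding="unicode")
restored = unwrap(ET.fromstring(serialized))
print(serialized)
```

Every step between wrapping and unwrapping sees nothing but an XML infoset, which is the point of the approach.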
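To illustrate point 3: the embedded query shown there is already a well-formed XML document, so a generic XML parser accepts it, and a step can transform it like any other infoset before serializing it back to query text for an XQuery engine. The sketch uses Python's ElementTree as a stand-in for a pipeline step; the "class" attribute it adds is only an arbitrary example of a transformation, not something OPS does.

```python
import xml.etree.ElementTree as ET

# The OPS-style embedded query from point 3, verbatim apart from indentation.
EMBEDDED_QUERY = """<html>
    <body>
        <table>
        {
          for $d in //td[contains(a/small/text(), "New York, NY")]
          return for $row in $d/parent::tr/parent::table/tr
          where contains($d/a/small/text()[1], "New York")
          return <tr><td>{data($row/td[1])}</td>
                     <td>{data($row/td[2])}</td>
                     <td>{$row/td[3]//img}</td></tr>
        }
        </table>
    </body>
</html>"""

# Parsing succeeds: the query expressions live in text nodes and the
# literal result elements (<tr>, <td>) are ordinary XML elements.
doc = ET.fromstring(EMBEDDED_QUERY)
table = doc.find("body/table")

# An example infoset transformation: tag each literal result cell.
for cell in table.iter("td"):
    cell.set("class", "city")

# Serialize back to query text; ElementTree escapes only &, < and > in
# text content, so the quoted string literals in the query survive as-is.
query_text = ET.tostring(doc, encoding="unicode")
print(query_text)
```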
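One way an engine might resolve the two kinds of inputs in the point 4 example, sketched in Python. Assumptions: the "p" namespace URI below is a placeholder (the example does not show the declaration), and the in-memory "resources" dictionary stands in for real URL resolution. Inputs given with infosetref are parsed into an infoset; inputs given with href are passed to the component as opaque text.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI for the p: prefix (an assumption of this sketch).
P = "urn:example:pipeline"

STEP = f"""<p:processor xmlns:p="{P}" name="xpl:xquery">
  <p:input name="xquery" href="my-xquery.xquery"/>
  <p:input name="data" infosetref="my-document.xml"/>
  <p:output name="data" infoset="my-result"/>
</p:processor>"""

def bind_inputs(step_xml: str, resources: dict) -> dict:
    """Resolve each p:input: infosetref -> parsed infoset, href -> raw text."""
    step = ET.fromstring(step_xml)
    bound = {}
    for inp in step.findall(f"{{{P}}}input"):
        name = inp.get("name")
        if inp.get("infosetref") is not None:
            # XML input: parsed, so it flows through the pipeline as an infoset.
            bound[name] = ET.fromstring(resources[inp.get("infosetref")])
        elif inp.get("href") is not None:
            # Non-XML input: handed to the component untouched.
            bound[name] = resources[inp.get("href")]
        else:
            raise ValueError(f"input {name!r} has no source")
    return bound

# Hypothetical resource contents standing in for the two URLs.
resources = {
    "my-xquery.xquery": "for $x in //item return $x",
    "my-document.xml": "<items><item/></items>",
}
inputs = bind_inputs(STEP, resources)
```

Note that only the engine needs to know the difference between the two attributes; the output binding is left out because it is an infoset either way.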
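A sketch of the out-of-band validation described in point 5: the engine, not the component, reads the schema named on the input and validates the document before handing over the infoset. This is deliberately a toy. The "validator" only checks the root element name of a compact-syntax pattern of the form "element NAME { ... }", and the extension-keyed registry is an assumption of the sketch; a real engine would delegate to a full RELAX NG implementation, which is also what lets it accept either syntax without the component noticing.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict

Validator = Callable[[ET.Element], bool]

def rnc_root_validator(schema_text: str) -> Validator:
    """Toy stand-in for a RELAX NG compact validator: checks the root name only."""
    m = re.match(r"\s*element\s+([\w.-]+)\s*\{", schema_text)
    if m is None:
        raise ValueError("schema form not supported by this sketch")
    expected = m.group(1)
    return lambda doc: doc.tag == expected

# Engine-side registry keyed by schema file extension (assumption).
VALIDATORS: Dict[str, Callable[[str], Validator]] = {
    ".rnc": rnc_root_validator,
}

def read_input(doc_text: str, schema_href: str, resources: Dict[str, str]) -> ET.Element:
    """Parse an input and validate it out of band before any component sees it."""
    doc = ET.fromstring(doc_text)
    ext = schema_href[schema_href.rfind("."):]
    validate = VALIDATORS[ext](resources[schema_href])
    if not validate(doc):
        raise ValueError(f"input does not match {schema_href}")
    return doc  # the component receives only the infoset, never the schema

resources = {"my-schema.rnc": "element doc { text }"}
checked = read_input("<doc>hello</doc>", "my-schema.rnc", resources)
```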

-Erik

[1] http://www.stylusstudio.com/xquerytalk/200504/000541.html
[2] http://www.orbeon.com/ops/doc/reference-formats
[3] http://www.w3.org/Submission/xpl/
Received on Monday, 16 January 2006 11:07:51 UTC