Re: Pipeline proposal from Jeni Tennison on 2006-04-05 (public-xml-processing-model-wg@w3.org from April 2006)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Wed, 05 Apr 2006 21:33:12 +0100
To: public-xml-processing-model-wg@w3.org
Message-ID: <44342988.4020103@jenitennison.com>

Norm Walsh wrote:
> / Jeni Tennison <jeni@jenitennison.com> was heard to say:
> | the back of our minds that to fully support XSLT and XQuery we really
> | need to pass document fragments and external parsed entities between
> | components.
> 
> I'm not sure we really need to be able to do that. It seems to me that
> the pipeline author can wrap an additional dummy element around the
> fragments as they flow through the pipeline and strip it off at the
> end. I'm happy to keep the idea of doing something more efficient in
> mind, but I don't think the requirement to pass documents around
> imposes any insurmountable obstacles.

OK, but we need to say something about what an XSLT processor or XQuery 
engine should do when the result of a transformation/query is a document 
fragment. Does it error? Does it wrap the fragment in a dummy element 
automatically?
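
To make the "wrap it automatically" option concrete, here is a minimal
sketch in Python using the standard xml.etree.ElementTree module. The
wrapper element name "fragment-wrapper" and both function names are
assumptions made up for illustration; nothing in the proposal defines
them, and a real component might just as well raise an error instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical name for the dummy element; the proposal does not define one.
WRAPPER = "fragment-wrapper"


def wrap_fragment(fragment_xml: str) -> ET.Element:
    """Parse a fragment with several top-level elements by enclosing it
    in a dummy element, so it can travel as a well-formed document."""
    return ET.fromstring(f"<{WRAPPER}>{fragment_xml}</{WRAPPER}>")


def unwrap_fragment(wrapper: ET.Element) -> str:
    """Strip the dummy element at the end of the pipeline and serialize
    its content as a fragment again."""
    if wrapper.tag != WRAPPER:
        raise ValueError("not a wrapped fragment")
    leading_text = wrapper.text or ""
    children = "".join(ET.tostring(child, encoding="unicode") for child in wrapper)
    return leading_text + children


fragment = "<a>1</a><b>2</b>"
doc = wrap_fragment(fragment)      # one document with a single root element
result = unwrap_fragment(doc)      # the original two-element fragment
```

Whichever behaviour we pick, the point is that it has to be specified,
so that every XSLT or XQuery component does the same thing.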

> Knowing statically that a component expects or produces a certain kind
> of document might be useful, but I think I'd like to see that done as
> an extension in V1. Knowing (or asserting) dynamically what's expected
> or produced is just a shorthand for inserting validator or other
> components before or after the component, isn't it? I'm not opposed to
> authoring convenience, but if it gets ugly to specify, I think we can
> live without it.
> 
> I, for one, am disappointed that XSLT can only deal with W3C XML
> Schema validation and would be sorely disappointed if we found
> ourselves in the same position. I can imagine wanting to check against
> a RELAX NG grammar or a Schematron grammar as easily as a W3C XML
> Schema grammar.

I agree wholeheartedly about wanting to remain schema-language neutral.

We can either treat validation as a loosely coupled separate process or 
a tightly coupled prelude to a processing task. I generally favour the 
latter because I don't see validation as an end in itself; because it 
means that less information has to be passed around between steps (you 
never need to pass a PSVI); because it seems less likely to break over 
time; and because defining the schemas for the documents expected and 
generated by a component provides good documentation and is useful for 
tools that help you build pipelines.
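
As a rough illustration of what I mean by a tightly coupled prelude,
here is a sketch in Python. Everything in it is hypothetical: the
with_validation wrapper, the root_is check and the rename_root step are
stand-ins I made up. The validator is just a callable, so it could sit
in front of a W3C XML Schema, RELAX NG or Schematron implementation
without the pipeline caring which.

```python
import xml.etree.ElementTree as ET
from typing import Callable

Validator = Callable[[ET.Element], bool]
Step = Callable[[ET.Element], ET.Element]


def with_validation(step: Step, check_input: Validator, check_output: Validator) -> Step:
    """Return a step that validates its input before processing and its
    output afterwards; no validation result is passed between steps."""
    def validated(doc: ET.Element) -> ET.Element:
        if not check_input(doc):
            raise ValueError("input document rejected by validator")
        out = step(doc)
        if not check_output(out):
            raise ValueError("output document rejected by validator")
        return out
    return validated


def root_is(name: str) -> Validator:
    """Stand-in validator: accepts documents whose root element is `name`."""
    return lambda doc: doc.tag == name


def rename_root(doc: ET.Element) -> ET.Element:
    """Stand-in processing task: copy the children under a <report> root."""
    out = ET.Element("report")
    out.extend(list(doc))
    return out


step = with_validation(rename_root, root_is("data"), root_is("report"))
out = step(ET.fromstring("<data><item/></data>"))
```

The declared checks also document what the component expects and
produces, which is the part a pipeline-building tool could use.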

But I completely accept it's not a priority for this version.

> | I'm concerned about how we define parameters. In particular, I'm
> | worried about the XSLT case where one of the parameters for the
> | component is a set of QName/value pairs (the XSLT parameters). I guess
> | that in this proposal, you'd do that with a parameter called
> | 'parameters' whose value was a formatted string such as
> | '{uri1}local1=value1; {uri2}local2=value2', or we'd say that the XSLT
> | parameters were encoded in an XML document and passed as an *input*.
> 
> I was thinking that those would *be* the parameters:
> 
>   <step name="xslt">
>     <param name="x:y" value="'xxx'"/>
>     <param name="a:b" value="'yyy'"/>
>     <param name="foo" value="$foo"/>
>   </step>
> 
> But that does require that the pipeline know about the names of the
> parameters. If you want completely dynamic parameters, I think you
> might have to stuff them all in a configuration document and tweak
> your stylesheet to read that.

I don't think that knowing the names of the parameters at design time is 
a problem; rather, I was concerned about name clashes and confusion 
between *component* and *stylesheet* parameters. The XSLT component will 
need to have a parameter called, say, "initial-mode", which specifies 
the initial mode of the stylesheet. If stylesheet parameters and 
component parameters share the same name space then no stylesheet will 
be able to have a parameter called "initial-mode".

I suppose we could use QNames to avoid this: any parameter whose name is 
in the XSLT namespace is a component parameter rather than a stylesheet 
parameter. For example, something like this to invoke the template 
called 'main' within the stylesheet 'style.xsl' with no initial context 
document and the parameters $foo set to 'foo' and $bar set to '2':

   <p:step name="xslt">
     <p:input name="stylesheet" href="style.xsl" />
     <p:param name="xsl:initial-template">main</p:param>
     <p:param name="foo">foo</p:param>
     <p:param name="bar">2</p:param>
     <p:output name="result" />
   </p:step>
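
A sketch of how an implementation might do that split, in Python. The
function names and the dictionary representation of the parameters are
assumptions for illustration only; the one fixed fact is the XSLT
namespace URI.

```python
XSLT_NS = "http://www.w3.org/1999/XSL/Transform"


def expand(qname: str, namespaces: dict) -> tuple:
    """Expand a lexical QName to a (namespace URI, local name) pair.
    Unprefixed names are treated as being in no namespace."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        return (None, qname)
    return (namespaces[prefix], local)


def split_params(params: dict, namespaces: dict) -> tuple:
    """Separate component parameters (names in the XSLT namespace)
    from stylesheet parameters (everything else)."""
    component, stylesheet = {}, {}
    for qname, value in params.items():
        uri, local = expand(qname, namespaces)
        if uri == XSLT_NS:
            component[local] = value
        else:
            stylesheet[(uri, local)] = value
    return component, stylesheet


# The three parameters from the step above.
params = {"xsl:initial-template": "main", "foo": "foo", "bar": "2"}
component, stylesheet = split_params(params, {"xsl": XSLT_NS})
```

With this rule a stylesheet can still declare its own parameter called
"initial-mode", because the unprefixed name never collides with
xsl:initial-mode.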

Or we could define the XSLT component such that one of its inputs is a 
document that specifies the parameters, and support providing documents 
inline within the pipeline specification document as well as through 
reference:

   <p:step name="xslt">
     <p:input name="stylesheet" href="style.xsl" />
     <p:input name="parameters">
       <xsl:params>
         <xsl:param name="foo" select="'foo'" />
         <xsl:param name="bar" select="2" />
       </xsl:params>
     </p:input>
     <p:param name="initial-template">main</p:param>
     <p:output name="result" />
   </p:step>
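
For comparison, a sketch of how a component might read such a parameters
document, again in Python with xml.etree.ElementTree. The xsl:params
wrapper is only the made-up element from my example above, and
read_stylesheet_params is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"

# The inline document from the "parameters" input above.
PARAMS_DOC = f"""
<xsl:params xmlns:xsl="{XSLT_NS}">
  <xsl:param name="foo" select="'foo'" />
  <xsl:param name="bar" select="2" />
</xsl:params>
"""


def read_stylesheet_params(doc_xml: str) -> dict:
    """Map each parameter name to its select expression, unevaluated."""
    root = ET.fromstring(doc_xml)
    return {
        param.get("name"): param.get("select")
        for param in root.findall(f"{{{XSLT_NS}}}param")
    }


stylesheet_params = read_stylesheet_params(PARAMS_DOC)

# Component parameters live on the step itself, so the two sets of
# names are kept apart by construction rather than by namespace.
component_params = {"initial-template": "main"}
```

Here the separation comes from where the values are written rather than
from their names, at the cost of a more verbose step.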

> I think we should assert that components are side-effect free. Given
> the same inputs, they produce the same outputs and you can't tell if
> the implementation did the computation twice or cached the results.

I would very much like to see us make this assertion.
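
To show what the assertion buys an implementation, here is a small
Python sketch. CachingRunner and upper_case are hypothetical names, and
keying the cache on a hash of the serialized input is just one possible
choice. Because the component is a pure function of its input, a caller
cannot tell from the results whether it ran once or twice.

```python
import hashlib


class CachingRunner:
    """Runs a side-effect-free component, reusing earlier results for
    identical inputs."""

    def __init__(self, component):
        self.component = component
        self.cache = {}
        self.calls = 0  # counts real invocations, for demonstration only

    def run(self, document: str) -> str:
        key = hashlib.sha256(document.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.component(document)
        return self.cache[key]


def upper_case(document: str) -> str:
    """Stand-in component: output depends only on the input."""
    return document.upper()


runner = CachingRunner(upper_case)
first = runner.run("<doc>hello</doc>")
second = runner.run("<doc>hello</doc>")  # served from the cache
```

If a component wrote to a file or read the clock, the second call would
no longer be interchangeable with the first, and this optimisation
would be unsafe.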

Cheers,

Jeni
-- 
Jeni Tennison
http://www.jenitennison.com
Received on Wednesday, 5 April 2006 20:33:17 UTC