RE: [XSLT2.0] PSVI, XPath, and optimization

Thanks for the comment. We will register this as a last call comment and you
will get a full response from the WG in due course; but allow me to give my
immediate reactions.

> 
> I have a question about the interaction between the PSVI and 
> XPath 2.0. If I understand things correctly, a schema-aware 
> XSLT processor is supposed to respect the PSVI annotations on 
> the original XML tree.  Leaving aside questions as to how 
> those annotations are to be transmitted, I think this will 
> present a serious performance hurdle to XSLT transformers, 
> especially highly optimizing transformers.
> 
> Let's use an example stylesheet with only one template:
> 
> 	<xsl:template match="top">
> 		<xsl:value-of select="a+b"/>
> 	</xsl:template>
> 
> 
> Now, assume that it is fed this input doc.  I'm using 
> xsi:type attributes to indicate the PSVI typing information, 
> though of course it could have come from a schema or some 
> other source:
> 
> 	<top>
> 		<a xsi:type="xs:integer">22</a>
> 		<b xsi:type="xs:integer">44</b>
> 	</top>
> 
> I would assume the output of this execution is '66'.  On the 
> other hand, if we fed it this input document:
> 
> 
> 	<top>
> 		<!-- Note: I can't remember the syntax for schema dates
> 		and durations off the top of my head, so forgive my
> 		informal notation here -->
> 		<a xsi:type="xs:gDate">Mar 20, 2003</a>
> 		<b xsi:type="xs:duration">1 month</b>
> 	</top>
> 
> Am I correct that we would expect the output to be "Apr 20, 
> 2003" (or something similar, I forget the details of 
> duration+date addition)?

Yes, your assumptions about the behaviour are correct.

> 
> If so, that's a real drag.  It used to be possible to 
> determine statically the types of almost every expression in 
> a stylesheet.  The only exceptions were parameters, where it 
> was possible to pass parameters of two different types to a 
> single template.

Parameters were indeed the main exception in XSLT 1.0, though that's a very
big exception. If we can educate users to declare the types of their
parameters in 2.0, then we will have made a big step forward.
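
To illustrate what I mean by declaring types, here is a minimal sketch. The
parameter name and the template body are invented for the example; the "as"
attribute is the mechanism in the current draft.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical parameter: "as" declares that its value is an xs:integer -->
  <xsl:param name="increment" as="xs:integer" select="1"/>

  <xsl:template match="top">
    <!-- The operand types are known statically, so this is integer addition -->
    <xsl:value-of select="$increment + 1"/>
  </xsl:template>

</xsl:stylesheet>
```

With the declaration in place, a processor can treat $increment + 1 as
integer arithmetic when it compiles the stylesheet; a supplied value that
cannot be converted to the declared type is reported as an error.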

Many people are hoping that in the example you cite, it will be possible for
processors to predict the types statically, from knowledge of the schema.
It's an open question, as far as I'm concerned, how frequently this will be
possible, given the dynamic nature of template rules; I'm sure that
implementors will come up with ideas that we haven't thought of yet.
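
Independently of what processors can infer, a stylesheet author can state the
intended operand types in the expression itself. The following is only a
sketch of that idea, using the constructor functions on your template; it is
my own illustration, not something the WG is proposing as the answer.

```xml
<xsl:template match="top" xmlns:xs="http://www.w3.org/2001/XMLSchema">
	<!-- Both operands are cast to xs:integer, so "+" can only be integer addition -->
	<xsl:value-of select="xs:integer(a) + xs:integer(b)"/>
</xsl:template>
```

Here the type of each operand is known without reference to the annotations
on a and b. The price is that your second input document would now fail with
an error instead of doing date arithmetic.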

Even if it turns out that dynamic despatch of polymorphic operators such as
"+" is needed quite often, working with a type-annotated input tree might
still give significant performance improvements by reducing the need for
dynamic conversion of string values to typed values. Deciding at run-time
whether to do date addition or integer addition is a fairly trivial overhead
compared with the cost of converting strings to numbers or dates.

> 
> For a highly optimizing XSLT engine, that allowed you to 
> avoid the overhead of dynamic typing in places where it 
> wasn't needed.  In my experience, this can be a very big deal.
> 
> I understand that the committee is committed to schema typing 
> in XSLT 2.0. I would suggest, however, that it not be the 
> default behavior, but that the user specifically request that 
> PSVI annotations be respected.  At a minimum, you could have 
> an attribute that specifically stated that PSVI annotations 
> be disregarded, which would allow the engine to optimize more 
> aggressively.

We have looked at adding such an attribute, and we would have added it
except that we ran into problems finding an acceptable syntax and semantics
for it. I'm sure that we'll take another look at this in the light of your
comment. It's always true, of course, that the implementor has complete
control over how the input tree is built, and that includes the ability to
build the tree without type annotations, or with a bit set that causes the
XSLT processor not to see the type annotations.

> 
> I know that the working group has gone to great pains to 
> ensure that all expressions behave naturally when no typing 
> information is available, but I'm not sure that you've 
> considered the ramifications to performance of making schema 
> annotations enabled by default.
> 
> 
Thanks for a thoughtful comment.

Michael Kay

Received on Thursday, 13 November 2003 11:23:28 UTC