RE: [XSLT2.0] PSVI, XPath, and optimization from Niko Matsakis on 2003-11-13 (public-qt-comments@w3.org from November 2003)

From: Niko Matsakis <niko@datapower.com>
Date: Thu, 13 Nov 2003 12:16:19 -0500 (EST)
To: "Kay, Michael" <Michael.Kay@softwareag.com>
Cc: "public-qt-comments@w3.org" <public-qt-comments@w3.org>
Message-ID: <Pine.LNX.4.44.0311131131360.1343-100000@diomedes.datapower.com>

> Many people are hoping that in the example you cite, it will be possible for
> processors to predict the types statically, from knowledge of the schema.
> It's an open question, as far as I'm concerned, how frequently this will be
> possible, given the dynamic nature of template rules; I'm sure that
> implementors will come up with ideas that we haven't thought of yet.

I expect the answer will be that this is doable if the schema is simple enough
and the stylesheet is declarative enough, and I am sure that optimizing engines
will be obliged to try, but I think the gain will be slim to none in the 
kind of complicated cases where speed is really necessary.

In any case, knowing what schema the document conforms to doesn't help us 
to know when the user is using *no schema*, which I expect to be a very
common case.

Also, if we are banking on people declaring their schemas, I think there
should be a very straightforward mechanism to assert that the input tree
conforms to a given schema, which doesn't seem to exist right now, but I
confess to not having read the working draft in its entirety, so I may have
missed it.

> Even if it turns out that dynamic despatch of polymorphic operators such as
> "+" is needed quite often, working with a type-annotated input tree might
> still give significant performance improvements by reducing the need for
> dynamic conversion of string values to typed values. Deciding at run-time
> whether to do date addition or integer addition is a fairly trivial overhead
> compared with the cost of converting strings to numbers or dates. 

This is a valid point; however, it doesn't take into account the overhead
of dynamic storage and the like.  For a Java interpreter, you're already paying
that cost anyhow.  For a product like ours, which compiles XSLT down to
assembly-level instructions, it's a tremendous advantage to be able to store
a variable or parameter simply as an int rather than as an object that encodes
its type and value.  
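
To illustrate the difference, here is a rough Python sketch. It is not our
actual representation; `Tagged`, `add_dynamic`, and `add_int` are hypothetical
names made up for this example. The first function has to inspect a type tag
on every "+", while the second is what a compiler can emit once it knows
statically that both operands are integers.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Union


@dataclass(frozen=True)
class Tagged:
    """A value that carries its type at run time (hypothetical layout)."""
    tag: str                                  # "int", "date" or "days"
    payload: Union[int, date, timedelta]


def add_dynamic(a: Tagged, b: Tagged) -> Tagged:
    """Polymorphic "+": the operand types are only known at run time."""
    if a.tag == "int" and b.tag == "int":
        return Tagged("int", a.payload + b.payload)
    if a.tag == "date" and b.tag == "days":
        return Tagged("date", a.payload + b.payload)
    raise TypeError(f"cannot add {a.tag} and {b.tag}")


def add_int(a: int, b: int) -> int:
    """What can be emitted when both operand types are known statically."""
    return a + b
```

Python boxes its values either way, so this only shows the shape of the two
code paths, not the actual cost difference in compiled code.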

In addition, you're simply pushing the time spent converting from a string
into the schema engine and out of the XSLT processor.  This could be a win if
we are going to process every integer in the document more than once.
Otherwise you'll end up doing MORE string conversions, unless you do them
lazily, in which case we're right back where we started, only now with
additional dynamic overhead.  I suppose that the schema engine might be doing
these conversions anyway just to check that each value is valid for its type,
but on the whole I do not think overall speed will benefit from
schema-annotated trees.
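
The trade-off can be made concrete by counting conversions. The following is a
minimal Python sketch under made-up assumptions: `CountingConverter`, `eager`,
`on_demand`, and `lazy` are hypothetical names, `texts` stands in for the
string values in the input document, and `reads` lists which of them the
stylesheet actually uses.

```python
class CountingConverter:
    """Converts strings to ints and counts how often it is asked to."""

    def __init__(self):
        self.calls = 0

    def to_int(self, text):
        self.calls += 1
        return int(text)


def eager(texts, reads, conv):
    """Annotate up front: every value is converted, used or not."""
    typed = [conv.to_int(t) for t in texts]
    return sum(typed[i] for i in reads)


def on_demand(texts, reads, conv):
    """No annotations: convert the string each time a value is used."""
    return sum(conv.to_int(texts[i]) for i in reads)


def lazy(texts, reads, conv):
    """Lazy annotations: convert on first use, then reuse the result."""
    cache = {}
    total = 0
    for i in reads:
        if i not in cache:          # extra dynamic check on every read
            cache[i] = conv.to_int(texts[i])
        total += cache[i]
    return total
```

With four values of which only one is read, `eager` performs four conversions
and the other two perform one. Eager annotation only comes out ahead of
`on_demand` when most values are read more than once.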

> It's always true, of course, that the implementor has complete control 
> over how the input tree is built, and that includes the ability to
> build the tree without type annotations, or with a bit set that causes the
> XSLT processor not to see the type annotations.

This is not helpful to the optimizer, however, because we have no idea
whether the user is going to build that input tree without annotations until
we are presented with the input document, and the next one that comes along
may have annotations.

This is the big difference between this type of information and the ambiguities
introduced by parameters in 1.0, for example: parameters were statically
analyzable given the stylesheet, and in most cases it was possible to assign
them a type.  

I would argue for doing everything you can to make ambiguity statically
resolvable.

> Thanks for a thoughtful comment.

And thank you for the quick and thoughtful reply.


Niko Matsakis

Received on Thursday, 13 November 2003 12:16:07 UTC