Re: Comments on XPath Streamable Subset from Michael Kay on 2010-10-06 (public-xmlsec@w3.org from October 2010)

From: Michael Kay <mike@saxonica.com>
Date: Wed, 06 Oct 2010 08:58:41 +0100
To: Pratik Datta <pratik.datta@oracle.com>
CC: public-xmlsec@w3.org, w3c-xsl-query@w3.org
Message-ID: <4CAC2C31.2050808@saxonica.com>

> It would definitely be nice to have a common subset across many specifications. We did consider the XPath subset in XML Schema, but it was too limited: it didn't cover all the use cases that we had. For example, it doesn't allow predicates, which are very commonly used to identify portions of the document, e.g. /book/chapter[3]. And the XSLT match patterns, as you mentioned, are not really streamable because they allow arbitrary predicates that can use backward axes.
>
> What we really need is an XPath profile that covers all the XPath expressions that one would normally learn in a 30-minute XPath tutorial, with the constraint that this profile should be streamable. There is nothing like that currently, so we had to invent one. But we could consider making this more generic and reusable by other specifications. One idea we briefly considered was having multiple concentric subsets, but that was becoming too confusing. At one point we also investigated a common profile shared between XML Signature and WS-Transfer, but that didn't work out either.

Thanks for the explanation. I certainly recognize that the XML Schema 
subset was a very conservative one. I do appreciate the difficulties you 
found yourself in, and your approach to solving them was, I'm sure, the 
shortest path to a solution. I hope with a bit of effort we can find a 
longer path that may be better from a wider W3C perspective.
>
> Regarding your specific questions about the profile: the inclusion of ".." was an oversight. We intended to remove all "backward" axes such as parent, ancestor, preceding and preceding-sibling, but keep all the forward axes: child, descendant, following and following-sibling. This is because streaming makes a single pass in the forward direction only.
>
In the XSLT work we've allowed access to ancestors and their attributes, 
on the basis that it's simple enough for a streaming processor to 
maintain a stack, and few documents are so deep that this imposes 
unreasonable memory demands. But we've disallowed access to the 
following and following-sibling axes. That's probably because we're not 
interested in evaluating just a single XPath expression during one scan 
of the source document: we're much more interested in combinations like

<xsl:for-each select=".//section">
  <t><xsl:value-of select="title"/></t>
</xsl:for-each>

which only really work if all steps are downwards. (Though even here 
there are difficulties which are still taxing us, since the above does 
not necessarily output section titles in document order).
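
To make the ordering difficulty concrete, here is a minimal Python sketch. It is not taken from the XSLT drafts. It assumes a made-up sample document in which a nested section appears before its parent's title, and the names SAMPLE, TitleStream, stream_order and for_each_order are invented for illustration. The streaming pass keeps only a stack of open element names and reports titles as it meets them; the tree-based function mimics the order the for-each is required to produce.

```python
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample: the inner section precedes the outer section's title.
SAMPLE = (
    "<doc><section>"
    "<section><title>Inner</title></section>"
    "<title>Outer</title>"
    "</section></doc>"
)


class TitleStream(xml.sax.ContentHandler):
    """Single forward pass that keeps only a stack of open element names."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of currently open elements (the ancestors)
        self.buffer = None  # text of the title being read, if any
        self.titles = []    # title texts in the order the stream yields them

    def startElement(self, name, attrs):
        # Only a title whose parent is a section matches the child step "title".
        if name == "title" and self.stack and self.stack[-1] == "section":
            self.buffer = []
        self.stack.append(name)

    def characters(self, content):
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        self.stack.pop()
        if name == "title" and self.buffer is not None:
            self.titles.append("".join(self.buffer))
            self.buffer = None


def stream_order(xml_text):
    """Titles in the order a one-pass reader encounters them."""
    handler = TitleStream()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


def for_each_order(xml_text):
    """Titles in the order the for-each must emit them (needs the whole tree)."""
    root = ET.fromstring(xml_text)
    # iter() visits sections in document order, like select=".//section"
    # evaluated from the (non-section) root element.
    return [section.findtext("title") for section in root.iter("section")]


if __name__ == "__main__":
    print(stream_order(SAMPLE))    # ['Inner', 'Outer']
    print(for_each_order(SAMPLE))  # ['Outer', 'Inner']
```

For this sample the single pass meets "Inner" before "Outer", whereas the for-each must output "Outer" first, so a streaming processor would have to hold back the inner title until the outer section's title arrives.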

Both approaches are entirely rational, I suspect, but the variety of 
solutions to what appears at first sight to be the same problem is going 
to be very bewildering for users. Even within a single spec, the rules 
can be very hard to understand.

Michael Kay

Received on Wednesday, 6 October 2010 07:59:40 UTC