- From: <pratik.datta@oracle.com>
- Date: Wed, 09 Sep 2009 18:22:08 -0700
- To: edsimon@xmlsec.com
- CC: Frederick Hirsch <frederick.hirsch@nokia.com>, XMLSec WG Public List <public-xmlsec@w3.org>, Scott Cantor <cantor.2@osu.edu>
- Message-ID: <4AA854C0.7090009@oracle.com>
For the binary data as base64, I had suggested that we do support XPath, but this XPath selects the element that is the parent of the text node, not the text node itself. The problem is that text nodes are often very large. Streaming parsers like StAX often split up text nodes into many adjacent text nodes (StaX has isCoalescing property which controls this) . E.g. if there is a text node that is 1MB in size it might be split up into a thousand nodes of 1KB each. So if we allow an XPath to select text nodes, the XPath implementation would need to coalesce adjacent text nodes into one which is very inefficient. Many academic XPath implementation do not consider this issue, but this is a very big performance issue - we absolutely cannot assume that the whole text node can be loaded at once - it needs to be processed chunk by chunk. Note XML encryption is itself a big culprit for generating large text nodes - the <CipherValue> node is often extremely large. This text node is one of the big reasons for many of the limitations that I had put. For example the "string-value" function for element nodes uses text node, so I had disallow that. Another source of limitations is the result of the XPath has to be subtree which has to be canonicalized and digested. That means you cannot have already entered the subtree during the XPath evaluation, otherwise you would have already read part of the subtree, and you are not allowed to reread it, when you are canonicalizing it. E.g. suppose you have an expression like this /body/section[para/line] This XPath can be easily resolved by these Streaming XPath parsers that you mentioned, however once you have resolved it and found the section element, which has a para/line child, you have to go back to the section element and canonicalize it all, and going backwards is not allowed in streaming parser. This is why I have limited the predicate expression to only include attributes. Let me try to figure out the way to put in the id function. id() either takes in a nodeset or a string and maybe we can only support the string. I believe with all the feedback that we get, we can arrive precisely defined XPath subset, which cuts out all the features affecting streamability which still retaining all useful features. Pratik On 9/9/2009 4:36 PM, Ed Simon wrote: > Pratik's document discussing limiting XPath productions allows only > element nodes to be in the output node set. As I mentioned in this > week's meeting: "It seems to me that a transform that results in a > single text node should be supported. For example, an app stores binary > data as base64 in an XML element and wants to hash (on signing and > validation) the original raw binary. On validation, use XPath to select > the text, then feed that to the base64-decoding before hashing.". I > would suggest adding the ability to return a text node. > > More comments... > > Why isn't the id() function allowed? Could you please explain why that > is not streamable? > > I've also been looking on the Web for research about efficient XPath > streaming and have found a number of works including some describing > very efficient streaming XPath algorithms that do not require many of > the limitations proposed in Pratik's document. Here are some links: > > http://www.pms.informatik.uni-muenchen.de/publikationen/PMS-FB/PMS-FB-2001-16/slides_dagstuhl_2002.pdf > > http://cs.nyu.edu/~deepak/publications/icde.pdf > > http://xml.coverpages.org/BartonPLANX2002.pdf > > To me, the above articles suggest that many of the XPath limitations > proposed in Pratik's document are unnecessary wrt streaming and > efficiency. Comments? > > Ed > > > On Tue, 2009-09-08 at 12:07 -0400, Frederick Hirsch wrote: > >> I uploaded a version of the Streamable XPath Subset document Pratik >> sent, so that colors are preserved on the web >> >> It is available at >> http://www.w3.org/2008/xmlsec/Drafts/proposals/Streamable-XPath-subset.html >> >> I created a "proposals" directory in CVS for this sort of thing. >> >> Scott, perhaps you can include this link in the minutes so readers >> know what we were discussing. >> >> regards, Frederick >> >> Frederick Hirsch >> Nokia >> >> >> >> >> >> > > >
Received on Thursday, 10 September 2009 01:24:23 UTC