- From: Meiko Jensen <Meiko.Jensen@ruhr-uni-bochum.de>
- Date: 9 Aug 2010 17:09:57 +0200
- To: "XMLSec WG Public List" <public-xmlsec@w3.org>
I took a closer look at the latest version of the Streamable XPath Profile, and came across a set of issues I considered important to be discussed: 1. The document by now is structured as a kind of "limitation of XPath 1.0". This means it is very hard to read (or understand) it without having the XPath 1.0 spec next to it. I'm not sure this is the ideal way to shape the spec, though being a "profile" for XPath. 2. The document relies on XPath Version 1.0, though XPath 2.0 is out. I think this is no big issue since XPath 2.0 (for some reason) does not supersede XPath 1.0, but we might run into questions regarding this from other W3C groups? 3. To dive into the contents, the specified XPath subset definitely is streamable. All streaming-critical components (such as backward axes and complex XPaths in predicates) have been excluded. 4. There are some special properties coming up with this profile. I'll try to resume briefly: 4a. Allowing the descendant axis implies that it is possible to have XPaths resulting in more than one context node at each step. For instance, evaluating "//A//B" in the XML document "<A><B><A><A><B/></A></A></B></A>" has up to three different context nodes during its evaluation, and ends up with two nodes being part of the result, one of them being result of three different evaluation paths. We've been talking about this before. It is valid to have such selections, it just makes evaluation become more complex (potential for Denial of Service here!!), since you have to maintain more than one context node per streaming event. 4b. Similar to 4a, having "|" allowed implies possibility of more than one context node. 4c. the profile explicitly excludes the use of the "text()" and "node()" operators in order not to run into complex text node collection issues, which are very difficult in streaming evaluators. However, XPath 1.0 has some loopholes that still run into selecting texts. For instance, the XPath 1.0 function of "string()" without explicit argument implies to convert the context node to string, which means concatenating all its text node descendants. Hence, we have implicit text() selection here. In the same line, I'm not yet convinced whether other conversion functions of XPath 1.0 (any "node-set to X conversion") might also result in implicit access to data that is not available in the streaming model. This would require an in-depth compatibility analysis between the Streamable XPath profile and XPath 1.0. 4d. Excluding the "node()" operator but allowing the "." and "//" operators is a contradicition per se, since XPath 1.0 says "." is an abbreviation of "self::node()" and "//" means "/descendant-or-self::node()/". However, as long as we explicitly explain the valid and invalid use cases of "node()" we should be fine. 4e. The "following" and "following-sibling" axes in fact are streamable, yet I don't see a good reason to keep them in this profile since they would make evaluation get way more complex again. 4f. The given grammar explicitly disallows predicates in non-final XPath steps. This contradicts with example no. 5 that says "/book/chapter[2]/title[1]" would be allowed. In fact, "chapter[2]" is an abbreviation of "chapter[position()=2]", hence being a regular predicate. In this case I'd strongly recommend to allow position indicators to be used in non-final XPath steps. This is one of the major requirements for fending signature wrapping attacks: the XPath expression must pin down the signed element's location precisely, hence would definitely need position indicators at every step. I'd even go so far as to recommend having position indicators being REQUIRED for every step. 4g. The specified subset of XPath in fact is very close to the FastXPath subset we've been proposing in 2009 (http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5175871&abstractAccess=no&userType=inst). One of the interesting differences, though, is that FastXPath also prohibits the use of the "or" operator within predicates. The rationale behind that was to stick to the "policy of reduction": during evaluation, every test in every step of the XPath either keeps a (single) context node or switches to the empty result set (and immediately cancels further evaluation). The "or" operator violates this policy: if the first condition evaluates to "false" (=> result-set gets empty), it is possible that the second condition might become "true", putting the context node back in to the result-set. Again, this causes additional parsing complexity (Denial of Service via huge amount of "or" conditions). However, this was a design decision for FastXPath that I'd recommend to consider for the Streamable XPath Profile, but I do not insist in its adoption. 5. Some minor issues: 5a. The document needs some typo/spell checking. 5b. in "1. Introduction" list item "2.b." says that "/book/chapter[3]" is not expressable in XMLDSig's XPath Filter. "ancestor-or-self::chapter[ . = /book/chapter[3] ]" should do, yet I agree with the intention behind this sentence that it is way too complex. 5c. in "1. Introduction" list item "3.b." says that "ID based references are streamable". If you consider the IDed element to occur before the XML signature within the XML document, this does not hold. I'd soften this sentence to something like "ID references usually are easier to process than XPath Filter 2 Transforms in a streaming-based parsing environment". 5e. in "4. Definition of the Profile" the notes to "[1] LocationPath ::=" say that relative location paths are not allowed, claiming these to start from the "<Signature>" element. I'd suggest to remove this comment for now, since it brings up false assumptions just to mark them invalid immediately. However, since I still consider such a referencing scheme (with relative XPath starting at a context node different from the document root) useful, I'd suggest discussing this issue separately, then decide how to deal with it in the XPath profile. 5f. in "6. Algorithm" list item "for parsing: 2." says that "//" indicates the descendant axis. This is misleading, since you would expect "//A/B" to match in the document "<A><B/></A>". However, if "//" means "descendant", this results in "/descendant::node()/child::A/child::B", and since there is no node between <A> and the document root, this XPath would evaluate to {}. This should close my ACTION-614. best regards Meiko -- Dipl.-Inf. Meiko Jensen Chair for Network and Data Security Horst Görtz Institute for IT-Security Ruhr University Bochum, Germany _____________________________ Universitätsstr. 150, Geb. IC 4/150 D-44780 Bochum, Germany Phone: +49 (0) 234 / 32-26796 Telefax: +49 (0) 234 / 32-14347 http:// www.nds.rub.de
Received on Monday, 9 August 2010 15:10:30 UTC