Re: Streaming XPath - additional material from our research background

From: Meiko Jensen <Meiko.Jensen@rub.de>
Date: 30 Oct 2009 23:12:12 +0100
Message-ID: <4AEB64BC.2020701@rub.de>
To: pratik.datta@oracle.com
Cc: public-xmlsec@w3.org
Hi Pratik,

agreed, the examples you gave pose a difficult problem. We actually came
across that one several times in our streaming-based XML Signature
verification implementation (cf.
http://www.informatik.uni-kiel.de/fileadmin/arbeitsgruppen/comsys/files/public/swws-full.pdf
). The problem is that of "might matches" of an XPath in the
streaming-based evaluation. Taking e.g. the XPath //A[//B] as input in
streaming-based evaluation would render every A element before the first
B element (if any) as a potential match of the XPath, depending on the
outcome of the further XPath evaluation. Once a B element is found,
these "might matches" turn into "true matches" that do not depend on
further evaluations. The question now is how to cope with these might
matches. The easiest approach obviously is to exclude the constructs that
cause them from the allowed XPath subset (i.e. no backward axes like
parent, ancestor*, preceding-sibling, no "last()" function etc.). We also
recommend this approach (see FastXPath, which in fact is even more
restrictive).
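
To make the might-match idea concrete, here is a minimal Python sketch of a one-pass evaluation of //A[//B]. It is only an illustration, not our implementation: the function name, the "n" attribute used to label A elements, and the log format are assumptions made for this example.

```python
import io
import xml.etree.ElementTree as ET

def stream_might_matches(xml_text):
    """Evaluate //A[//B] in one pass and log how each A is classified."""
    seen_b = False   # True once any B start tag has been read
    pending = []     # might matches: A elements seen before the first B
    log = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("start",)):
        if elem.tag == "B" and not seen_b:
            seen_b = True
            # Every pending might match now turns into a true match.
            log.extend(("confirmed", a_id) for a_id in pending)
            pending.clear()
        elif elem.tag == "A":
            a_id = elem.get("n")  # hypothetical label attribute
            if seen_b:
                log.append(("true match", a_id))
            else:
                pending.append(a_id)
                log.append(("might match", a_id))
    # Anything still pending at end of document never matched (no B at all).
    log.extend(("rejected", a_id) for a_id in pending)
    return log
```

For a document such as `<r><A n="1"/><A n="2"/><B/><A n="3"/></r>`, the first two A elements are held as might matches until B arrives, while the third is a true match right away.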

Alternatively, if backward axes for some reason are unavoidable, one
could consider performing the intended tasks (for which the
XPath-referenced elements actually are taken as input) also for the
might matches, but delaying the final processing up to the point where
they turn out to become true matches (if ever). For the specific task of
XML Signature validation, this implies hashing (+canonicalizing etc.)
every might match just like a true match. Of course, this opens up a
potential Denial of Service vulnerability in some circumstances (like
the ones you mentioned)---a clear drawback that seems unavoidable in the
presence of such backward axes and operators. Nevertheless it might be
worth a second thought, as it is the only way to get close to a
"full-XPath-capable" streaming-based XML-Signature verification
implementation.
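
A rough Python sketch of this "hash first, decide later" idea, applied to an expression like /a/b[c], could look as follows. It is a sketch under stated assumptions: the class and function names are invented for this example, and the tag-only serialization fed to the hash is a stand-in for real canonicalization (it ignores attributes and namespaces).

```python
import hashlib
import xml.sax

class SpeculativeDigest(xml.sax.ContentHandler):
    """Hash every /a/b while it streams; keep the digest only if a c child appears."""

    def __init__(self):
        super().__init__()
        self.path = []          # stack of open element names
        self.hasher = None      # running hash for the currently open /a/b
        self.confirmed = False  # True once a c child was seen in that b
        self.digests = []       # digests of true matches only

    def startElement(self, name, attrs):
        self.path.append(name)
        if self.path == ["a", "b"]:
            # A might match begins: start hashing speculatively.
            self.hasher, self.confirmed = hashlib.sha256(), False
        elif self.path == ["a", "b", "c"]:
            self.confirmed = True  # might match becomes a true match
        if self.hasher is not None:
            self.hasher.update(f"<{name}>".encode("utf-8"))  # stand-in for c14n

    def characters(self, content):
        if self.hasher is not None:
            self.hasher.update(content.encode("utf-8"))

    def endElement(self, name):
        if self.hasher is not None:
            self.hasher.update(f"</{name}>".encode("utf-8"))
        if self.path == ["a", "b"]:
            if self.confirmed:
                self.digests.append(self.hasher.hexdigest())
            self.hasher = None  # unconfirmed work is simply thrown away
        self.path.pop()

def digest_matches(xml_text):
    handler = SpeculativeDigest()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.digests
```

Note that every b element is hashed in full even when it is discarded at its end tag, which is exactly the wasted work that an attacker could exploit.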

But in general I agree that defining a reduced version of XPath that is
fully streaming-compliant is a way better approach (I mean, come on,
who's using backward axes in XML-Signature-XPaths in reality after all?)

By the way, the issue of might matches already occurs in the ID-based
referencing scheme without XPath: when stream-parsing an XML document
that contains an XML Signature (enclosed signature scheme, e.g. signing
a SOAP message's envelope or previous headers), you have to hash (+c14n)
every element that has an "ID" attribute from the beginning. Otherwise,
when the parser ends up at the contained XML Signature data block, it
notices that the ID-referenced element (the soap:Envelope) has already
left the parser. Thus, here you have an implicit might match for every
element that has such an ID attribute.
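
The ID case can be sketched in Python in the same spirit. The names below are assumptions made for this illustration: the attribute is taken to be called "Id", element names are used without namespaces, and the tag-only serialization again stands in for real canonicalization.

```python
import hashlib
import xml.sax

class IdDigestCollector(xml.sax.ContentHandler):
    """Hash every element carrying an Id attribute, since any may be referenced later."""

    def __init__(self):
        super().__init__()
        self.open = []      # one (id, hasher) pair per open element; hasher may be None
        self.digests = {}   # id -> digest, filled when the element closes
        self.resolved = {}  # reference target -> digest found at that point (or None)

    def _feed(self, data):
        # Nested Id elements are hashed at the same time, so feed all open hashers.
        for _elem_id, hasher in self.open:
            if hasher is not None:
                hasher.update(data.encode("utf-8"))

    def startElement(self, name, attrs):
        elem_id = attrs.get("Id")  # assumed attribute name
        self.open.append((elem_id, hashlib.sha256() if elem_id else None))
        self._feed(f"<{name}>")    # stand-in for c14n output
        if name == "Reference":
            target = attrs.get("URI", "").lstrip("#")
            self.resolved[target] = self.digests.get(target)

    def characters(self, content):
        self._feed(content)

    def endElement(self, name):
        self._feed(f"</{name}>")
        elem_id, hasher = self.open.pop()
        if hasher is not None:
            self.digests[elem_id] = hasher.hexdigest()

def collect(xml_text):
    handler = IdDigestCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.resolved, handler.digests
```

In this sketch a reference to an earlier, already closed header resolves from the stored digests, while every other element with an Id is hashed as well without ever being referenced.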

However, my intention was not to say that you should perform full XPath
in a streaming way using the Barton technique. I agree that this is
highly problematic due to the issues discussed above, and most likely
would not turn out to be useful in the end.

I just wanted to draw your attention to the other major implication of
a streaming-enabled XPath in the context of XML Signature Wrapping
attacks. We found that one very successful approach to fending off
this threat to XML Signature's reliable application consists in fixing a
signed element's position by using a very strict, document-tree-oriented
position fixation in the reference. This way, the signed contents can no
longer be moved arbitrarily within the XML document without invalidating
the signature. Hence, the Signature Wrapping threat can be fended off.

XPath is capable of providing such a fixation, but only if the XPath
expression is crafted very carefully. For instance, an XPath expression
of "//soap:Body" does not fix the soap:Body's position within the
document, so it still can be moved in order to trick the application
logic behind the XML Signature verification. This would have been
achieved better using an XPath like "/soap:Envelope/soap:Body", which
omits the (vulnerable) descendant-or-self axis in favor of the (strictly
fixed) child axis. See
http://www.nds.rub.de/media/nds/downloads/mjensen/ICWS09.pdf , Section
on "FastXPath", for our full evaluation on this.
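
The difference between the two expressions can be illustrated with a small Python sketch. The wrapped message, the Wrapper element, and the select() helper are made up for this example, and ElementTree's limited path syntax is used as a stand-in for a real XPath engine.

```python
import xml.etree.ElementTree as ET

NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/"}

# Hypothetical wrapped message: the originally signed Body was moved into a
# header wrapper, and a new Body was placed where the application looks.
WRAPPED = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <Wrapper>
      <soap:Body><op>signed-original</op></soap:Body>
    </Wrapper>
  </soap:Header>
  <soap:Body><op>attacker-payload</op></soap:Body>
</soap:Envelope>"""

def select(xml_text, loose):
    """Return the element a signature reference would select."""
    root = ET.fromstring(xml_text)
    if loose:
        # Like //soap:Body: the first Body anywhere below the root.
        return root.find(".//soap:Body", NS)
    # Like /soap:Envelope/soap:Body: only a direct child of a root Envelope.
    if root.tag != "{%s}Envelope" % NS["soap"]:
        return None
    return root.find("soap:Body", NS)

loose_op = select(WRAPPED, loose=True).find("op").text
strict_op = select(WRAPPED, loose=False).find("op").text
print(loose_op, strict_op)
```

With the loose expression the reference still resolves to the relocated, originally signed Body, so the digest would verify even though the application processes the other Body. With the strict path the reference selects the Body the application actually uses, and the digest check would fail.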

Thus, since I understood that there is an interest in defining such a
reduced subset of XPath for use in upcoming XML Signature specifications,
I thought it would be useful to also take this viewpoint on the topic into
consideration.

However, I'm always willing to also discuss the Barton approach with you ;)

best regards from Bochum

Meiko



pratik.datta@oracle.com schrieb:
> Meiko,
> There are definitely XPaths that are not streamable.  Here are two
> such examples, and they are not covered in the Barton paper.
>
> -------------------------------------
> The subset in the Barton paper does not include any functions. Some of
> the functions are not streamable, especially the position() function.
>
> For example, suppose you have a document like this with 1000 <b>
> elements.
>
> <a>
>  <b/>
>  <b/>
>  <b/>
>  ...
>  <b/>
> </a>
>
> And you want to evaluate an expression that finds the <b> that is
> 100 positions before the last one,
> i.e.
> /a/b[position() = last() - 100]
>
> How can you evaluate this in a streaming parser where you can make only
> one pass? You do not know how many <b> elements there are, so you must
> reach the end, and then go back.  Even if XPath has some kind of
> lookahead cache, that cache cannot hold 900 <b> elements. (Or if it
> can, then you can change the problem to have a zillion <b> elements)
> -----------------------------
>
> Another example, which is more specific to XML Signature.
> We want the whole XML signature operation to be in one pass, i.e.
> XPath selection, canonicalization, and digesting together have to be in
> one pass.
>
> So suppose you have the XPath  /a/b[c]
>
> so basically you want to sign the entire /a/b element, but only
> those /a/b elements which have a c child.
>
> suppose your doc is like this
> <a>
> <b>
>     some very long content
>     <c/>
>  </b>
> </a>
>
>
> So you can definitely find out  /a/b[c]  in streaming,  but after you
> have found that out, you have to go back to the beginning of <b> and
> start the rest of the signing process - and that will break the
> streaming of the signature as a whole.
>
> Pratik
>
>
>
>
> On 10/29/2009 4:34 AM, Meiko Jensen wrote:
>> Hi,
>>
>> as I lately noticed that the WG deals with similar problems as we do
>> within our latest research (i.e. streamable subset of XPath in the
>> context of XML Signatures), I'd like to point your attention to some of
>> our findings for consideration and discussion.
>>
>> Though Barton et al. ( http://cs.nyu.edu/~deepak/publications/icde.pdf )
>> have shown that in theory every XPath expression can be converted into
>> an equivalent XPath that does not contain any backward axes (thus
>> allowing stream-based evaluation in general), the topic of a streamable
>> subset of XPath is of crucial importance. Apart from the pure
>> performance gains by using a stream-based XML Signature validation (and
>> maybe also application), one should also be aware of the other use that
>> such a subset could have -- in terms of fending the XML Signature
>> Wrapping attack. As we have shown lately (
>> http://www.nds.rub.de/media/nds/downloads/mjensen/ICWS09.pdf ), this
>> particular attack threat can be tackled using position-aware referencing
>> schemes in XML Signatures, which obviously can be done e.g. using
>> XPath-based transformations.
>>
>> We thus defined a strong subset of XPath ourselves (called FastXPath),
>> which to our consideration provides both: it performs way better than
>> full XPath (see evaluation in the paper) and additionally was shown to
>> be way more resistant to the XML Signature Wrapping threat.
>>
>> Thus, if you are interested in determining how our work relates to
>> the ongoing discussion on streamable XPath, please feel free to
>> contact me.
>>
>> Best regards from Bochum, Germany
>>
>> Meiko
>>
>>   
Received on Saturday, 31 October 2009 14:54:38 GMT