Re: Followup on I18N Last Call comments and disposition

Tom,

I think your statements are exactly right, and I'm barking up the wrong tree
in speaking of collisions. However, I'm trying to address the point raised
by Masahiro and Martin:

   Assume that a document contains XML with element names with
   accented characters. Assume that this document is correctly
   normalized. Assume that the signature includes NFC as a transform.
   Now the following attack is possible: An intruder replaces the
   normalized document by a document with some of the element
   names unnormalized. The signature still works. However, an
   XML/DOM processor or an XPath expression may (and in practice
   will) work differently, because the unnormalized element is
   assumed to be different from the normalized one....
   and combine this with a DOM program that extracts the first
   <amount> and pays somebody that much. After the change by
   the intruder, the amount actually paid is $1000 instead of $10.
   http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JanMar/0254.html

and

  **** E.g. in Section 8, at a convenient location (e.g. 8.1), add
        something like: Using character normalization (Normalization
        Form C of UTR #15) as a transform or as part of a transform
        can remove differences that are treated as relevant by most
        if not all XML processors. Character normalization should
        therefore be done at the origin of a document, and only
        checked, but never be done during signature processing.
   http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000AprJun/0314.html
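
To make the quoted scenario concrete, here is a small illustration of my
own (the element name, amounts, and Python form are all made up, not taken
from either message). Two spellings of an accented element name are
distinct to an XML processor but identical once Normalization Form C is
applied, so if an intruder "de-normalizes" the first occurrence, an NFC
transform still yields the same digest while a DOM lookup finds a
different "first" element:

  import unicodedata
  import xml.etree.ElementTree as ET

  PAYE_NFC = "pay\u00e9"    # precomposed e-acute (NFC form)
  PAYE_RAW = "paye\u0301"   # 'e' + combining acute (not in NFC)

  signed   = f"<order><{PAYE_NFC}>10</{PAYE_NFC}><{PAYE_NFC}>1000</{PAYE_NFC}></order>"
  tampered = f"<order><{PAYE_RAW}>10</{PAYE_RAW}><{PAYE_NFC}>1000</{PAYE_NFC}></order>"

  # With NFC applied as a transform, both serializations normalize (and
  # therefore digest) identically:
  assert unicodedata.normalize("NFC", tampered) == unicodedata.normalize("NFC", signed)

  # But a DOM/XPath consumer asking for the "first" such element sees
  # different amounts in the two documents:
  print(ET.fromstring(signed).find(PAYE_NFC).text)    # 10
  print(ET.fromstring(tampered).find(PAYE_NFC).text)  # 1000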

Perhaps this problem is best addressed by the inverse of the pre-existing
rule of "see what you sign" because:

1. I think I18N's concern is about an XML (DOM) processor operating over the
pre-canonicalized XML document after the Signature processor has declared
the signature over its canonicalized form valid. (For instance, finding the
first instance of some element, where character normalization may change
which element is considered the 'first' even though the Signature still
validated.)
2. I don't think this concern is unfounded, as we've (somewhat/sometimes)
expressed an expectation that processors won't operate over the
canonicalized form of the XML document; the C14N'ized XML is merely a
normalizing step prior to digesting. For instance, what if we had chosen to
design a canonicalization algorithm that output not XML but a binary
format? Clearly, the XML processor would then operate over the original XML
content, and I18N's security concern is a valid one.
3. However, our expectations of C14N have changed in that we are using it
for document subsetting (as XML Query probably will), and the earlier
expectation was not the most secure.
4. Consequently we need to:
A. Ensure that DOM sees only what is Signed. This is our expectation with
XPath/XSLT and this should be no different. (We're getting close to the
"Boyer's transform closure" issue, where he wants to operate over the
original XML document while ensuring that the transforms resulting in the
final form didn't introduce potential weaknesses (like character
normalization).)
B. State that the C14N transform is like any other transform, and that
canonicalization algorithms which yield binary results can be dangerous
because the result is not "seen".
C. Ensure that our own Signature Validator sees what was signed when it
validates the Signature. Consequently, I believe the Canonicalization of
3.2.2.1 needs to happen BEFORE Reference Validation of 3.2.1.1 (a rough
sketch of this ordering follows below).
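
A rough sketch of the ordering I have in mind for C, with made-up names
(c14n, the Reference attributes) standing in for whatever an implementation
actually uses; the only point is that SignedInfo is canonicalized up front,
and that digests are checked against, and applications handed, the
transformed octets rather than the original data:

  import hashlib

  def c14n(octets):
      # Stand-in for the CanonicalizationMethod identified in SignedInfo;
      # a real implementation would apply Canonical XML here.
      ...

  def validate_references(signed_info_octets, references):
      # 1. Canonicalize SignedInfo FIRST (the proposed tweak to 3.2.1), so
      #    the validator only ever reads digest methods and values from the
      #    form that was actually signed.
      canonical_signed_info = c14n(signed_info_octets)

      # 2. For each Reference: dereference it, apply its Transforms, and
      #    compare the digest of the *transformed* octets to DigestValue.
      for ref in references:
          transformed = ref.apply_transforms(ref.dereference())
          if hashlib.sha1(transformed).digest() != ref.digest_value:
              raise ValueError("Reference digest mismatch")
          # Hand downstream consumers the octets that were signed, not the
          # original pre-transform data ("see what is signed").
          ref.output = transformed

      # SignatureValue is later checked over canonical_signed_info.
      return canonical_signed_info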

Consequently, I've tweaked 3.2.1 Reference Validation

For each Reference in SignedInfo: 
/+ 1. Canonicalize the SignedInfo element based on the
CanonicalizationMethod in SignedInfo. +/

AND section 8.1.3 "See" What is Signed (Do we still need the last paragraph?)

| Note: This new recommendation is actually a combination/inverse 
| of the earlier recommendations and is still under discussion.

Just as a person or automatable mechanism should only sign what it "sees,"
persons and automated mechanisms that trust the validity of a transformed
document on the basis of a valid signature SHOULD operate over the data that
was transformed (including canonicalization) and signed, not the original
pre-transformed data. Some applications might operate over the original data
but SHOULD be extremely careful about potential weaknesses introduced
between the original and transformed data. This is a trust decision about
the character and meaning of the transforms that an application needs to
make with caution. Consider a canonicalization algorithm that normalizes
character case (lower to upper) or character composition ('e and accent' to
'accented-e'). An adversary could introduce changes that are normalized and
consequently inconsequential to signature validity but material to a DOM
processor. For instance, by changing the case of a character one might
influence the result of an XPath selection. A serious risk is introduced if
that change is normalized for signature validation but the processor
operates over the original data and returns a different result than intended.

Consequently, while we RECOMMEND that all documents operated upon and
generated by signature applications be in [NFC] (otherwise intermediate
processors might unintentionally break the signature), encoding
normalizations SHOULD NOT be done as part of a signature transform.
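
For illustration only (not proposed REC text), a verifier following this
rule would check that incoming character data is already in NFC rather
than normalizing it itself; something along these lines, the function name
being my own:

  import unicodedata

  def require_nfc(text):
      # Check, but never perform, character normalization during signature
      # processing: unnormalized input is rejected, not silently repaired.
      if unicodedata.normalize("NFC", text) != text:
          raise ValueError("document is not in Normalization Form C; "
                           "normalization belongs at the origin, not here")
      return text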




At 19:03 2000-07-07 -0400, tgindin@us.ibm.com wrote:
 >     I think we have a failure to communicate here.  I am making two
 >claims.  First, the primary protection against digest collision attacks is
 >the search time (and, for birthday attacks, storage) required to find
 >digest collisions, not any limit on the number of documents with a given
 >digest, so the third sentence of the proposed text is true but irrelevant.
 >Second, even granting that it were relevant, the argument against
 >normalizing the character set is also IMO wrong.  Let us suppose that the
 >intended forgery is to insert the word "not" between "will" and "be" in a
 >specific sentence.  Furthermore, let us suppose that a characteristic
 >normalization transform maps the Latin-1 character for 1/2 (U00BD), the
 >ASCII string 1/2, and the composed sequence 1 U2044 2 all to the same value
 >on the grounds that they all represent the fraction one-half, and let us
 >suppose that there are exactly 100 occurrences of the Latin-1 character in
 >the document, but none of the others.  If the transform is applied before
 >digesting, substituting one of the other two forms for 1/2 for the original
 >has no effect on the digest, because the  transform maps all three to the
 >same character sequence, so the forged document has only one possible
 >digest.  If it isn't, each such substitution yields a different digest and
 >the total number of digests available for the same document appearance is
 >3**100, which is more than 1/3 of the total number of possible digest
 >values.  Search time should still protect us, but the chances of finding a
 >valid forgery are now restricted ONLY by search time.
 >     In short, normalizing prior to digesting AVOIDS allowing
 >inconsequential changes to change the digest.  If I have misunderstood the
 >point of the section cited, I'm sure someone will correct me.
 >
 >          Tom Gindin
 >
 >
 >"Joseph M. Reagle Jr." <reagle@w3.org> on 07/07/2000 05:58:35 PM
 >
 >To:   Tom Gindin/Watson/IBM@IBMUS
 >cc:   "Martin J. Duerst" <duerst@w3.org>, w3c-ietf-xmldsig@w3.org, "John
 >      Boyer" <jboyer@PureEdge.com>
 >Subject:  Re: Followup on I18N Last Call comments and disposition
 >
 >
 >
 >At 10:52 2000-06-29 -0400, tgindin@us.ibm.com wrote:
 > >Well, it probably isn't even correct to call this a  "Birthday Attack,"
 >I'm
 > >hoping someone else jumps in and tweaks the text, but I think the gist of
 > >what you are after is there.
 > >
 > >[Tom Gindin] The wording of section 8.1.3 is somewhat unfortunate
 >already.
 > >While it is true that transforms appear to increase the number of
 >documents
 > >which map to the same digest, that number is already literally
 > >astronomical.  For SHA-1, for example, the number of documents of length
 >N
 > >octets in UTF-8 which map to a given digest is 256**(N-20) or
 > >2**(8*(N-20)).  Larger hash algorithms may increase the number 20
 >somewhat,
 > >but a 200 octet message restricted to printable ASCII would still exceed
 > >2**1000.  Not normalizing before digesting is what allows inconsequential
 > >changes to affect the digest.
 >
 >I've tweaked the text slightly in the forthcoming draft, if anyone want to
 >suggest alternative text in future versions, please propose it:
 >
 >8.1.3 Transforms Can Aid Collision Attacks
 >In addition to the semantic concerns of transforms removing or including
 >data from a source document prior to signing, there is potential for
 >syntactical collision attacks. For instance, consider a signature which
 >includes a transform that changes the character normalization of the source
 >document to Normalized Form C [NFC]. This transform increases the number of
 >documents that when transformed and digested yield the same hash value.
 >Consequently, an attacker could include a subsantive syntactical and
 >semantic change to the document by varying other inconsequential
 >syntactical
 >values that are normalized prior to digesting such that the tampered
 >signature document is considered valid. Consequently, while we RECOMMEND
 >all
 >documents operated upon and generated by signature applications be in [NFC]
 >(otherwise intermediate processors might unintentionally break the
 >signature) encoding normalizations SHOULD NOT be done as part of a
 >signature
 >transform.
 >
 >

_________________________________________________________
Joseph Reagle Jr.   
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/People/Reagle/
