Re: Followup on I18N Last Call comments and disposition

     I think we have a failure to communicate here.  I am making two
claims.  First, the primary protection against digest collision attacks is
the search time (and, for birthday attacks, storage) required to find
digest collisions, not any limit on the number of documents with a given
digest, so the third sentence of the proposed text is true but irrelevant.
Second, even if it were relevant, the argument against normalizing the
character set is also IMO wrong.  Let us suppose that the
intended forgery is to insert the word "not" between "will" and "be" in a
specific sentence.  Furthermore, let us suppose that a character
normalization transform maps the Latin-1 character for 1/2 (U+00BD), the
ASCII string 1/2, and the composed sequence 1 U+2044 2 all to the same value
on the grounds that they all represent the fraction one-half, and let us
suppose that there are exactly 100 occurrences of the Latin-1 character in
the document, but none of the others.  If the transform is applied before
digesting, substituting either of the other two forms for the original 1/2
has no effect on the digest, because the transform maps all three to the
same character sequence, so the forged document has only one possible
digest.  If it isn't, each such substitution yields a different digest, and
the total number of digests available for the same document appearance is
3**100, which at roughly 2**158.5 is more than 1/3 of the 2**160 possible
digest values of SHA-1.  Search time should still protect us, but the
chances of finding a valid forgery are now restricted ONLY by search time.
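     A minimal Python sketch of this scenario follows (the normalize()
below is a toy stand-in for the hypothetical transform, since real NFC
leaves all three forms distinct and NFKC folds only U+00BD into the
composed sequence):

    import hashlib

    FORMS = ["\u00bd", "1/2", "1\u20442"]  # U+00BD, ASCII 1/2, 1 U+2044 2

    def normalize(text):
        # Toy transform: fold every representation of one-half to "1/2".
        for form in FORMS:
            text = text.replace(form, "1/2")
        return text

    def digest(text, pre_normalize):
        if pre_normalize:
            text = normalize(text)
        return hashlib.sha1(text.encode("utf-8")).hexdigest()

    doc = "The payment will be \u00bd of the total."
    variant = doc.replace("\u00bd", "1\u20442")

    # Normalized before digesting: every variant yields one digest.
    assert digest(doc, True) == digest(variant, True)
    # Not normalized: each substitution yields a different digest.
    assert digest(doc, False) != digest(variant, False)
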
     In short, normalizing prior to digesting PREVENTS inconsequential
changes from altering the digest.  If I have misunderstood the
point of the section cited, I'm sure someone will correct me.

          Tom Gindin

"Joseph M. Reagle Jr." <> on 07/07/2000 05:58:35 PM

To:   Tom Gindin/Watson/IBM@IBMUS
cc:   "Martin J. Duerst" <>,, "John
      Boyer" <>
Subject:  Re: Followup on I18N Last Call comments and disposition

At 10:52 2000-06-29 -0400, wrote:
 >Well, it probably isn't even correct to call this a "Birthday Attack,"
 >and I'm hoping someone else jumps in and tweaks the text, but I think the
 >gist of what you are after is there.
 >[Tom Gindin] The wording of section 8.1.3 is somewhat unfortunate.
 >While it is true that transforms appear to increase the number of
 >documents which map to the same digest, that number is already literally
 >astronomical.  For SHA-1, for example, the number of documents of length
 >N octets in UTF-8 which map to a given digest is 256**(N-20) or
 >2**(8*(N-20)).  Larger hash algorithms may increase the number 20,
 >but a 200 octet message restricted to printable ASCII would still exceed
 >2**1000.  Not normalizing before digesting is what allows inconsequential
 >changes to affect the digest.
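
A quick Python check of the quoted counting argument, assuming 95 printable
ASCII characters and that messages spread evenly over SHA-1's 2**160 digest
values:

    # Average number of 200-octet printable-ASCII preimages per digest:
    assert 95 ** 200 // 2 ** 160 > 2 ** 1000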

I've tweaked the text slightly in the forthcoming draft; if anyone wants to
suggest alternative text for future versions, please propose it:

8.1.3 Transforms Can Aid Collision Attacks
In addition to the semantic concerns of transforms removing or including
data from a source document prior to signing, there is potential for
syntactical collision attacks. For instance, consider a signature which
includes a transform that changes the character normalization of the source
document to Normalization Form C [NFC]. This transform increases the number
of documents that, when transformed and digested, yield the same hash
value. Consequently, an attacker could introduce a substantive syntactical
and semantic change to the document while varying other, inconsequential
characters that are normalized away prior to digesting, such that the
tampered document still yields a valid signature. Thus, while we RECOMMEND
that documents operated upon and generated by signature applications be in
[NFC] (otherwise intermediate processors might unintentionally break the
signature), encoding normalizations SHOULD NOT be done as part of a
signature transform.
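
To make the search-time point concrete, here is a toy second-preimage
search in Python against a SHA-1 digest truncated to 16 bits; the
truncation, not the substitution freedom, is what lets it finish, and with
the full 160 bits the same search is computationally infeasible:

    import hashlib
    from itertools import product

    FORMS = ["\u00bd", "1/2", "1\u20442"]  # forms a normalizer would conflate

    def tdigest(text):
        # Toy 16-bit digest; a real application keeps all 160 bits.
        return hashlib.sha1(text.encode("utf-8")).digest()[:2]

    original = "Payment will be made. " + "Rate: \u00bd per unit. " * 12
    target = tdigest(original)

    # Forge: insert "not", then vary the 12 inconsequential fractions,
    # which is only possible because no normalization precedes digesting.
    template = "Payment will not be made. " + "Rate: {} per unit. " * 12
    for combo in product(FORMS, repeat=12):
        forged = template.format(*combo)
        if tdigest(forged) == target:
            print("forgery found:", forged[:60], "...")
            break
    else:
        print("no collision among 3**12 candidates")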

Received on Friday, 7 July 2000 19:03:34 UTC