Re: Followup on I18N Last Call comments and disposition

Tom - Your considerations below are true if you assume that other
kinds of applications will not distinguish between
(in your example) the Latin-1 character for 1/2 (U+00BD), the
ASCII string 1/2, and the composed sequence 1 U+2044 2.

However, the basic idea of early normalization is to make things
easy for as many applications as possible, and 'to make things easy'
means to not have to consider such equivalences for every single
operation in every application. For applications that indeed
distinguish these representations, or for places where these
representations are distinguished, the attack you describe
below is not relevant. Twiddling around with the equivalences
will produce something different anyway, and so signatures
and digests shouldn't normalize these differences out.
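
To make this concrete, here is a rough sketch (in Python, and using
NFKC rather than NFC, since only the compatibility normalization
unifies the fraction forms from your example; the strings are purely
illustrative) of how normalizing before digesting collapses distinct
inputs:

    import hashlib
    import unicodedata

    a = u'will be paid \u00bd'      # VULGAR FRACTION ONE HALF
    b = u'will be paid 1\u20442'    # '1' + FRACTION SLASH + '2'

    for text in (a, b):
        # Normalize first, then digest: two byte-wise different
        # inputs yield one and the same SHA-1 value.
        normalized = unicodedata.normalize('NFKC', text)
        print(hashlib.sha1(normalized.encode('utf-8')).hexdigest())

Digesting the raw octets instead would, with overwhelming
probability, give two different values.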

There is of course the chance of 'applications' not making the
distinction. The obvious case is the human viewer :-).
In order to cover both sides, I think our earlier proposal
of a transform that checks for normalization, but doesn't
actually perform it (it just fails if the input is not normalized),
might do the job. Can you please check this?
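
As a minimal sketch of the shape of such a check-only transform
(Python again; the function name and the failure behaviour are
illustrative, not proposed spec text):

    import unicodedata

    def nfc_check_transform(text):
        # Pass the input through untouched; only verify that it
        # already is in Normalization Form C.
        if unicodedata.normalize('NFC', text) != text:
            raise ValueError('input is not in NFC; transform fails')
        return text    # the success path changes nothing

This would keep the digest bound to the octets actually transmitted
while still catching unnormalized input early.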

[Please note that what a human viewer sees and what an
application processes is not the same distinction as
markup vs. field content.]

Regards,   Martin.

At 00/07/07 19:03 -0400, tgindin@us.ibm.com wrote:
>      I think we have a failure to communicate here.  I am making two
>claims.  First, the primary protection against digest collision attacks is
>the search time (and, for birthday attacks, storage) required to find
>digest collisions, not any limit on the number of documents with a given
>digest, so the third sentence of the proposed text is true but irrelevant.
>Second, even granting that it were relevant, the argument against
>normalizing the character set is also IMO wrong.  Let us suppose that the
>intended forgery is to insert the word "not" between "will" and "be" in a
>specific sentence.  Furthermore, let us suppose that a characteristic
>normalization transform maps the Latin-1 character for 1/2 (U+00BD), the
>ASCII string 1/2, and the composed sequence 1 U+2044 2 all to the same value
>on the grounds that they all represent the fraction one-half, and let us
>suppose that there are exactly 100 occurrences of the Latin-1 character in
>the document, but none of the others.  If the transform is applied before
>digesting, substituting either of the other two forms of 1/2 for the original
>has no effect on the digest, because the transform maps all three to the
>same character sequence, so the forged document has only one possible
>digest.  If it isn't, each such substitution yields a different digest and
>the total number of digests available for the same document appearance is
>3**100, which is more than 1/3 of the total number of possible digest
>values.  Search time should still protect us, but the chances of finding a
>valid forgery are now restricted ONLY by search time.
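>
>      (A quick sanity check of that fraction, e.g. in Python:
>
>           >>> float(3**100) / 2**160
>           0.3526...
>
>      SHA-1 has 2**160 possible digest values, and 3**100 is indeed
>      slightly more than a third of that.)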
>      In short, normalizing prior to digesting AVOIDS allowing
>inconsequential changes to change the digest.  If I have misunderstood the
>point of the section cited, I'm sure someone will correct me.
>
>           Tom Gindin
>
>
>"Joseph M. Reagle Jr." <reagle@w3.org> on 07/07/2000 05:58:35 PM
>
>To:   Tom Gindin/Watson/IBM@IBMUS
>cc:   "Martin J. Duerst" <duerst@w3.org>, w3c-ietf-xmldsig@w3.org, "John
>       Boyer" <jboyer@PureEdge.com>
>Subject:  Re: Followup on I18N Last Call comments and disposition
>
>
>
>At 10:52 2000-06-29 -0400, tgindin@us.ibm.com wrote:
>  >Well, it probably isn't even correct to call this a "Birthday Attack," I'm
>  >hoping someone else jumps in and tweaks the text, but I think the gist of
>  >what you are after is there.
>  >
>  >[Tom Gindin] The wording of section 8.1.3 is somewhat unfortunate
>  >already.  While it is true that transforms appear to increase the number
>  >of documents which map to the same digest, that number is already
>  >literally astronomical.  For SHA-1, for example, the number of documents
>  >of length N octets in UTF-8 which map to a given digest is 256**(N-20)
>  >or 2**(8*(N-20)).  Larger hash algorithms may increase the number 20
>  >somewhat, but a 200 octet message restricted to printable ASCII would
>  >still exceed 2**1000.  Not normalizing before digesting is what allows
>  >inconsequential changes to affect the digest.
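>  >
>  >[That count is a plain counting argument, assuming the digest behaves
>  >like a uniform random function: the 256**N octet strings of length N
>  >spread evenly over 2**160 = 256**20 SHA-1 values, leaving
>  >256**N / 256**20 = 256**(N-20) preimages per digest value on average.]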
>
>I've tweaked the text slightly in the forthcoming draft; if anyone wants to
>suggest alternative text in future versions, please propose it:
>
>8.1.3 Transforms Can Aid Collision Attacks
>In addition to the semantic concerns of transforms removing or including
>data from a source document prior to signing, there is potential for
>syntactical collision attacks. For instance, consider a signature which
>includes a transform that changes the character normalization of the
>source document to Normalization Form C [NFC]. This transform increases
>the number of documents that, when transformed and digested, yield the
>same hash value. Consequently, an attacker could include a substantive
>syntactical and semantic change to the document by varying other
>inconsequential syntactical values that are normalized prior to
>digesting, such that the tampered signature document is considered
>valid. Consequently, while we RECOMMEND that all documents operated
>upon and generated by signature applications be in [NFC] (otherwise
>intermediate processors might unintentionally break the signature),
>encoding normalizations SHOULD NOT be done as part of a signature
>transform.
>
