Re: Followup on I18N Last Call comments and disposition from Mark Davis on 2000-07-10 (www-international@w3.org from July to September 2000)

From: Mark Davis <markdavis@ispchannel.com>
Date: Mon, 10 Jul 2000 08:16:43 -0700
To: "Martin J. Duerst" <duerst@w3.org>
CC: tgindin@us.ibm.com, "Joseph M. Reagle Jr." <reagle@w3.org>, w3c-ietf-xmldsig@w3.org, John Boyer <jboyer@PureEdge.com>, www-international@w3.org
Message-ID: <3969E8DB.88B061C2@ispchannel.com>

Let's be careful to use real examples. U+00BD does not change in NFC: to see this
if you don't have the Unicode book handy, look at
http://www.unicode.org/unicode/reports/tr15/charts/NormalizationChart5.html and
search for "00BD".

If you look at the other number examples, you will see that numbers (which appear
to be the most sensitive for these kinds of attacks) do not change in NFC.
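
The claim can be checked mechanically. The following is a minimal sketch, assuming Python's standard unicodedata module (which tracks a later Unicode version than the 2000-era charts cited above); the sample list and the helper name unchanged_under are illustrative choices, not part of the original discussion. It also shows that the compatibility form NFKC, unlike NFC, does rewrite U+00BD.

```python
import unicodedata

# Number-like characters that have only compatibility decompositions:
# 1/4, 1/2, 3/4, 1/3, superscript two, circled digit one.
samples = ["\u00bc", "\u00bd", "\u00be", "\u2153", "\u00b2", "\u2460"]


def unchanged_under(form: str, text: str) -> bool:
    """Return True if normalizing `text` to `form` leaves it as-is."""
    return unicodedata.normalize(form, text) == text


# NFC applies canonical mappings only, so all of these survive untouched.
nfc_stable = all(unchanged_under("NFC", ch) for ch in samples)

# NFKC applies compatibility mappings too: U+00BD becomes 1 U+2044 2.
half_nfkc = unicodedata.normalize("NFKC", "\u00bd")

print(nfc_stable)
print([f"U+{ord(c):04X}" for c in half_nfkc])
```

The contrast matters for the thread: an attack that swaps U+00BD for "1 U+2044 2" is only normalized away by a compatibility form, not by NFC.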

Mark

"Martin J. Duerst" wrote:

> Tom - Your considerations below are true if you assume that other
> kinds of applications will not make a difference between
> (in your example) the Latin-1 character for 1/2 (U+00BD), the
> ASCII string 1/2, and the composed sequence 1 U+2044 2.
>
> However, the basic idea of early normalization is to make things
> easy for as many applications as possible, and 'to make things easy'
> means to not have to consider such equivalences for every single
> operation in every application. For applications that indeed
> distinguish these representations, or for places where these
> representations are distinguished, the attack you describe
> below is not relevant. Twiddling around with the equivalences
> will produce something different anyway, and so signatures
> and digests shouldn't normalize these differences out.
>
> There is of course the chance of 'applications' not making the
> difference. The obvious case is the human viewer :-).
> In order to cover both sides, I think our earlier proposal
> of a transform that checks for normalization, but doesn't
> actually do it (just fails if the input is not normalized)
> might do the job. Can you please check this?
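
A check-only transform of the kind proposed here could look roughly like the sketch below. It assumes Python's unicodedata module and NFC as the required form; the names require_nfc and NotNormalizedError are hypothetical and do not come from any XML Signature draft.

```python
import unicodedata


class NotNormalizedError(ValueError):
    """Raised when input is not already in the required normalization form."""


def require_nfc(text: str) -> str:
    """Pass `text` through unchanged if it is in NFC; otherwise fail.

    The transform never rewrites its input, so it cannot make two
    different octet streams digest to the same value.
    """
    if unicodedata.normalize("NFC", text) != text:
        raise NotNormalizedError("input is not in Normalization Form C")
    return text


# Precomposed e-acute is already NFC, so it passes through unchanged.
accepted = require_nfc("caf\u00e9")

# "e" followed by U+0301 COMBINING ACUTE ACCENT is not NFC, so it is rejected.
try:
    require_nfc("cafe\u0301")
    rejected = False
except NotNormalizedError:
    rejected = True
```

Because the function only validates, a signer and a verifier always digest exactly the octets they were given, and non-normalized input surfaces as an error rather than being silently repaired.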
>
> [Please note that what a human viewer sees and what an
> application processes is not the same distinction as
> markup vs. field content.]
>
> Regards,   Martin.
>
> At 00/07/07 19:03 -0400, tgindin@us.ibm.com wrote:
> >      I think we have a failure to communicate here.  I am making two
> >claims.  First, the primary protection against digest collision attacks is
> >the search time (and, for birthday attacks, storage) required to find
> >digest collisions, not any limit on the number of documents with a given
> >digest, so the third sentence of the proposed text is true but irrelevant.
> >Second, even granting that it were relevant, the argument against
> >normalizing the character set is also IMO wrong.  Let us suppose that the
> >intended forgery is to insert the word "not" between "will" and "be" in a
> >specific sentence.  Furthermore, let us suppose that a characteristic
> >normalization transform maps the Latin-1 character for 1/2 (U+00BD), the
> >ASCII string 1/2, and the composed sequence 1 U+2044 2 all to the same value
> >on the grounds that they all represent the fraction one-half, and let us
> >suppose that there are exactly 100 occurrences of the Latin-1 character in
> >the document, but none of the others.  If the transform is applied before
> >digesting, substituting one of the other two forms for 1/2 for the original
> >has no effect on the digest, because the transform maps all three to the
> >same character sequence, so the forged document has only one possible
> >digest.  If it isn't, each such substitution yields a different digest and
> >the total number of digests available for the same document appearance is
> >3**100, which is more than 1/3 of the total number of possible digest
> >values.  Search time should still protect us, but the chances of finding a
> >valid forgery are now restricted ONLY by search time.
> >      In short, normalizing prior to digesting AVOIDS allowing
> >inconsequential changes to change the digest.  If I have misunderstood the
> >point of the section cited, I'm sure someone will correct me.
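
The argument can be illustrated with a small experiment. This is only a sketch under stated assumptions: the sample sentence is invented, SHA-1 comes from Python's hashlib, and NFKC stands in for the hypothetical transform. NFKC maps U+00BD to 1 U+2044 2 but leaves the ASCII string 1/2 alone, so it collapses two of the three forms rather than all three.

```python
import hashlib
import unicodedata
from typing import Optional


def sha1_hex(text: str, form: Optional[str] = None) -> str:
    """Digest `text` as UTF-8, optionally normalizing it to `form` first."""
    if form is not None:
        text = unicodedata.normalize(form, text)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# The three spellings of one-half discussed in the thread.
variants = ["\u00bd", "1\u20442", "1/2"]
docs = [f"The fee will be {v} of the total." for v in variants]

raw_digests = {sha1_hex(d) for d in docs}           # no transform
nfc_digests = {sha1_hex(d, "NFC") for d in docs}    # canonical only
nfkc_digests = {sha1_hex(d, "NFKC") for d in docs}  # compatibility

print(len(raw_digests), len(nfc_digests), len(nfkc_digests))

# Integer check of "3**100 is more than 1/3 of the possible digest values"
# for a 160-bit digest: 3**100 > 2**160 / 3  <=>  3 * 3**100 > 2**160.
ratio_exceeds_third = 3 * 3**100 > 2**160
```

Without a transform, or with NFC, each spelling yields its own digest; only a compatibility-level transform removes some of the freedom an attacker would otherwise have.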
> >
> >           Tom Gindin
> >
> >
> >"Joseph M. Reagle Jr." <reagle@w3.org> on 07/07/2000 05:58:35 PM
> >
> >To:   Tom Gindin/Watson/IBM@IBMUS
> >cc:   "Martin J. Duerst" <duerst@w3.org>, w3c-ietf-xmldsig@w3.org, "John
> >       Boyer" <jboyer@PureEdge.com>
> >Subject:  Re: Followup on I18N Last Call comments and disposition
> >
> >
> >
> >At 10:52 2000-06-29 -0400, tgindin@us.ibm.com wrote:
> >  >Well, it probably isn't even correct to call this a "Birthday Attack,"
> >  >I'm hoping someone else jumps in and tweaks the text, but I think the
> >  >gist of what you are after is there.
> >  >
> >  >[Tom Gindin] The wording of section 8.1.3 is somewhat unfortunate
> >  >already. While it is true that transforms appear to increase the number
> >  >of documents which map to the same digest, that number is already
> >  >literally astronomical.  For SHA-1, for example, the number of documents
> >  >of length N octets in UTF-8 which map to a given digest is 256**(N-20)
> >  >or 2**(8*(N-20)).  Larger hash algorithms may increase the number 20
> >  >somewhat, but a 200 octet message restricted to printable ASCII would
> >  >still exceed 2**1000.  Not normalizing before digesting is what allows
> >  >inconsequential changes to affect the digest.
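
The counting above can be reproduced with exact integer arithmetic. The sketch below assumes an idealized hash that spreads messages evenly over all digest values, so the figures are averages rather than exact counts for any particular digest; the function name average_preimages and the choice of 95 printable ASCII characters are assumptions made for illustration.

```python
def average_preimages(n_octets: int, alphabet_size: int = 256,
                      digest_bits: int = 160) -> int:
    """Average number of n-octet messages per digest value.

    Assumes messages are spread evenly over all 2**digest_bits digests.
    """
    return alphabet_size ** n_octets // 2 ** digest_bits


# Arbitrary octets and a 20-octet (160-bit) digest such as SHA-1:
# 256**N / 2**160 == 256**(N-20) == 2**(8*(N-20)).
n = 200
all_octets = average_preimages(n)
matches_formula = all_octets == 256 ** (n - 20) == 2 ** (8 * (n - 20))

# 200 octets restricted to the 95 printable ASCII characters.
printable = average_preimages(n, alphabet_size=95)
exceeds_bound = printable > 2 ** 1000

print(matches_formula, exceeds_bound, printable.bit_length())
```

Python integers are arbitrary precision, so the comparison against 2**1000 is exact rather than a floating-point estimate.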
> >
> >I've tweaked the text slightly in the forthcoming draft. If anyone wants to
> >suggest alternative text in future versions, please propose it:
> >
> >8.1.3 Transforms Can Aid Collision Attacks
> >In addition to the semantic concerns of transforms removing or including
> >data from a source document prior to signing, there is potential for
> >syntactical collision attacks. For instance, consider a signature which
> >includes a transform that changes the character normalization of the source
> >document to Normalized Form C [NFC]. This transform increases the number of
> >documents that when transformed and digested yield the same hash value.
> >Consequently, an attacker could include a substantive syntactical and
> >semantic change to the document by varying other inconsequential
> >syntactical values that are normalized prior to digesting such that the
> >tampered signature document is considered valid. Consequently, while we
> >RECOMMEND all documents operated upon and generated by signature
> >applications be in [NFC] (otherwise intermediate processors might
> >unintentionally break the signature) encoding normalizations SHOULD NOT be
> >done as part of a signature transform.
> >
Received on Monday, 10 July 2000 11:15:40 UTC