RE: Followup on I18N Last Call comments and disposition from John Boyer on 2000-06-29 (w3c-ietf-xmldsig@w3.org from April to June 2000)

From: John Boyer <jboyer@PureEdge.com>
Date: Thu, 29 Jun 2000 14:22:08 -0700
To: "Joseph M. Reagle Jr." <reagle@w3.org>, "Martin J. Duerst" <duerst@w3.org>
Cc: <w3c-ietf-xmldsig@w3.org>
Message-ID: <BFEDKCINEPLBDLODCODKIEIPCDAA.jboyer@PureEdge.com>
Hi Joseph and Martin,

 >>Presently, the specification states:
 >>
 >>         We RECOMMEND that signature applications produce XML
 >>         content in Normalized Form C [NFC] and check that any XML
 >>         being consumed is in that form as well (if not, signatures may
 >>         consequently fail to validate).
 >>

 >>http://www.w3.org/TR/2000/WD-xmldsig-core-20000601/#sec-XML-Canonicalization
 >>
 >>Consequently, any application that states it is conformant with the XML
 >>Signature specification SHOULD do the above.
 >
 >I can imagine two different kinds of signature applications:
 >
 >1) Applications that create their content and sign it.
 >2) Applications that take content created elsewhere and
 >   sign it.
 >
 >For 1), the above is the right recommendation.
 >For 2), the above is wrong.
 >
 >There may not be a need to make a distinction between
 >1) and 2) in the rest of the spec, but here this is
 >needed.

Sorry, I still don't understand. The distinction seems immaterial,
particularly depending on your view of the architecture where the signature
engine is separate from the content-content regardless. The content we are
talking about is Signatures, not word processing files and such. Signature
applications create XML Signatures, that's the content they concern
themselves with.

Now, we may be speaking more to the generic point of C14N XML (if so, that's
why I'm trying to separate these two threads).

<john>
Actually, I think Martin is talking more about actual content being signed
by the signature, not content created to represent the signature.  However,
the text cited above from our spec recommends that both contents (the
signature representative and the content representative) be in NFC to avoid
signature breakage.  To merge terminologies, the produced XML is the
signature representative, and the XML being consumed is the content
representative.

I would agree with Joseph that the distinction does not appear to be
warranted.  Whether the content representative was created by a separate
application or by the same application, the failure of that content to be in
NFC still leaves the signature verification open to failure if a character
is changed to an equivalent character with a different UCS code point.
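
As an illustration of that failure mode, here is a minimal Python sketch. It
is not taken from the specification: SHA-1 is assumed as the digest, and the
helper names digest and nfc_digest are hypothetical.

```python
import hashlib
import unicodedata

# Two canonically equivalent spellings of the same character.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE, one code point
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

def digest(text: str) -> str:
    """Digest the UTF-8 octets of the text exactly as given."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def nfc_digest(text: str) -> str:
    """Normalize to NFC first, then digest."""
    return digest(unicodedata.normalize("NFC", text))

# Without normalization the octets differ, so the digests differ.
same_raw = digest(precomposed) == digest(decomposed)
# After NFC both spellings collapse to the same octets.
same_nfc = nfc_digest(precomposed) == nfc_digest(decomposed)

print(same_raw, same_nfc)
```

The first comparison is false and the second is true, which is the whole
argument for having content in NFC before it reaches the digest.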

The gist of Martin's argument is that it is unreasonable to assert that an
XML signature application is non-compliant with our spec simply because it
does not check whether the content over which a digest is calculated is in
NFC.  For instance, one may actually *want* to sign some data that is in a
non-Unicode encoding.  By the same reasoning, an XML signature application
cannot be deemed non-compliant because the signatures it generates are not
in NFC.

However, the use of the term 'RECOMMEND' means that you 'SHOULD' do this
unless you have a good reason not to and as long as you are aware that
failure to follow the recommendations can impact the interoperability of
your signatures or of your signature application.  If there is a good reason
to operate outside of Unicode, that's OK but the group can't get involved
with specifying a normalization procedure for all non-Unicode character sets
such that their signatures can even interoperate among applications that use
the same non-Unicode character set.  The best we can do is recommend the
current standard for translation into Unicode, then recommend c14n so that
the material signed is in UTF-8.  Although we need the unforgiving nature of
digest algorithms, it leaves us with few alternatives in this regard.
</john>

 >>   document is considered valid. Consequently, while we RECOMMEND all
 >>   documents operated upon and generated by signature applications be in
 >>   [NFC] (otherwise intermediate processors can unintentionally break the
 >>   signature) encoding normalizations SHOULD NOT be done as part of a
 >>   signature transform.
 >>   http://www.w3.org/Signature/2000/06/section-8-I18N.html
 >
 >I think this puts two different kinds of concerns into the
 >same pot (but I'm not exactly sure, because I'm not really
 >familiar with the security language).

Well, it probably isn't even correct to call this a "Birthday Attack." I'm
hoping someone else jumps in and tweaks the text, but I think the gist of
what you are after is there.

 >>  >    It should be mandated that, when a document is transcoded from a
 >>  >    non-Unicode encoding to Unicode as part of C14N, normalization must be
 >>  >    performed (At the bottom of section 2 and also in A.4 in the June 1st
 >>  >    2000 draft).
 >>  >
 >>  >**** Unicode-based encodings are e.g. UTF-8 and UTF-16. In most cases,
 >>  >      the transcoding result is normalized just by design of Normalization
 >>  >      Form C, but there are some exceptions.
 >>  >
 >>  >**** The above also applies to other transcodings, e.g. done as part
 >>  >      of the evaluation of an XPath transform or the minimal
 >>  >      canonicalization.
 >>
 >>Is this precluded by the security concern raised above?
 >
 >No, this is something different. It just helps to make
 >transcodings as deterministic as possible. If some transcoder
 >e.g. would convert a precomposed character in iso-8859-1 to
 >a decomposed representation (non-normalized), checking
 >would just not work.

John, given that we've specified the format and syntax of a Signature, is it
likely that a Signature will be in a non-Unicode format? Regardless, if so:
Martin, what is a "transcoding"? Moving from one character encoding scheme to
another? I'm certainly getting an education in characters! From my reading
[1,2] I understand the following:
          + Character Repertoire (CR) = a set of abstract characters
          + Coded Character Set (CCS) = a mapping of code values (space,
            points, positions) to a Character Repertoire
          + Character Encoding Scheme (CES) = scheme for representing a
            character repertoire in a code space. Frequently,
            |CR| > |code space|, so one has to do various extensions and
            escaping to represent those extra abstract characters. UTF-8
            is a CES.
          + Charset = CCS + CES

     [1] http://www.w3.org/MarkUp/html-spec/charset-harmful
     [2] http://czyborra.com/utf/
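
A rough Python sketch of those terms, using one character as the example.
The variable names are hypothetical, and the sketch only shows how one
abstract character gets a code value and then several octet encodings; it
does not settle the terminology.

```python
# One abstract character from the repertoire: e with acute accent.
ch = "\u00e9"

# Coded character set view: the character's code value in UCS.
code_point = ord(ch)

# Encoding scheme view: the same character serialized as octets.
utf8_octets = ch.encode("utf-8")        # two octets
utf16_octets = ch.encode("utf-16-be")   # two octets, big-endian
latin1_octets = ch.encode("iso-8859-1") # one octet

# "Transcoding" in this sense: octets in one encoding in, octets in
# another encoding out, by way of the abstract character.
transcoded = latin1_octets.decode("iso-8859-1").encode("utf-8")

print(hex(code_point), utf8_octets, utf16_octets, latin1_octets)
```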

<john>
The fact that we have specified the format and syntax of Signature does not
preclude non-Unicode encodings of the signature.  Likelihood is another
story, but also not relevant.
Moreover, by the dsig default, the signature representative XML is expressed
in UTF-8 by virtue of c14n.

Martin's point appears to be that a Signature element exists prior to
signature creation because it specifies how to create the signature (as well
as how to verify it).  Since it can exist, it can exist in a non-Unicode
format, so what do we mean when we say that the result will be UTF-8 by
virtue of c14n? How do we consistently take the data from non-Unicode to
UTF-8?

So far, my answer has been that c14n is based on the XPath data model, which
is based on UCS.  XPath assumes that you, the application developer, will
attach an XML processor that creates input appropriate for an XPath
evaluator.  This means that you, the application developer, are responsible
for transcoding from your non-Unicode format into UCS.

We recommend that your transcoding observe the rules of NFC w.r.t. the input
it creates for the XPath evaluator (or logically equivalent implementation).
This recommendation is made so that your signatures won't break.
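
A minimal Python sketch of what that recommendation amounts to, assuming
the application does its own transcoding before handing data to the XPath
evaluator. The function name transcode_to_nfc_utf8 is hypothetical and is
not part of any specification.

```python
import unicodedata

def transcode_to_nfc_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode from the source encoding into UCS, apply NFC, emit UTF-8."""
    text = raw.decode(source_encoding)
    normalized = unicodedata.normalize("NFC", text)
    return normalized.encode("utf-8")

# "caf" + e-acute as a single ISO-8859-1 octet.
latin1_input = b"caf\xe9"
result = transcode_to_nfc_utf8(latin1_input, "iso-8859-1")

# What a transcoder that emitted a decomposed form would have produced,
# and what the same text looks like once NFC is applied to it.
decomposed_octets = "cafe\u0301".encode("utf-8")
renormalized = unicodedata.normalize(
    "NFC", decomposed_octets.decode("utf-8")
).encode("utf-8")

print(result, decomposed_octets, renormalized)
```

The decomposed octets differ from the NFC result, so a digest over them
would not match; applying NFC brings both back to the same octet string.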

***************************************
John Boyer,
Software Development Manager

PureEdge Solutions (formerly UWI.Com)
Creating Binding E-Commerce

v:250-479-8334, ext. 143 f:250-479-3772
1-888-517-2675  http://www.PureEdge.com
***************************************

</john>
Received on Thursday, 29 June 2000 17:22:23 UTC