
RE: Followup on I18N Last Call comments and disposition

From: Martin J. Duerst <duerst@w3.org>
Date: Mon, 10 Jul 2000 18:31:27 +0900
Message-Id: <4.2.0.58.J.20000710182057.033a0700@sh.w3.mag.keio.ac.jp>
To: "John Boyer" <jboyer@PureEdge.com>, "Joseph M. Reagle Jr." <reagle@w3.org>
Cc: <w3c-ietf-xmldsig@w3.org>, www-international@w3.org
Hello John,

At 00/06/29 14:22 -0700, John Boyer wrote:
>Hi Joseph and Martin,
>
>  >>Presently, the specification states:
>  >>
>  >>         We RECOMMEND that signature applications produce XML
>  >>         content in Normalized Form C [NFC] and check that any XML
>  >>         being consumed is in that form as well (if not, signatures may
>  >>         consequently fail to validate).
>  >>
>
>  >>http://www.w3.org/TR/2000/WD-xmldsig-core-20000601/#sec-XML-Canonicalization
>  >>
>  >>Consequently, any application that states it is conformant with the XML
>  >>Signature specification SHOULD do the above.
>  >
>  >I can imagine two different kinds of signature applications:
>  >
>  >1) Applications that create their content and sign it.
>  >2) Applications that take content created elsewhere and
>  >   sign it.
>  >
>  >For 1), the above is the right recommendation.
>  >For 2), the above is wrong.
>  >
>  >There may not be a need to make a distinction between
>  >1) and 2) in the rest of the spec, but here this is
>  >needed.
>
>Sorry, I still don't understand. The distinction seems immaterial,
>particularly depending on your view of the architecture, where the signature
>engine is separate from the content regardless. The content we are
>talking about is Signatures, not word processing files and such. Signature
>applications create XML Signatures; that's the content they concern
>themselves with.
>
>Now, we may be speaking more to the generic point of C14N XML (if so, that's
>why I'm trying to separate these two threads).
>
><john>
>Actually, I think Martin is talking more about actual content being signed
>by the signature, not content created to represent the signature.

Yes indeed. The content created to represent the signature
should be normalized from the start anyway, like any other
content that is newly created.


>However,
>the text cited above from our spec recommends that both contents (the
>signature representative and the content representative) be in NFC to avoid
>signature breakage.  To merge terminologies, the produced XML is the
>signature representative, and the XML being consumed is the content
>representative.
>
>I would agree with Joseph that the distinction does not appear to be
>warranted.  Whether the content representative was created by a separate
>application or by the same application, the failure of that content to be in
>NFC still leaves the signature verification open to failure if a character
>is changed to an equivalent character with different UCS codepoint.
>
>The gist of Martin's argument is that it is unreasonable to assert that an
>XML signature application is non-compliant with our spec simply because it
>does not check whether the content over which a digest is calculated is in
>NFC.  For instance, one may actually *want* to sign some data that is in a
>non-Unicode encoding.

In very rare cases, maybe yes.
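To make the breakage John describes concrete, here is a small sketch (Python's
hashlib and unicodedata stand in for a real digest pipeline; not part of the
original thread): two canonically equivalent strings that differ only in
whether a character is precomposed produce different digests unless both are
normalized to NFC first.

```python
import hashlib
import unicodedata

# "café" with a precomposed U+00E9 vs. a decomposed "e" + U+0301:
# canonically equivalent text, but different UTF-8 byte sequences.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# Different bytes mean different digests, so signature verification fails.
assert (hashlib.sha1(precomposed.encode("utf-8")).digest()
        != hashlib.sha1(decomposed.encode("utf-8")).digest())

# Normalizing both sides to NFC before digesting restores agreement.
nfc_pre = unicodedata.normalize("NFC", precomposed)
nfc_dec = unicodedata.normalize("NFC", decomposed)
assert (hashlib.sha1(nfc_pre.encode("utf-8")).digest()
        == hashlib.sha1(nfc_dec.encode("utf-8")).digest())
```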


>By the same reasoning, an XML signature application
>cannot be deemed non-compliant because the signatures it generates are not
>in NFC.

I wouldn't mind here, with the precaution that a signature not
in a Unicode encoding in most cases doesn't need any actual
normalization.



>However, the use of the term 'RECOMMEND' means that you 'SHOULD' do this
>unless you have a good reason not to and as long as you are aware that
>failure to follow the recommendations can impact the interoperability of
>your signatures or of your signature application.  If there is a good reason
>to operate outside of Unicode, that's OK but the group can't get involved
>with specifying a normalization procedure for all non-Unicode character sets
>such that their signatures can even interoperate among applications that use
>the same non-Unicode character set.  The best we can do is recommend the
>current standard for translation into Unicode, then recommend c14n so that
>the material signed is in UTF-8.  Although we need the unforgiving nature of
>digest algorithms, it leaves us with few alternatives in this regard.
></john>
>
>  >>   document is considered valid. Consequently, while we RECOMMEND all
>  >>   documents operated upon and generated by signature applications be in
>  >>   [NFC] (otherwise intermediate processors can unintentionally break the
>  >>   signature) encoding normalizations SHOULD NOT be done as part of a
>  >>   signature transform.
>  >>   http://www.w3.org/Signature/2000/06/section-8-I18N.html
>  >
>  >I think this puts two different kinds of concerns into the
>  >same pot (but I'm not exactly sure, because I'm not really
>  >familiar with the security language).
>
>Well, it probably isn't even correct to call this a  "Birthday Attack," I'm
>hoping someone else jumps in and tweaks the text, but I think the gist of
>what you are after is there.
>
>  >>  >    It should be mandated that, when a document is transcoded from a
>  >>  >    non-Unicode encoding to Unicode as part of C14N, normalization
>  >>  >    must be performed (at the bottom of section 2 and also in A.4 in
>  >>  >    the June 1st 2000 draft).
>  >>  >
>  >>  >**** Unicode-based encodings are e.g. UTF-8 and UTF-16. In most cases,
>  >>  >      the transcoding result is normalized just by design of
>  >>  >      Normalization Form C, but there are some exceptions.
>  >>  >
>  >>  >**** The above also applies to other transcodings, e.g. done as part
>  >>  >      of the evaluation of an XPath transform or the minimal
>  >>  >      canonicalization.
>  >>
>  >>Is this precluded by the security concern raised above?
>  >
>  >No, this is something different. It just helps to make
>  >transcodings as deterministic as possible. If some transcoder
>  >e.g. would convert a precomposed character in iso-8859-1 to
>  >a decomposed representation (non-normalized), checking
>  >would just not work.
>
>John, given that we've specified the format and syntax of a Signature, is it
>likely that a Signature will be in a non-Unicode format? Regardless, if so,
>Martin what is a "transcoding"? Moving from one character encoding scheme to
>another? I'm certainly getting an education in characters! From my reading
>[1,2] I understand the following:
>           + Character Repertoire (CR) = a set of abstract characters
>           + Coded Character Set (CCS) = a mapping of code values (space,
>             points, positions) to a Character Repertoire
>           + Character Encoding Scheme (CES) = scheme for representing a
>             character repertoire in a code space. Frequently |CR| >
>             |code space|, so one has to do various extensions and
>             escaping to represent those extra abstract characters. UTF-8
>             is a CES.
>           + Charset = CCS + CES
>
>      [1] http://www.w3.org/MarkUp/html-spec/charset-harmful
>      [2] http://czyborra.com/utf/
>
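Joseph's taxonomy above can be seen in miniature with a short sketch (Python is
used purely for illustration; it was not part of the original thread): the code
point is the CCS-level value, and the byte serialization is the CES-level
representation of that same value.

```python
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, one abstract character

# CCS level: the coded character set assigns it the code value 0xE9.
assert ord(ch) == 0xE9

# CES level: different encoding schemes serialize that same code value
# as different byte sequences.
assert ch.encode("utf-8") == b"\xc3\xa9"        # UTF-8: two bytes
assert ch.encode("utf-16-be") == b"\x00\xe9"    # UTF-16 (big-endian): different two bytes
```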
><john>
>The fact that we have specified the format and syntax of Signature does not
>preclude non-Unicode encodings of the signature.  Likelihood is another
>story, but also not relevant.

I agree. XML is not limited to UTF-8 and UTF-16. You may want
to include a signature in an existing document, and that
document may for some reason e.g. be in Shift-JIS.



>Moreover, by the dsig default, the signature representative XML is expressed
>in UTF-8 by virtue of c14n.
>
>Martin's point appears to be that a Signature element exists prior to
>signature creation because it specifies how to create the signature (as well
>as how to verify it).  Since it can exist, it can exist in a non-Unicode
>format, so what do we mean when we say that the result will be UTF-8 by
>virtue of c14n? How do we consistently take the data from non-Unicode to
>UTF-8?

Yes, here is where the problems mentioned in http://www.w3.org/TR/japanese-xml/
come into play. In short: Watch out for certain characters (usually extremely
few) where not all implementations convert to Unicode the same way
(independent of character normalization).
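One well-known instance of the divergence Martin points to (discussed in the
japanese-xml note) is the Shift-JIS "wave dash": different converters map the
same bytes to different Unicode code points. A sketch using Python's two
built-in codecs for the encoding (an editorial illustration, not from the
thread):

```python
wave_dash = b"\x81\x60"  # the "wave dash" byte sequence in Shift-JIS

# The strict JIS mapping and the Microsoft (cp932) mapping convert the
# same bytes to two different code points -- and NFC does not unify
# them, so digests over the transcoded text would differ.
assert wave_dash.decode("shift_jis") == "\u301c"  # WAVE DASH
assert wave_dash.decode("cp932") == "\uff5e"      # FULLWIDTH TILDE
```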


>So far, my answer has been that c14n is based on the XPath data model, which
>is based on UCS.  XPath assumes that you, the application developer, will
>attach an XML processor that creates input appropriate for an XPath
>evaluator.  This means that you, the application developer, are responsible
>for transcoding from your non-Unicode format into UCS.
>
>We recommend that your transcoding observe the rules of NFC w.r.t. the input
>it creates for the XPath evaluator (or logically equivalent implementation).

Yes, unless the input is already in a Unicode-based encoding
(UTF-8, UTF-16, ...).
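The transcode-then-normalize step John assigns to the application developer
can be sketched as follows (the function name and interface are hypothetical,
not from any spec; Python's unicodedata supplies the NFC step):

```python
import unicodedata

def prepare_for_xpath(data: bytes, encoding: str) -> str:
    """Hypothetical helper: decode XML text and apply NFC before handing
    it to an XPath evaluator, per the recommendation above."""
    text = data.decode(encoding)
    if encoding.lower().replace("_", "-").startswith(("utf-8", "utf-16")):
        return text  # already Unicode-based; no transcoding step needed
    return unicodedata.normalize("NFC", text)
```

For example, `prepare_for_xpath("\u00e9".encode("iso-8859-1"), "iso-8859-1")`
yields the precomposed U+00E9, which is already in NFC.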


Regards,   Martin.



>This recommendation is made so that your signatures won't break.
>
>***************************************
>John Boyer,
>Software Development Manager
>
>PureEdge Solutions (formerly UWI.Com)
>Creating Binding E-Commerce
>
>v:250-479-8334, ext. 143 f:250-479-3772
>1-888-517-2675  http://www.PureEdge.com
>***************************************
>
></john>
>
Received on Monday, 10 July 2000 05:58:49 GMT

This archive was generated by hypermail 2.2.0 + w3c-0.29 : Thursday, 13 January 2005 12:10:10 GMT