- From: John Boyer <jboyer@PureEdge.com>
- Date: Thu, 29 Jun 2000 14:22:08 -0700
- To: "Joseph M. Reagle Jr." <reagle@w3.org>, "Martin J. Duerst" <duerst@w3.org>
- Cc: <w3c-ietf-xmldsig@w3.org>
Hi Joseph and Martin,

>> Presently, the specification states:
>>
>>    We RECOMMEND that signature applications produce XML
>>    content in Normalized Form C [NFC] and check that any XML
>>    being consumed is in that form as well (if not, signatures may
>>    consequently fail to validate).
>>
>> http://www.w3.org/TR/2000/WD-xmldsig-core-20000601/#sec-XML-Canonicalization
>>
>> Consequently, any application that states it is conformant with the XML
>> Signature specification SHOULD do the above.
>
> I can imagine two different kinds of signature applications:
>
> 1) Applications that create their content and sign it.
> 2) Applications that take content created elsewhere and sign it.
>
> For 1), the above is the right recommendation.
> For 2), the above is wrong.
>
> There may not be a need to make a distinction between 1) and 2) in the
> rest of the spec, but here this is needed.

Sorry, I still don't understand. The distinction seems immaterial,
particularly depending on your view of the architecture where the
signature engine is separate from the content -- content regardless.
The content we are talking about is Signatures, not word processing
files and such. Signature applications create XML Signatures; that's
the content they concern themselves with. Now, we may be speaking more
to the generic point of C14N XML (if so, that's why I'm trying to
separate these two threads).

<john>
Actually, I think Martin is talking more about actual content being
signed by the signature, not content created to represent the
signature. However, the text cited above from our spec recommends that
both contents (the signature representative and the content
representative) be in NFC to avoid signature breakage. To merge
terminologies, the produced XML is the signature representative, and
the XML being consumed is the content representative.

I would agree with Joseph that the distinction does not appear to be
warranted. Whether the content representative was created by a
separate application or by the same application, the failure of that
content to be in NFC still leaves the signature verification open to
failure if a character is changed to an equivalent character with a
different UCS codepoint.

The gist of Martin's argument is that it is unreasonable to assert
that an XML signature application is non-compliant with our spec
simply because it does not check whether the content over which a
digest is calculated is in NFC. For instance, one may actually *want*
to sign some data that is in a non-Unicode encoding. By the same
reasoning, an XML signature application cannot be deemed non-compliant
because the signatures it generates are not in NFC. However, the use
of the term 'RECOMMEND' means that you 'SHOULD' do this unless you
have a good reason not to, and as long as you are aware that failure
to follow the recommendations can impact the interoperability of your
signatures or of your signature application.

If there is a good reason to operate outside of Unicode, that's OK,
but the group can't get involved with specifying a normalization
procedure for all non-Unicode character sets such that their
signatures can even interoperate among applications that use the same
non-Unicode character set. The best we can do is recommend the current
standard for translation into Unicode, then recommend c14n so that the
material signed is in UTF-8. Although we need the unforgiving nature
of digest algorithms, it leaves us with few alternatives in this
regard.
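To make the breakage concrete, here is a small sketch of why the digest
is so unforgiving (only an illustration; Python, the sample string, and
the choice of SHA-1 are my own assumptions for the example, not anything
the spec mandates):

    import hashlib
    import unicodedata

    # Two logically equivalent spellings: the precomposed character U+00E9
    # versus "e" followed by U+0301 (COMBINING ACUTE ACCENT).  They render
    # identically but use different UCS codepoints.
    precomposed = "r\u00e9sum\u00e9"
    decomposed = "re\u0301sume\u0301"

    # A digest over the raw UTF-8 bytes is unforgiving: the two forms differ.
    d1 = hashlib.sha1(precomposed.encode("utf-8")).hexdigest()
    d2 = hashlib.sha1(decomposed.encode("utf-8")).hexdigest()
    assert d1 != d2

    # If both producer and consumer follow the NFC recommendation, the two
    # spellings collapse to the same codepoints and the digests agree again.
    n1 = hashlib.sha1(unicodedata.normalize("NFC", precomposed).encode("utf-8")).hexdigest()
    n2 = hashlib.sha1(unicodedata.normalize("NFC", decomposed).encode("utf-8")).hexdigest()
    assert n1 == n2

Checking that consumed XML is already in NFC amounts to the same
comparison: normalize the text and see whether anything changed.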
</john>

>> document is considered valid. Consequently, while we RECOMMEND all
>> documents operated upon and generated by signature applications be in
>> [NFC] (otherwise intermediate processors can unintentionally break the
>> signature) encoding normalizations SHOULD NOT be done as part of a
>> signature transform.
>>
>> http://www.w3.org/Signature/2000/06/section-8-I18N.html
>
> I think this puts two different kinds of concerns into the same pot
> (but I'm not exactly sure, because I'm not really familiar with the
> security language).

Well, it probably isn't even correct to call this a "Birthday Attack."
I'm hoping someone else jumps in and tweaks the text, but I think the
gist of what you are after is there.

>> > It should be mandated that, when a document is transcoded from a
>> > non-Unicode encoding to Unicode as part of C14N, normalization must
>> > be performed (At the bottom of section 2 and also in A.4 in the
>> > June 1st 2000 draft).
>> >
>> > **** Unicode-based encodings are e.g. UTF-8 and UTF-16. In most
>> > cases, the transcoding result is normalized just by design of
>> > Normalization Form C, but there are some exceptions.
>> >
>> > **** The above also applies to other transcodings, e.g. done as
>> > part of the evaluation of an XPath transform or the minimal
>> > canonicalization.
>>
>> Is this precluded by the security concern raised above?
>
> No, this is something different. It just helps to make transcodings
> as deterministic as possible. If some transcoder e.g. would convert a
> precomposed character in iso-8859-1 to a decomposed representation
> (non-normalized), checking would just not work.

John, given that we've specified the format and syntax of a Signature,
is it likely that a Signature will be in a non-Unicode format?
Regardless, if so, Martin, what is a "transcoding"? Moving from one
character encoding scheme to another? I'm certainly getting an
education in characters! From my reading [1,2] I understand the
following:

+ Character Repertoire (CR) = a set of abstract characters
+ Coded Character Set (CCS) = a mapping of code values (space, points,
  positions) to a Character Repertoire
+ Character Encoding Scheme (CES) = a scheme for representing a
  character repertoire in a code space. Frequently |CR| > |code space|,
  so one has to do various extensions and escaping to represent those
  extra abstract characters. UTF-8 is a CES.
+ Charset = CCS + CES

[1] http://www.w3.org/MarkUp/html-spec/charset-harmful
[2] http://czyborra.com/utf/

<john>
The fact that we have specified the format and syntax of Signature
does not preclude non-Unicode encodings of the signature. Likelihood
is another story, but also not relevant. Moreover, by the dsig
default, the signature representative XML is expressed in UTF-8 by
virtue of c14n.

Martin's point appears to be that a Signature element exists prior to
signature creation because it specifies how to create the signature
(as well as how to verify it). Since it can exist, it can exist in a
non-Unicode format, so what do we mean when we say that the result
will be UTF-8 by virtue of c14n? How do we consistently take the data
from non-Unicode to UTF-8?

So far, my answer has been that c14n is based on the XPath data model,
which is based on UCS. XPath assumes that you, the application
developer, will attach an XML processor that creates input appropriate
for an XPath evaluator. This means that you, the application
developer, are responsible for transcoding from your non-Unicode
format into UCS.
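In code, that transcoding step might look something like the following
sketch (the function name, the use of Python, and the iso-8859-1 sample
are only illustrations of the idea, not anything required by the spec):

    import unicodedata

    def transcode_for_xpath(raw: bytes, source_encoding: str = "iso-8859-1") -> str:
        """Decode legacy-encoded text into UCS and normalize it to NFC
        before it is handed to the XPath data model (and so to c14n)."""
        text = raw.decode(source_encoding)          # non-Unicode -> UCS
        return unicodedata.normalize("NFC", text)   # avoid decomposed surprises

    # 0xE9 is the precomposed e-acute in iso-8859-1; after transcoding it is
    # the single codepoint U+00E9, which c14n will later emit as the UTF-8
    # bytes C3 A9.
    assert transcode_for_xpath(b"r\xe9sum\xe9") == "r\u00e9sum\u00e9"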
We recommend that your transcoding observe the rules of NFC w.r.t. the
input it creates for the XPath evaluator (or logically equivalent
implementation). This recommendation is made so that your signatures
won't break.

***************************************
John Boyer, Software Development Manager
PureEdge Solutions (formerly UWI.Com)
Creating Binding E-Commerce
v: 250-479-8334, ext. 143
f: 250-479-3772
1-888-517-2675
http://www.PureEdge.com
***************************************
</john>