- From: Martin J. Duerst <duerst@w3.org>
- Date: Tue, 27 Jun 2000 16:32:51 +0900
- To: reagle@w3.org, w3c-ietf-xmldsig@w3.org
Hello Joseph, dear XML Dsig WG, Below please find the follow-up on our last call comments and your disposition of them for xmldsig-core. This follow-up consists of the relevant excerpt from our recent f2f's minutes, [http://www.w3.org/International/Group/2000/06/ftf10/minutes] for those who can read them as well as some additional comments based on my careful reading of the July 1 core draft. First the excerpt from the minutes, with my comments added and flagged with ****: >>>> [35]XML-Signatures [35] http://www.w3.org/TR/xmldsig-core/ It is not clear what "signature application" means. If this means an application that produces signatures, we do not understand the sentence in the last paragraph of 7.0 which recommends that such applications produce normalized XML. **** It has to be cristal-clear that no actual normalization should occur in connection with any signing calculation. The current text is not clear enough. A note should be added explaining the security problem mentioned in our Last Call comments. **** E.g. in Section 8, at a convenient location (e.g. 8.1), add something like: Using character normalization (Normalization Form C of UTR #15) as a transform or as part of a transform can remove differences that are treated as relevant by most if not all XML processors. Character normalization should therefore be done at the origin of a document, and only checked, but never be done during signature processing. We don't insist on the inclusion of a normalization transform. **** The minutes got a bit short here. I guess it should read: We insist on not providing character normalization as a transform. We do not insist that character normalization checking is provided as a transform. The later was asked for in our last call comments, but we retracted it, based on the argument that while providing and using such a checking transform may reduce certain security threats, these threats are assumed to be minimal. I'm not a security specialist, and therefore don't know the specific language for describing this kind of theat, but here is an analogy: Assume that a certain class of documents were all completely written in lower-case. Assume an attacker wants to try to make a document that is different from an actually signed one, but results in the same signature when calculated. The attacker obviously has a (somewhat) easier job to create such a fake document if he can use both upper-case and lower- case letters. In order to make the attacker's job more difficult, it would therefore make sense to introduce a transform that checks that everything is lowercase. However, even if that's not done, the job of the attacker to create a fake document is still damn difficult, and therefore it's not really necessary to check that the document is all lower-case. It would be good if the DSig WG could check whether our understanding on this point is correct or not. Editorial: in 7.0, avoid "coded character set", use "character encoding". **** These are two different concepts! [36]Canonical XML [36] http://www.w3.org/TR/xml-c14n It should be mandated that, when a document is transcoded from a non-Unicode encoding to Unicode as part of C14N, normalization must be performed (At the bottom of section 2 and also in A.4 in the June 1st 2000 draft). **** Unicode-based encodings are e.g. UTF-8 and UTF-16. In most cases, the transcoding result is normalized just by design of Normalization Form C, but there are some exceptions. **** The above also applies to other transcodings, e.g. done as part of the evaluation of an XPath transform or the minimal canonicalization. <<<< Further comments after careful reading of June-1-Core: In 4.3.3, the text on URI-Reference and "non-ASCII" characters should be alligned with that in XPointer http://www.w3.org/TR/xptr#uri-escaping to make sure all the details are correct. After 'Applications should be cognizant... protocol parameters and state information...', mention that if there is more than one URI to access some resource, the most specific should be used (i.e. e.g. http://www.w3.org/2000/06/interop-pressrelease.html.en instead of http://www.w3.org/2000/06/interop-pressrelease). In 4.3.3.1, in "IANA registered character set", the term 'character set' should be put in quotes. This is the term that IANA uses, but it's technically incorrect. In the same paragraph, change 'no .... needs such information' to 'no .... needs such explicit information'. 4.5 The Encoding attribute of <Object> is not described. Is that something like 'charset', or something like base64, or what. This needs to be clearly described or removed. 6.5 "Canonicalization algorithms take one implicit parameter:" Wrong, the charset is also an implicit parameter, and is very important. The spec must say that this parameter is derived according to the rules for the relevant protocols and formats, and that in particular for XML, the rules defined in RFC 2376 or its successor apply. The spec should also say that in order to be able to correctly sign and verify web documents, it is important that this information is delivered correctly, and that this may require settings on the server side. The spec should also say that for some 'charset's, there may be differences for some characters in which Unicode character a given character is converted, and and should point to the XML Japanese Profile (http://www.w3.org/TR/japanese-xml/) submission for an example and some advice. In particular, documents intended for digital signing should preferably be created using UTF-8 or UTF-16 from the start. The following sentence should also be added to 6.5: Various canonicalization algorithms require conversion to UTF-8. Where any such algorithm is REQUIRED or RECOMMENDED, this means that that algorithm has to understand at least UTF-8 and UTF-16 as input encodings. Knowledge of other encodings is OPTIONAL. [The disposition of comments says "(these issues can be better addressed in the C14N spec)", but because this also affects minimal canonicalization, XPath transform,..., this is not true.] 6.6.3.2 "The XPath implementation is expected to convert": 'is expected to' is too vague. Please chage this to 'must behave as if it converted....'. 6.6.3.4 'the string converted to UTF-8': Change to 'the string encoded in UTF-8'. (you can convert from one encoding to another, but XPath deals with character independent of an encoding, so convert sounds a bit strange) [two times in the same paragraph]. A similar wording problem exists for 'by serializing the node-set with a UTF-8 encoding'. There is only one UTF-8! 7.1, second list, point 2: 'except... and other character entities not representable...': This may be wrongly understood to mean that e.g. é in a HTML document shouldn't be expanded if the encoding is US-ASCII. This is of course wrong, é should in this case be changed to the appropriate numeric character reference (and the spec may have to say whether these should be decimal or hex,...). I think this list may be longer than you expeced, but I don't think any of the points is very difficult to fix. [I have also taken some notes on non-i18n aspects in Core, mostly wording improvements, which I hope to be able to send separately later.] Regards, Martin.
Received on Tuesday, 27 June 2000 03:23:36 UTC