Re: Followup on I18N Last Call comments and disposition

At 16:32 6/27/00 +0900, Martin J. Duerst wrote:
 >Hello Joseph

Hi Martin, again, thank you for the comments.

[
BTW: back in [a], what did you mean by:
>- General
>   The treatment of xml:lang, eg during transforms, is unclear.
[a] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JanMar/0254.html
]

 >First the excerpt from the minutes, with my comments added
 >and flagged with ****:
 > http://www.w3.org/International/Group/2000/06/ftf10/minutes

Understood. My responses follow; others may want to pitch in, and I defer the
latter comments to Boyer.

 >      [35] http://www.w3.org/TR/xmldsig-core/
 >
 >    It is not clear what "signature application" means. If this means an
 >    application that produces signatures, we do not understand the
 >    sentence in the last paragraph of 7.0 which recommends that such
 >    applications produce normalized XML.
 
A Signature application is an application that is conformant with the XML
Signature specification.

 >**** It has to be crystal-clear that no actual normalization should
 >      occur in connection with any signing calculation. The current
 >      text is not clear enough.
 
Presently, the specification states:

        We RECOMMEND that signature applications produce XML 
        content in Normalized Form C [NFC] and check that any XML 
        being consumed is in that form as well (if not, signatures may 
        consequently fail to validate).

http://www.w3.org/TR/2000/WD-xmldsig-core-20000601/#sec-XML-Canonicalization

Consequently, any application that states it is conformant with the XML
Signature specification SHOULD do the above. That's what it means; I think
your comment is introducing the next issue:

 >    A note should be added explaining the security problem mentioned in
 >    our Last Call comments.
 >
 >**** E.g. in Section 8, at a convenient location (e.g. 8.1), add
 >      something like: Using character normalization (Normalization
 >      Form C of UTR #15) as a transform or as part of a transform
 >      can remove differences that are treated as relevant by most
 >      if not all XML processors. Character normalization should
 >      therefore be done at the origin of a document, and only
 >      checked, but never be done during signature processing.

I have proposed text and reorganized the first two points of section 8 to
deal with this:

  8.1.3 Transforms Can Aid Birthday Attacks

  In addition to the semantic concerns of transforms removing or
  including data from a source document prior to signing, there is
  potential for syntactical collision attacks. For instance, consider a
  signature which includes a transform that changes the character
  normalization of the source document to Normalized Form C [NFC]. This
  transform dramatically increases the number of documents that when
  transformed and digested yield the same hash value. Consequently, an
  attacker could include a substantive syntactical and semantic change to
  the document by varying other inconsequential syntactical values that
  are normalized prior to digesting such that the tampered signature
  document is considered valid. Consequently, while we RECOMMEND all
  documents operated upon and generated by signature applications be in
  [NFC] (otherwise intermediate processors can unintentionally break the
  signature), encoding normalizations SHOULD NOT be done as part of a
  signature transform.
  http://www.w3.org/Signature/2000/06/section-8-I18N.html
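
To illustrate the concern in 8.1.3, here is a minimal sketch in Python. It is
not part of the proposed text; the sample strings, the helper names, and the
choice of SHA-1 over UTF-8 octets are my own assumptions for illustration. Two
documents that differ at the character level digest differently as-is, but
collapse to the same digest once an NFC-normalizing transform is applied. The
check-only helper shows the alternative we RECOMMEND: verify the form, never
change it during signature processing.

```python
import hashlib
import unicodedata


def digest(text: str) -> str:
    """SHA-1 over the UTF-8 octets of the text, with no other processing."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def digest_with_nfc_transform(text: str) -> str:
    """What a (discouraged) normalizing transform would end up digesting."""
    return digest(unicodedata.normalize("NFC", text))


def is_nfc(text: str) -> bool:
    """Check-only alternative: report whether text is already in NFC."""
    return unicodedata.normalize("NFC", text) == text


# Hypothetical sample content: the same visible word, encoded two ways.
composed = "<p>caf\u00e9</p>"      # U+00E9, precomposed e-acute
decomposed = "<p>cafe\u0301</p>"   # 'e' followed by U+0301 combining acute

# Without normalization the two documents are distinguishable.
print(digest(composed) == digest(decomposed))

# With normalization as a transform they are not.
print(digest_with_nfc_transform(composed) == digest_with_nfc_transform(decomposed))

# Checking only flags the second document instead of silently changing it.
print(is_nfc(composed), is_nfc(decomposed))
```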


 >    We don't insist on the inclusion of a normalization transform.
 >
 >**** The minutes got a bit short here. I guess it should read:
 >      We insist on not providing character normalization as a
 >      transform. We do not insist that character normalization
 >      checking is provided as a transform.

Agreed.


 >    Editorial: in 7.0, avoid "coded character set", use "character
 >    encoding".
 >
 >**** These are two different concepts!

Agreed. UTF is a character encoding scheme.

 >    It should be mandated that, when a document is transcoded from a
 >    non-Unicode encoding to Unicode as part of C14N, normalization must be
 >    performed (At the bottom of section 2 and also in A.4 in the June 1st
 >    2000 draft).
 >
 >**** Unicode-based encodings are e.g. UTF-8 and UTF-16. In most cases,
 >      the transcoding result is normalized just by design of Normalization
 >      Form C, but there are some exceptions.
 >
 >**** The above also applies to other transcodings, e.g. done as part
 >      of the evaluation of an XPath transform or the minimal
 >      canonicalization.

Is this precluded by the security concern raised above?

 ><<<<
 >Further comments after careful reading of June-1-Core:
 >
 >In 4.3.3, the text on URI-Reference and "non-ASCII" characters
 >should be aligned with that in XPointer
 >http://www.w3.org/TR/xptr#uri-escaping to make sure all the
 >details are correct.
 
I believe the present text is sufficient:

        (Non-ASCII characters in a URI should be represented in 
        UTF-8 [UTF-8] as one or more bytes, and then escaping these 
        bytes with the URI escaping mechanism. [XML])
        http://www.w3.org/TR/2000/WD-xmldsig-core-20000601/#sec-Reference

To state that all fragment identifiers (even from other MIME types) must be
escaped as defined by XPointer would be inappropriate. If someone uses
XPointer, they should of course follow the XPointer spec.
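
For what it is worth, the escaping the present text describes can be sketched
as follows (Python; the function name, the example URI, and the set of
characters left unescaped are my assumptions, not text from the spec).
Non-ASCII characters are first encoded as UTF-8, and each resulting byte is
then written as a %HH escape.

```python
from urllib.parse import quote

# Characters left untouched: URI delimiters plus '%' so that escapes already
# present in the input are not escaped a second time. This set is an
# assumption made for the sketch.
URI_SAFE = ":/?#[]@!$&'()*+,;=%"


def escape_uri_reference(uri: str) -> str:
    """Escape non-ASCII characters as UTF-8 bytes, each written as %HH."""
    return quote(uri, safe=URI_SAFE, encoding="utf-8")


# Hypothetical URI-Reference containing e-acute (U+00E9).
example = "http://example.org/r\u00e9sum\u00e9.xml#caf\u00e9"
print(escape_uri_reference(example))
# prints http://example.org/r%C3%A9sum%C3%A9.xml#caf%C3%A9
```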


 >After 'Applications should be cognizant... protocol parameters
 >and state information...', mention that if there is more than
 >one URI to access some resource, the most specific should be used
 >(i.e. e.g. http://www.w3.org/2000/06/interop-pressrelease.html.en
 >instead of http://www.w3.org/2000/06/interop-pressrelease).
 
Ok.

 >In 4.3.3.1, in "IANA registered character set", the term 'character set'
 >should be put in quotes. This is the term that IANA uses, but it's
 >technically incorrect.
 
Ok.

 >In the same paragraph, change 'no .... needs such information' to
 >'no .... needs such explicit information'.

Ok.

 >
 >4.5 The Encoding attribute of <Object> is not described. Is that
 >something like 'charset', or something like base64, or what.
 >This needs to be clearly described or removed.
 
I propose, "The Object's Encoding attribute may be used to provide a URI
that identifies the method by which the object is encoded."

 >6.5 "Canonicalization algorithms take one implicit parameter:"
 >Wrong, the charset is also an implicit parameter, and is very
 >important. The spec must say that this parameter is derived
 >according to the rules for the relevant protocols and formats,
 >and that in particular for XML, the rules defined in RFC 2376
 >or its successor apply.
 >The spec should also say that in order to be able to correctly
 >sign and verify web documents, it is important that this information
 >is delivered correctly, and that this may require settings
 >on the server side.
 >The spec should also say that for some 'charset's, there may
 >be differences for some characters in which Unicode character
 >a given character is converted, and should point to
 >the XML Japanese Profile (http://www.w3.org/TR/japanese-xml/)
 >submission for an example and some advice. In particular,
 >documents intended for digital signing should preferably
 >be created using UTF-8 or UTF-16 from the start.
 >The following sentence should also be added to 6.5:
 >Various canonicalization algorithms require conversion to
 >UTF-8. Where any such algorithm is REQUIRED or RECOMMENDED,
 >this means that that algorithm has to understand at least
 >UTF-8 and UTF-16 as input encodings. Knowledge of other
 >encodings is OPTIONAL.
 >
 >[The disposition of comments says "(these issues can be
 >better addressed in the C14N spec)", but because this
 >also affects minimal canonicalization, XPath transform,...,
 >this is not true.]
 
Proposed text:

Canonicalization algorithms take two implicit parameters when they appear as
a CanonicalizationMethod within the SignedInfo element: the content and its
charset. (Note, there may be ambiguities in converting existing charsets to
Unicode; see the XML Japanese Profile [XML-Japanese] for more information.)
The charset is derived according to the rules of the transport protocols and
media formats (e.g., RFC2376 [XML-MT] defines the media types for XML). This
information is necessary to correctly sign and verify documents and often
requires careful server-side configuration. Various canonicalization
algorithms require conversion to [UTF-8]. Where any such algorithm is
REQUIRED or RECOMMENDED, the algorithm MUST understand at least [UTF-8] and
[UTF-16] as input encodings. Knowledge of other encodings is OPTIONAL.
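
To show why the charset has to be treated as a parameter, here is a rough
sketch (Python; the octets, the two charset labels, and the use of SHA-1 are
assumptions chosen for illustration, not anything the spec prescribes). The
same octets yield different characters, and so a different digest after
conversion to UTF-8, depending on which charset the signer or verifier was
told to use.

```python
import hashlib


def digest_after_utf8_conversion(octets: bytes, charset: str) -> str:
    """Decode octets using the declared charset, re-encode as UTF-8, digest."""
    text = octets.decode(charset)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical content: byte 0xE9 is e-acute in ISO-8859-1, but a different
# character in ISO-8859-5.
octets = b"<p>caf\xe9</p>"

as_latin1 = digest_after_utf8_conversion(octets, "iso-8859-1")
as_cyrillic = digest_after_utf8_conversion(octets, "iso-8859-5")

# A wrongly delivered charset label changes what gets digested.
print(as_latin1 == as_cyrillic)
```

This is the reason the proposed text mentions server-side configuration: if
the charset delivered with the document is wrong, the verifier digests
different characters than the signer did.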

I leave John to address the following:

 >6.6.3.2 "The XPath implementation is expected to convert":
 >'is expected to' is too vague. Please change this to
 >'must behave as if it converted....'.
 >
 >6.6.3.4 'the string converted to UTF-8': Change to 'the string
 >encoded in UTF-8'. (you can convert from one encoding to
 >another, but XPath deals with character independent of
 >an encoding, so convert sounds a bit strange) [two times
 >in the same paragraph]. A similar wording problem exists
 >for 'by serializing the node-set with a UTF-8 encoding'.
 >There is only one UTF-8!
 >
 >7.1, second list, point 2: 'except... and other character entities
 >not representable...': This may be wrongly understood to mean that
 >e.g. &eacute; in a HTML document shouldn't be expanded if
 >the encoding is US-ASCII. This is of course wrong, &eacute;
 >should in this case be changed to the appropriate numeric
 >character reference (and the spec may have to say whether
 >these should be decimal or hex,...).
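
For John's reference, the behavior described in the 7.1 comment might look
like the sketch below (Python; the helper names are hypothetical, and whether
the references should be decimal or hex is exactly the open question, so both
forms are shown). A character that US-ASCII cannot represent is written as a
numeric character reference rather than left as a named entity.

```python
from html.entities import name2codepoint


def to_ascii_decimal(text: str) -> str:
    """Serialize to US-ASCII using decimal numeric character references."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def to_ascii_hex(text: str) -> str:
    """Serialize to US-ASCII using hexadecimal numeric character references."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in text)


# The HTML entity &eacute; names code point 233 (U+00E9).
eacute = chr(name2codepoint["eacute"])
sample = "caf" + eacute

print(to_ascii_decimal(sample))  # prints caf&#233;
print(to_ascii_hex(sample))      # prints caf&#xE9;
```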



_________________________________________________________
Joseph Reagle Jr.   
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/People/Reagle/

Received on Tuesday, 27 June 2000 15:50:34 UTC