Followup on I18N Last Call comments and disposition

Hello Joseph, dear XML Dsig WG,

Below please find the follow-up on our last call comments
and your disposition of them for xmldsig-core.

This follow-up consists of the relevant excerpt from our recent
f2f's minutes,
[http://www.w3.org/International/Group/2000/06/ftf10/minutes]
for those who can read them as well as some additional comments
based on my careful reading of the July 1 core draft.

First the excerpt from the minutes, with my comments added
and flagged with ****:

 >>>>
     [35]XML-Signatures

      [35] http://www.w3.org/TR/xmldsig-core/

    It is not clear what "signature application" means. If this means an
    application that produces signatures, we do not understand the
    sentence in the last paragraph of 7.0 which recommends that such
    applications produce normalized XML.

**** It has to be cristal-clear that no actual normalization should
      occur in connection with any signing calculation. The current
      text is not clear enough.

    A note should be added explaining the security problem mentioned in
    our Last Call comments.

**** E.g. in Section 8, at a convenient location (e.g. 8.1), add
      something like: Using character normalization (Normalization
      Form C of UTR #15) as a transform or as part of a transform
      can remove differences that are treated as relevant by most
      if not all XML processors. Character normalization should
      therefore be done at the origin of a document, and only
      checked, but never be done during signature processing.

    We don't insist on the inclusion of a normalization transform.

**** The minutes got a bit short here. I guess it should read:
      We insist on not providing character normalization as a
      transform. We do not insist that character normalization
      checking is provided as a transform.
      The later was asked for in our last call comments, but
      we retracted it, based on the argument that while
      providing and using such a checking transform may
      reduce certain security threats, these threats are
      assumed to be minimal. I'm not a security specialist,
      and therefore don't know the specific language for
      describing this kind of theat, but here is an analogy:
      Assume that a certain class of documents were all completely
      written in lower-case. Assume an attacker wants to try to
      make a document that is different from an actually signed
      one, but results in the same signature when calculated.
      The attacker obviously has a (somewhat) easier job to create
      such a fake document if he can use both upper-case and lower-
      case letters. In order to make the attacker's job more difficult,
      it would therefore make sense to introduce a transform that
      checks that everything is lowercase. However, even if that's
      not done, the job of the attacker to create a fake document
      is still damn difficult, and therefore it's not really necessary
      to check that the document is all lower-case.
      It would be good if the DSig WG could check whether our
      understanding on this point is correct or not.

    Editorial: in 7.0, avoid "coded character set", use "character
    encoding".

**** These are two different concepts!

     [36]Canonical XML

      [36] http://www.w3.org/TR/xml-c14n

    It should be mandated that, when a document is transcoded from a
    non-Unicode encoding to Unicode as part of C14N, normalization must be
    performed (At the bottom of section 2 and also in A.4 in the June 1st
    2000 draft).

**** Unicode-based encodings are e.g. UTF-8 and UTF-16. In most cases,
      the transcoding result is normalized just by design of Normalization
      Form C, but there are some exceptions.

**** The above also applies to other transcodings, e.g. done as part
      of the evaluation of an XPath transform or the minimal canonicalization.
<<<<

Further comments after careful reading of June-1-Core:

In 4.3.3, the text on URI-Reference and "non-ASCII" characters
should be alligned with that in XPointer
http://www.w3.org/TR/xptr#uri-escaping to make sure all the
details are correct.

After 'Applications should be cognizant... protocol parameters
and state information...', mention that if there is more than
one URI to access some resource, the most specific should be used
(i.e. e.g. http://www.w3.org/2000/06/interop-pressrelease.html.en
instead of http://www.w3.org/2000/06/interop-pressrelease).

In 4.3.3.1, in "IANA registered character set", the term 'character set'
should be put in quotes. This is the term that IANA uses, but it's
technically incorrect.

In the same paragraph, change 'no .... needs such information' to
'no .... needs such explicit information'.

4.5 The Encoding attribute of <Object> is not described. Is that
something like 'charset', or something like base64, or what.
This needs to be clearly described or removed.

6.5 "Canonicalization algorithms take one implicit parameter:"
Wrong, the charset is also an implicit parameter, and is very
important. The spec must say that this parameter is derived
according to the rules for the relevant protocols and formats,
and that in particular for XML, the rules defined in RFC 2376
or its successor apply.
The spec should also say that in order to be able to correctly
sign and verify web documents, it is important that this information
is delivered correctly, and that this may require settings
on the server side.
The spec should also say that for some 'charset's, there may
be differences for some characters in which Unicode character
a given character is converted, and and should point to
the XML Japanese Profile (http://www.w3.org/TR/japanese-xml/)
submission for an example and some advice. In particular,
documents intended for digital signing should preferably
be created using UTF-8 or UTF-16 from the start.
The following sentence should also be added to 6.5:
Various canonicalization algorithms require conversion to
UTF-8. Where any such algorithm is REQUIRED or RECOMMENDED,
this means that that algorithm has to understand at least
UTF-8 and UTF-16 as input encodings. Knowledge of other
encodings is OPTIONAL.

[The disposition of comments says "(these issues can be
better addressed in the C14N spec)", but because this
also affects minimal canonicalization, XPath transform,...,
this is not true.]

6.6.3.2 "The XPath implementation is expected to convert":
'is expected to' is too vague. Please chage this to
'must behave as if it converted....'.

6.6.3.4 'the string converted to UTF-8': Change to 'the string
encoded in UTF-8'. (you can convert from one encoding to
another, but XPath deals with character independent of
an encoding, so convert sounds a bit strange) [two times
in the same paragraph]. A similar wording problem exists
for 'by serializing the node-set with a UTF-8 encoding'.
There is only one UTF-8!

7.1, second list, point 2: 'except... and other character entities
not representable...': This may be wrongly understood to mean that
e.g. &eacute; in a HTML document shouldn't be expanded if
the encoding is US-ASCII. This is of course wrong, &eacute;
should in this case be changed to the appropriate numeric
character reference (and the spec may have to say whether
these should be decimal or hex,...).


I think this list may be longer than you expeced, but
I don't think any of the points is very difficult to fix.


[I have also taken some notes on non-i18n aspects in Core,
mostly wording improvements, which I hope to be able to
send separately later.]


Regards,    Martin.

Received on Tuesday, 27 June 2000 03:23:36 UTC