Re: I18N WG/IG last call comments

Martin, this is my attempt to definitively respond to the last of your
issues.

At 12:27 00/03/25 +0900, Martin J. Duerst wrote:
 >Character encoding and transcoding
 >----------------------------------
 >[Transcoding is the conversion from one character encoding
 >(charset) to another.]
 >
 >- 'minimal' canonicalization is required, but it should be made
 >   very clear that this does not imply that conversion from all
 >   'charset's to UTF-8 is required. A set of 'charset's for which
 >   support is required should be defined exactly, e.g. as UTF-8
 >   and UTF-16. This is the same for other transforms.
 >
 >- There should be a clear and strong warning and an official
 >   non-guarantee in the spec about conversions from legacy
 >   'charset's (i.e. encodings not based on the UCS) into encodings
 >   based on the UCS (i.e. UTF-8, UTF-16,...). The problem is
 >   that while for most 'charset's, all transcoder implementations
 >   produce the same result for almost all characters in these
 >   'charset's, the number of 'charset's where all transcoding
 >   implementations behave exactly the same is rather limited.
 >   As an example, in "Shift_JIS", the 'charset' used on Japanese PCs,
 >   about 10 to 15 special characters are transcoded differently
 >   on Windows, on the Mac, and by Java.
 >
 >- Part of the above warning should be to recommend UTF-8 and UTF-16
 >   for the original documents, and to recommend using
 >   numeric character references (e.g. &#xABCD;) for cases where
 >   differences are known.
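A small illustration of the Shift_JIS problem described above, written as a Python sketch. It uses Python's `shift_jis` codec (JIS X 0208 mapping) and its `cp932` codec (the Windows variant) as stand-ins for two independent transcoders. These are assumptions for illustration, not the Windows, Mac, or Java converters mentioned above. The byte pairs in `PROBLEM_BYTES` and the helper names `compare_decoders` and `to_ncr` are hypothetical. The second helper shows the suggested workaround: writing characters with known mapping differences as numeric character references.

```python
# Sketch: two Shift_JIS decoders disagree on a few code points, and
# characters with known differences can be escaped as numeric character
# references (NCRs) so that no transcoder has to guess.

# Hypothetical sample: byte pairs that Python's "shift_jis" (JIS X 0208
# mapping) and "cp932" (Windows variant) codecs decode differently.
PROBLEM_BYTES = [b"\x81\x60", b"\x81\x61", b"\x81\x7c"]


def compare_decoders(raw: bytes) -> tuple[str, str]:
    """Decode the same bytes with two Shift_JIS variants."""
    return raw.decode("shift_jis"), raw.decode("cp932")


def to_ncr(text: str, risky: set[str]) -> str:
    """Replace each character in `risky` with a hexadecimal NCR."""
    return "".join(f"&#x{ord(ch):X};" if ch in risky else ch for ch in text)


if __name__ == "__main__":
    for raw in PROBLEM_BYTES:
        jis, win = compare_decoders(raw)
        print(raw.hex(), f"U+{ord(jis):04X}", f"U+{ord(win):04X}")
    print(to_ncr("10\u301c20", {"\u301c"}))  # 10&#x301C;20
```

For example, the bytes 0x81 0x60 decode to U+301C WAVE DASH under `shift_jis` but to U+FF5E FULLWIDTH TILDE under `cp932`, so a document transcoded by one tool and verified after transcoding by the other would not match.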

The specification does not yet contain either of these recommendations.
No specific text has been proposed for our specification (nor on the list),
and as one of the editors I do not feel I have the expertise that you have to
treat this topic with the appropriate breadth and depth, beyond what I have
already said in previous emails and what I say below. I also wonder to what
extent it is appropriate to define an NFC verification algorithm (or
algorithm identifier) within the Signature specification. Much of this is
independent of Signature and common to all of XML; we are simply one of the
first consumers affected by it. Consequently, I hope our response has been
sufficient. I expect we will make further improvements as time permits and
as we encounter difficulties in achieving interoperable implementations, but
I also hope to see these issues addressed more generally as part of [1,2].

[1] http://www.unicode.org/unicode/reports/tr15/ 
[2] http://www.w3.org/TR/charmod/

 >However, thanks to one of the members of our WG, Masahiro Sekiguchi,
 >we have discovered the following security problems:
 >
 >- If NFC is applied before digesting, this gives a higher chance
 >   that an attacker may find a document with the same digest by
 >   chance, simply because a larger number of documents is ultimately
 >   mapped to the same digest. While this potential should be mentioned
 >   in the security section of your document, the low frequency
 >   of accented and other relevant characters and the irregularity
 >   of the transforms seem to make this a rather minor problem.
 >
 >- Assume that a document contains XML with element names with
 >   accented characters. Assume that this document is correctly
 >   normalized. Assume that the signature includes NFC as a transform.
 >   Now the following attack is possible: An intruder replaces the
 >   normalized document by a document with some of the element
 >   names unnormalized. The signature still works. However, an
 >   XML/DOM processor or an XPath expression may (and in practice
 >   will) work differently, because the unnormalized element is
 >   assumed to be different from the normalized one.

Because we do not yet specify this transform, we do not include this warning
in the specification. However, we do point out that ANY transform has the
potential to be dangerous.
 
 >The above considerations have led us to the conclusions below.
 >We hope that you can help us with your experience on security
 >issues to check them.
 >
 >- Make sure that the transforms defined do not *require* NFC.
 >   (any XML processor, and any transforms based on it, are
 >    always *allowed* to do normalization, see the definition
 >    of 'match' in the XML spec).
 >
 >- Provide a transform that *checks* for NFC. The transform
 >   fails if the input is not in NFC. If the transform succeeds,
 >   the output is exactly the same as the input. [this is a bit
 >   different from the average transform, but fits very well
 >   into the general model].
 >
 >- Advise users (and provide examples) to use this transform
 >   after e.g. Canonical XML, to avoid missing interactions
 >   between numeric character references and NFC.
 >
 >- Advise users to provide their data in NFC (just to reinforce
 >   the general recommendation), and stress the importance
 >   of this in the context of digital signatures.
 >
 >- Provide a transform that actually does NFC, for cases
 >   where this is desirable, but add the necessary warnings.
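A minimal sketch of the proposed checking transform, in Python 3.8 or later (for `unicodedata.is_normalized`). The function name `nfc_check_transform` and the exception class `NotNFCError` are hypothetical, and the input is assumed to be UTF-8 octets such as Canonical XML output. The transform either fails or returns its input byte-for-byte, matching the description above.

```python
import unicodedata


class NotNFCError(ValueError):
    """Raised when the transform input is not in Normalization Form C."""


def nfc_check_transform(data: bytes) -> bytes:
    """Hypothetical NFC-checking transform.

    Assumes `data` is UTF-8 (e.g. Canonical XML output). Fails if the
    decoded text is not in NFC; otherwise returns the input unchanged.
    """
    text = data.decode("utf-8")
    if not unicodedata.is_normalized("NFC", text):  # requires Python 3.8+
        raise NotNFCError("transform input is not in NFC")
    return data  # output is exactly the input octets


if __name__ == "__main__":
    good = "<caf\u00e9/>".encode("utf-8")  # precomposed e-acute: NFC
    bad = "<cafe\u0301/>".encode("utf-8")  # e + combining acute: not NFC
    print(nfc_check_transform(good) == good)  # True
    try:
        nfc_check_transform(bad)
    except NotNFCError as err:
        print("rejected:", err)
```

Applied to un-canonicalized bytes such as `<cafe&#x301;/>`, the check passes, because the character reference is plain ASCII. Only after Canonical XML expands the reference does the combining accent appear, which is why the check is advised after canonicalization.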

The text in section 7.0 will now read:

Any canonicalization algorithm should yield output in a specific fixed coded
character set. For the minimal canonicalization defined in this
specification, the W3C Canonical XML [XML-C14N], and the 2000 Canonical XML
[XML-C14N-a], that coded character set is UTF-8. Additionally, none of these
algorithms provides data type normalization. Applications that normalize
data types into different formats (e.g., (true, false) versus (1, 0)) may
not be able to validate each other's signatures. Neither the minimal
canonicalization nor the 2000 Canonical XML [XML-C14N-a] provides character
normalization. We RECOMMEND that signature applications produce XML content
in Normalization Form C [NFC] and check that any XML being consumed is in
that form as well (if it is not, signatures may fail to validate).

Again, thank you for the I18N comments!


_________________________________________________________
Joseph Reagle Jr.   
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/People/Reagle/

Received on Wednesday, 31 May 2000 17:18:31 UTC