Draft Comments on Charmod Last Call from Joseph M. Reagle Jr. on 2001-02-15 (w3c-ietf-xmldsig@w3.org from January to March 2001)

From: Joseph M. Reagle Jr. <reagle@w3.org>
Date: Wed, 14 Feb 2001 19:45:33 -0500
To: "IETF/W3C XML-DSig WG" <w3c-ietf-xmldsig@w3.org>
Cc: "Martin J. Duerst" <duerst@w3.org>, "John Boyer" <jboyer@PureEdge.com>
Message-Id: <4.3.2.7.2.20010214184745.00b18f08@rpcp.mit.edu>
Here are my comments, which can be combined with any others before they 
are forwarded to the I18N groups.

__

http://www.w3.org/TR/2001/WD-charmod-20010126/

I'm very glad to see this specification advanced, as it is a very useful 
reference and, for me at least, an educational tool. One would think 
representing characters is easy, but it's tricky! Consequently, my 
comments are mostly editorial: they relate to points of confusion I 
experienced as a reader, which could easily be remedied. A few comments 
refer to sections that relate to XML Signature, but these issues have 
largely been addressed by the last call versions of the XML Signature WG's 
documents: Core and Canonical XML.

>1.1 Goals and Scope
>    All W3C specifications have to conform to this document (see section
>    [57]2 Conformance). Authors of other specifications (for example, IETF
>    specifications) are strongly encouraged to take guidance from it.

As an aside, while I strongly support this goal, this sort of requirement is 
atypical. Perhaps it belongs in a part of the W3C process or guidelines that 
is capable of enforcing it?


>3.1.2 Units of a Writing System, and Units of Aural Rendering

Please define "phoneme" (as distinct from meaning) and "syllabary".


>3.1.3 Units of Visual Rendering
>[Unicode] requires that characters are stored and interchanged in logical 
>order.

Please define "logical order" (or cite definition).


>3.1.5 Units of Collation
>Software developers MUST NOT merely use a one-to-one mapping as their 
>string-compare function, as in sorting operations.

What are you suggesting they do instead? Relying upon human context to 
determine order seems rather haphazard. For instance, how do you sort the 
words in an English document that contains excerpts from a Spanish document 
with sequences such as "ch" and "ll", which are considered atomic collation 
units in their native document but not in the document in which they now 
appear?


>3.2 Digital Representation of Characters
>3. To enable use in computers, a suitable base datatype is identified (such 
>as a byte, a 16-bit wyde or other) and a character encoding form (CEF) is 
>used, which encodes the abstract integers of a CCS into sequences of the 
>code units of the base datatype.

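Note "wyde": unless it is a typo, it is an obscure term (Knuth's word for a 
16-bit unit) and should be defined or replaced with "16-bit unit". Much of 
this summary is fairly easy to understand and is demonstrated in Appendix A. 
However, the distinction between CEF and CES is not very clear and might 
merit an example, if it can be done simply; getting into endianness and the 
byte order mark (BOM) might confuse the matter...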
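
One way such an example might look, sketched in Python with hypothetical 
function names: the CEF step turns code points into 16-bit UTF-16 code 
units (including a surrogate pair for a character outside the BMP), and only 
the CES step decides how those units become bytes. Byte order is hard to 
avoid entirely, since for UTF-16 it is precisely what separates the two 
layers, but the BOM is left out.

```python
def utf16_cef(text: str) -> list[int]:
    """CEF: map each code point to one or two 16-bit code units (UTF-16)."""
    units = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)                     # BMP: a single code unit
        else:
            cp -= 0x10000                        # supplementary: surrogate pair
            units.append(0xD800 + (cp >> 10))    # high surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low surrogate
    return units


def utf16_ces(units: list[int], byteorder: str) -> bytes:
    """CES: serialize 16-bit code units as bytes in the given byte order."""
    return b"".join(u.to_bytes(2, byteorder) for u in units)


text = "A\U0001D11E"                     # "A", then MUSICAL SYMBOL G CLEF
units = utf16_cef(text)
print([hex(u) for u in units])           # ['0x41', '0xd834', '0xdd1e']
print(utf16_ces(units, "big").hex())     # 0041d834dd1e
print(utf16_ces(units, "little").hex())  # 410034d81edd
```

The same CEF output thus has two CES serializations.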


>3.6.1 Character Encoding Identification
>Because of the layered Web architecture (e.g. formats used over protocols), 
>there may be multiple and at times conflicting information about character 
>encoding. Specifications MUST define conflict-resolution mechanisms (e.g. 
>priorities) for these cases, and implementers and content developers MUST 
>follow them carefully.

This requirement is relevant to DSIG in that the Reference element has a 
Type attribute (whose value is a URI) that could identify the encoding of the 
resource being signed. However, the signature specification speaks of DSIG 
types, not MIME types, though MIME types could be included if represented 
as URIs:

>http://www.w3.org/TR/2000/CR-xmldsig-core-20001031/#sec-Reference
>4.3.3 The Reference Element
>. The Type attribute facilitates the processing of referenced data. For 
>example, while this specification makes no requirements over external data, 
>an application may wish to signal that the referent is a Manifest.

If someone did use this attribute to convey a MIME type, the DSIG 
specification does not address how to resolve conflicting information; it 
leaves that to the application.
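
Since this is left to the application, an application might adopt an 
explicit priority rule. The Python sketch below is purely hypothetical: the 
source names and their priority order are assumptions chosen for 
illustration, not rules from either specification. It picks the 
highest-priority declaration and reports any lower-priority ones that 
disagree.

```python
from typing import Optional

# Hypothetical priority order, highest first; a real specification must define its own.
PRIORITY = ["transport", "reference-type", "in-document"]


def resolve_encoding(declared: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Pick the encoding from the highest-priority source that declares one,
    and list every lower-priority source whose declaration disagrees."""
    chosen = None
    conflicts = []
    for source in PRIORITY:
        label = declared.get(source)
        if label is None:
            continue
        label = label.lower()  # names compared case-insensitively; aliases not handled
        if chosen is None:
            chosen = label
        elif label != chosen:
            conflicts.append(source)
    return chosen, conflicts


print(resolve_encoding({"transport": "ISO-8859-1", "in-document": "UTF-8"}))
# ('iso-8859-1', ['in-document'])
```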


>4 Early Uniform Normalization
>4.1 Motivation
>This document also specifies that normalization is to be performed early 
>(by the sender) as opposed to late (by the recipient).

Note that the DSIG specification RECOMMENDS, but does not require, that 
signatures be in NFC:

>http://www.w3.org/TR/2000/CR-xmldsig-core-20001031/#sec-XML-Canonicalization
>We RECOMMEND that signature applications create XML content (Signature 
>elements and their descendents/content) in Normalization Form C [NFC] and 
>check that any XML being consumed is in that form as well (if not, 
>signatures may consequently fail to validate).
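
The recommended check on consumed content could be as simple as the 
following Python sketch, which relies on the standard unicodedata module 
(is_normalized requires Python 3.8 or later); check_nfc is a hypothetical 
helper name. The precomposed and decomposed spellings of "café" render 
alike, but only the first is in NFC.

```python
import unicodedata


def check_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C."""
    return unicodedata.is_normalized("NFC", text)


precomposed = "caf\u00e9"   # e-acute as the single code point U+00E9
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(check_nfc(precomposed))                                   # True
print(check_nfc(decomposed))                                    # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```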

>4.3 Responsibility for Normalization
>Note: The prohibition of normalization by recipients is necessary for 
>consistency, on which security depends.

DSIG is compliant with this:

>http://www.w3.org/TR/2000/CR-xmldsig-core-20001031/#sec-See
>8.1.3 "See" What is Signed
>Consequently, while we RECOMMEND all documents operated upon and generated 
>by signature applications be in [NFC] (otherwise intermediate processors 
>might unintentionally break the signature) encoding normalizations SHOULD 
>NOT be done as part of a signature transform, or (to state it another way) 
>if normalization does occur, the application SHOULD always "see" (operate 
>over) the normalized form.
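
A minimal Python sketch of the "see what is signed" principle quoted above, 
assuming SHA-256 over UTF-8 octets stands in for whatever digest method a 
signature actually uses; prepare_for_signing is a hypothetical helper, not 
part of any DSIG API. Because the decomposed and precomposed forms are 
different octet sequences, their digests differ, so an application that 
normalizes must digest the normalized text it actually operates over.

```python
import hashlib
import unicodedata


def prepare_for_signing(text: str, normalize: bool) -> tuple[str, str]:
    """Return the text the application will operate over and its digest.
    If normalization occurs, the digest covers the normalized form, so what
    is signed is what is 'seen'."""
    seen = unicodedata.normalize("NFC", text) if normalize else text
    return seen, hashlib.sha256(seen.encode("utf-8")).hexdigest()


raw = "cafe\u0301"  # decomposed, not in NFC
seen_raw, d_raw = prepare_for_signing(raw, normalize=False)
seen_nfc, d_nfc = prepare_for_signing(raw, normalize=True)
print(seen_raw == seen_nfc)  # False: different code point sequences
print(d_raw == d_nfc)        # False: so the digests differ as well
```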


>8 Character Encoding in URI References
>This chapter defines how to address this issue in W3C specifications in a 
>way consistent with the model defined in this document and with deployed 
>practice.

DSIG is compliant with this; see:
>http://www.w3.org/TR/2000/CR-xmldsig-core-20001031/#sec-URI
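
For reference, the core of the escaping approach is to encode each 
disallowed character as UTF-8 and write every resulting byte as %HH. In the 
Python sketch below, the set of characters escaped (non-ASCII characters, 
controls and space) is a simplifying assumption, since the exact set is 
defined by the specifications; escape_uri_reference and the example.org 
address are hypothetical.

```python
def escape_uri_reference(ref: str) -> str:
    """Encode each escaped character as UTF-8 and write every byte as %HH.
    Simplifying assumption: only non-ASCII characters, controls and space
    are escaped; all other ASCII characters pass through unchanged."""
    out = []
    for ch in ref:
        cp = ord(ch)
        if cp <= 0x20 or cp >= 0x7F:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


print(escape_uri_reference("http://example.org/r\u00e9sum\u00e9 2001.xml"))
# http://example.org/r%C3%A9sum%C3%A9%202001.xml
```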

__
Joseph Reagle Jr.                 http://www.w3.org/People/Reagle/
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/Signature
W3C XML Encryption Chair          http://www.w3.org/Encryption/2001/
Received on Wednesday, 14 February 2001 19:45:41 UTC