Re: Followup on I18N Last Call comments and disposition

Hello Joseph,

I have had a look at the latest draft that you pointed out
to me yesterday.

Comments below:

At 00/07/12 18:06 -0400, Joseph M. Reagle Jr. wrote:

>I just realized all of the proposals we agreed to in [1], didn't make it
>into the spec that was published yesterday[2], however, their disposition is
>captured as part of the last call issues [3] and will make it in.
>I think if you are happy with this email, most all of the comments for the
>Signature spec will have been addressed. There's one issue Eastlake should
>clarify, and there's 2.5 rows that have a 'C14N?' that pertain specifically
>to C14N (mostly wording) and I will have to check with Boyer to make sure
>they are addressed.
>At 18:44 7/10/00 +0900, Martin J. Duerst wrote:
>  >>Sorry, I still don't understand. The distinction seems immaterial,
>  >>particularly depending on your view of the architecture where the
>  >>engine is seperate from the content-content regardless. The content we are
>  >>talking about is Signatures, not word processing files and such. Signature
>  >>applications create XML Signatures, that's the content they concern
>  >>themselves with.
>  >
>  >Then that should be made clearer.
>Propose: We RECOMMEND that signature applications create XML content
>/+(Signature elements and their descendents/content)+/ in Normalized Form C
>[NFC] and check that any XML being consumed is in that form as well (if not,
>signatures may consequently fail to validate).

That's okay. The changes I would additionally propose for that whole
paragraph are to move the sentences about data type normalization to the end
(so that character encoding and character normalization go together),
and to not say anything about Canonical XML at this point, to avoid
update problems (on the other hand, Canonical XML HAS to say that
there is no BOM at the start).
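To illustrate the RECOMMENDED check that consumed XML is in NFC, here is a minimal Python sketch. It assumes Python 3.8+ (for `unicodedata.is_normalized`). The helper name `check_nfc` is hypothetical and does not come from either specification.

```python
import unicodedata

def check_nfc(text: str) -> bool:
    """Return True if text is already in Unicode Normalization Form C."""
    # is_normalized() can often answer without building a normalized copy.
    return unicodedata.is_normalized("NFC", text)

# U+00E9 (precomposed e-acute) is NFC; "e" + U+0301 (combining acute) is not.
print(check_nfc("caf\u00e9"))    # True
print(check_nfc("cafe\u0301"))   # False
```

A signature application could reject or flag input for which this returns False, because such content may cause signatures to fail validation.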

>  >>  >>  >    It should be mandated that, when a document is transcoded from
>  >>  >>  >    non-Unicode encoding to Unicode as part of C14N, normalization must be
>  >>  >>  >    performed (At the bottom of section 2 and also in A.4 in the 1st
>  >>  >>  >    2000 draft).
>Are you suggesting this text be a part of Signature and C14N documents? For
>the signature specification I've appended a new sentence to the end of the
>proposed text in 6.5:
>Canonicalization algorithms takes two implicit parameter when they appear as
>a CanonicalizationMethod within the SignedInfo element: the content and its
>charset. (Note, there may be ambiguities in converting existing charsets to
>Unicode, for an example see the XML Japanese Profile [XML-Japanese] NOTE.)
>The charset is derived according to the rules of the transport protocols and
>media formats (e.g, RFC2376 [XML-MT] defines the media types for XML). This
>information is necessary to correctly sign and verify documents and often
>requires careful server side configuration. Various canonicalization
>algorithms require conversion to [UTF-8]. Where any such algorithm is
>REQUIRED or RECOMMENDED the algorithm MUST understand at least [UTF-8] and
>[UTF-16] as input encodings. Knowledge of other encodings is OPTIONAL.
>Transcodings from a non-Unicode encoding to Unicode as part of
>canonicalization SHOULD be performed
>I say SHOULD because this part of the specification is generic and we do not
>exclude algorithms from use, only specific required to implement. If someone
>uses some other C14N that doesn't do this, we recommend against it (as does
>the Web Character Model) but it doesn't make sense to consider it Signature

Very good base to work with, but a few proposals:
- 'algorithms takes': remove one s.
- There should be a period at the end :-).
- The (Note) should come later, probably at the end of the paragraph.
- Change 'media formats' to 'media types'; it's more consistent.
- The (currently) last sentence is the wrong way round. It should read
   e.g. 'Canonicalization to NFC [UTR #15] SHOULD be performed as part of
   transcoding from a non-Unicode encoding to Unicode.'
- Your argument about SHOULD above is okay, but for the algorithms
   that are required or recommended, it should be a MUST. I would try
   to reword so that the where-clause of the previous sentence
   applies, and change from SHOULD to MUST. Other algorithms can
   do whatever they want, but I guess a MUST for the required and
   recommended ones will give them enough of a hint to do the same
   thing unless they have very specific reasons not to do so.
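To make the proposed wording concrete, here is a minimal Python sketch of NFC normalization performed as part of transcoding to Unicode. The charset is taken from the media type's charset parameter. The function names, the set of encodings treated as Unicode, and the UTF-8 fallback when no charset parameter is present are all assumptions for illustration. In particular, RFC 2376 does not prescribe that default for every XML media type.

```python
import codecs
import unicodedata
from email.message import Message

# Encodings treated as Unicode in this sketch (an assumption; names are as
# returned by codecs.lookup()). Content in any other charset gets NFC.
UNICODE_ENCODINGS = {
    "utf-8", "utf-16", "utf-16-le", "utf-16-be",
    "utf-32", "utf-32-le", "utf-32-be",
}

def charset_from_media_type(content_type: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a media type like 'text/xml; charset=...'."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns `default`
    # when no charset parameter is present (the default is an assumption).
    return msg.get_content_charset(default)

def transcode_to_utf8(body: bytes, content_type: str) -> bytes:
    """Decode body using its declared charset and re-encode it as UTF-8."""
    # codecs.lookup() maps aliases such as 'windows-1258' to 'cp1258'.
    name = codecs.lookup(charset_from_media_type(content_type)).name
    text = body.decode(name)
    if name not in UNICODE_ENCODINGS:
        # Transcoding from a non-Unicode encoding: normalize to NFC here.
        text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")

# windows-1258 encodes the acute accent as a separate combining mark (0xEC),
# so decoding yields 'e' + U+0301, which NFC composes to U+00E9.
print(transcode_to_utf8(b"e\xec", "application/xml; charset=windows-1258"))
# b'\xc3\xa9'
```

The sketch leaves content that is already in a Unicode encoding untouched, because the proposed wording only covers transcoding from non-Unicode encodings.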

>  >>Can you point me to the text (even if a Member URL). You are asking me to
>  >>align our text with something I haven't seen (and consequently don't
>  >>understand why it is necessary.)
>  >
>  >See
>  >Please note that 'crosshatch' should be changed to 'number sign',
>  >because this is the official ISO 10646/Unicode name.
>Ok, text in the Reference section will say:
>... The set of characters for URIs is the same as for XML, namely [Unicode].
>However, some Unicode characters are disallowed from URI references: all
>non-ASCII characters and the excluded characters listed in Section 2.4 of
>[IETF RFC 2396], excluding the number sign (#) and the percent sign (%); and
>excluding the square bracket characters re-allowed in [IETF RFC 2732].
>Disallowed characters must be escaped as described in section 4.1.1 URI
>Reference Encoding and Escaping of [XPtr]:
>1. Each disallowed character is converted to UTF-8 [IETF RFC 2279] as one or
>more bytes.
>2. Any octets corresponding to a disallowed character are escaped with the
>URI escaping mechanism (that is, converted to %HH, where HH is the
>hexadecimal notation of the byte value).
>3. The original character is replaced by the resulting character sequence.

Great, but I suggest you take out the reference to [XPtr].
XPtr is not the definitive reference for this, it's just another
case of specifying the same thing.
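The three escaping steps quoted above can be sketched in Python as follows. This hypothetical helper is written for illustration only and is not code from the Signature spec, RFC 2396, or [XPtr]. The excluded set reflects one reading of RFC 2396 section 2.4.3 (control characters, space, delims, unwise) with the quoted exceptions applied. The use of upper-case hex digits is a choice the quoted text does not fix.

```python
# Excluded US-ASCII characters from RFC 2396 section 2.4.3, minus '#' and
# '%' (kept per the quoted text) and minus '[' and ']' (re-allowed by RFC 2732).
_EXCLUDED_ASCII = {chr(c) for c in range(0x00, 0x21)} | {"\x7f"} | set('<>"{}|\\^`')

def escape_uri_reference(ref: str) -> str:
    """Escape disallowed characters in a URI reference as UTF-8 %HH sequences."""
    out = []
    for ch in ref:
        if ord(ch) > 0x7F or ch in _EXCLUDED_ASCII:
            # Step 1: convert the character to UTF-8 bytes.
            # Step 2: write each byte as %HH.
            # Step 3: that sequence replaces the original character.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)  # allowed characters pass through unchanged
    return "".join(out)

print(escape_uri_reference("r\u00e9sum\u00e9 doc.xml#sec[1]"))
# r%C3%A9sum%C3%A9%20doc.xml#sec[1]
```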

Regards,    Martin.

Received on Tuesday, 1 August 2000 04:37:28 UTC