
Re: Fwd: I-D ACTION:draft-hoffman-utf16-03.txt

From: Paul Hoffman / IMC <phoffman@imc.org>
Date: Sun, 02 May 1999 15:19:07 -0700
To: "Martin J. Duerst" <duerst@w3.org>, ietf-charsets@iana.org
Message-id: <4.2.0.37.19990502150706.01d5b3c0@mail.imc.org>

At 02:36 PM 4/28/99 +0900, Martin J. Duerst wrote:
>Of course, I have a few more nits, you would have guessed it, wouldn't you?

Hard not to guess it. I agree with the nits, but not with some of the more 
substantial changes. I've already started a -04 draft, so please feel free 
to comment on my comments. (Also, are the rest of the folks on the list 
still reading the drafts?...)

>My actual comments:
>
>1.1 Background...:
>   - Remove first comma in first sentence
>
>just before 2.1 Encoding of UTF-16:
>   - Add: Note: Values between 0xD800 and 0xDFFF are specifically reserved
>     for use with UTF-16, and don't have any characters assigned to them.
>     [I think this helps understanding]
>
>3. Labeling...:
>   - Expand the acronyms CCS and CES, e.g. CCS (Coded Character Set)
>
>   - contains registration -> contains registrations
>
>   - put in a cross-reference to Appendix A for the registrations
>
>just before 3.2:
>   - "it is likely that little-endian order will also be used"
>     it is already used, both in general and for UTF-16. Change to
>     "little-endian order is also [[sometimes] used | in use] on
>     the Internet".

All of these are fine, and I've made the changes in -04.

>just before 3.3:
>   - some specifications mandate: This is worded in a general way,
>     but does not cover other possibilities (e.g. that the BOM is
>     needed and a part of the object,...). I propose to change this
>     to:
>     Some specifications, e.g. for mime content types, may mandate
>     a particular treatment of the BOM, i.e. they might require that
>     an object starts with an 0xFEFF, which is not part of the
>     object itself. Such provisions create undesired interdependencies
>     between the character encoding/transport layer and the encoding
>     of the object itself, and should therefore be avoided wherever
>     possible.

We kept the wording general on purpose, and I think that your replacement 
text does not accurately reflect the situation. The sentence is about the 
XML requirement for the BOM: it has nothing to do with MIME types. Also, I 
think our wording makes more sense than yours with respect to where the BOM 
resides. You have it as "object starts with an 0xFEFF, which is not part of 
the object itself", which doesn't make sense to me.
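
To illustrate the question of where the BOM resides, here is a minimal 
sketch. It assumes Python's built-in "utf-16" and "utf-16-be" codecs as 
stand-ins for the labels under discussion; the variable names are purely 
illustrative and none of this comes from the draft.

```python
import codecs

# Two octets of BOM (0xFE 0xFF) followed by "A" in big-endian UTF-16.
payload = codecs.BOM_UTF16_BE + "A".encode("utf-16-be")

# Decoded as "utf-16", the leading 0xFEFF is read as a byte-order
# signature and is not returned as part of the text.
as_signature = payload.decode("utf-16")

# Decoded as "utf-16-be", the byte order is already known from the label,
# so the same two octets come back as the character U+FEFF in the text.
as_content = payload.decode("utf-16-be")

print(repr(as_signature))
print(repr(as_content))
```

The same octets are either a signature or a character depending on the 
label, which is why the placement of the BOM matters to an implementor.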

>just before 3.4:
>   - An (unfortunate) exception ...: This is again too specific.
>     I propose to replace it with:
>     In cases where higher-level specifications, e.g. for mime content
>     types, mandate a particular treatment of endianness and the BOM,
>     only the appropriate labels MUST be used. As an example, if a
>     specification requires an object to start with a BOM to identify
>     endianness, only the "UTF-16" tag must be used.

Again, this has nothing to do with MIME types. I also don't think your 
additional wording helps an implementor.

>5. Example:
>
>   - please change 0x00012345 to 0x12345 (and ideally likewise remove
>     all leading zeros in all examples refering to character values
>     (as opposed to byte/16-bit/... values)).

I'll disagree here, although it's only aesthetic. I like always having 
values with a multiple of two octets to make them easier to read. I think 
visually parsing "0x12345" is harder than parsing "0x00012345".
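
For readers following along, here is a minimal sketch of how the example 
value maps to a UTF-16 surrogate pair, using the standard surrogate 
arithmetic. The function name to_surrogates is hypothetical and not taken 
from the draft.

```python
def to_surrogates(code_point):
    """Split a character value above 0xFFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("value is not in the range that needs surrogates")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 | (offset >> 10)         # top 10 bits
    low = 0xDC00 | (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogates(0x00012345)
print(f"0x{high:04X} 0x{low:04X}")  # prints 0xD808 0xDF45
```

The leading zeros in 0x00012345 make no difference to the result; they 
affect only how the value reads on the page.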

>References:
>
>   - [Unicode]: What you cite is just a "patch". Please include
>     Unicode 2.0 (the book) in the citation.

I'll defer to the Unicode people on this (and bring it up on their mailing 
list). I'm pretty sure that this is the way they wanted us to cite the 
work. That is, Unicode Technical Report #8 starts off with the sentence 
"This report documents the Unicode Standard, Version 2.1."

>Registrations:
>
>   - Suitable for use ... (three times): Please add something like
>     (except for HTTP) to help people understand what it exactly means.

The wording for those sections in the registration was given to me by Ned 
Freed. Ned: how do you feel about the wording for this in the 
registrations? We already cover the HTTP exception earlier in the appendix.

--Paul Hoffman, Director
--Internet Mail Consortium
Received on Sunday, 2 May 1999 18:24:29 GMT
