FYI: RE: Character Encoding Question

This is an older message that wasn't sent to this list; I'm forwarding it
for issues list completeness.

-----Original Message-----
From: Martin J. Duerst
Sent: Thursday, November 30, 2000 8:23 AM
To: John Boyer; Tom Gindin
Subject: RE: Character Encoding Question

At 00/11/29 10:12 -0800, John Boyer wrote:
 >Hi Tom and Martin,
 >Actually, it appears that the info on p. 20 of Unicode Standard 3.0 was
 >slightly misleading.  They talk of using UTF-8 as an encoding format.
 >However, while I think of UTF-8 as encoding all of UCS-4, they appear to be
 >only using UTF-8 to encode the portion of UCS-4 that Unicode represents,
 >which is the 16 x 64k character regions that compose the BMP.
 >So, the prior sentence was still sufficient.  The following would appear to
 >do the trick:
 >"use Normalization Form C [NFC] when converting an XML document to the UCS
 >character domain from an encoding other than UCS-4, UTF-8, UTF-16,
 >or UTF-16LE."

I think you are close with the above, but I think you should change it to

"use Normalization Form C [NFC] when converting an XML document to the UCS
character domain from any encoding that is not UCS-based (currently,
UCS-based encodings include UTF-8, UTF-16, UTF-16BE, UTF-16LE, UCS-2, and UCS-4)."
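
A minimal sketch of how an implementation might apply the wording above,
not a definitive one. The function name to_ucs, the label set, and the
label-to-codec mapping are assumptions made for illustration; in particular,
treating UCS-2 input as UTF-16 and UCS-4 input as UTF-32 is an approximation
chosen only because Python has no codecs under the UCS names.

```python
import unicodedata

# Assumption: labels taken from the list in the proposed wording, lower-cased.
UCS_BASED_LABELS = {
    "utf-8", "utf-16", "utf-16be", "utf-16le",
    "iso-10646-ucs-2", "iso-10646-ucs-4",
}

# Assumption: approximate Python codecs for labels Python does not know.
PYTHON_CODEC = {
    "iso-10646-ucs-2": "utf-16",
    "iso-10646-ucs-4": "utf-32",
}


def to_ucs(data: bytes, declared_encoding: str) -> str:
    """Decode a document; apply NFC only if the source is not UCS-based."""
    label = declared_encoding.strip().lower()
    text = data.decode(PYTHON_CODEC.get(label, label))
    if label in UCS_BASED_LABELS:
        # Already in the UCS domain: leave the character sequence untouched.
        return text
    # Transcoded from a legacy encoding: compose to Normalization Form C.
    return unicodedata.normalize("NFC", text)
```

For example, "e" followed by a combining acute accent arriving in a legacy
code page such as cp1258 would come out as the single precomposed character,
while the same sequence arriving as UTF-8 would be left as two characters.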

Why my change:

- There are also others in the IANA registry
    (look e.g. for 'unicode' or 'iso10646').
- There are things we know apply but we don't want to mention (UTF-7).
- We don't know what others might come up (hopefully none :-).
- UCS-2 is mentioned because it's not the same as UTF-16.
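
To illustrate the last point, here is a small sketch of where UCS-2 and
UTF-16 differ: UTF-16 can represent characters beyond U+FFFF by using a
pair of surrogate code units, while UCS-2 is limited to single 16-bit
values. The helper names utf16_code_units and fits_in_ucs2 are made up
for this example.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the 16-bit code units of text in big-endian UTF-16."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def fits_in_ucs2(text: str) -> bool:
    """True if every character is a single 16-bit value (no surrogate pairs needed)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    for sample in ("A", "\U00010400"):
        units = [hex(u) for u in utf16_code_units(sample)]
        print(hex(ord(sample)), units, fits_in_ucs2(sample))
```

A character such as U+10400 takes two code units in UTF-16 and has no
representation in UCS-2, which is why the two are listed separately.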

Please feel free to send this to the involved lists for further discussion.

Regards,   Martin.

 >Note, that UCS-2 and Unicode seem to be equal, and seem to be encoded in
 >one of the above UTF-16 formats (i.e. UTF-16 *is* Unicode).  This is why I did
 >not mention them in the list.  OK?
 >Thanks again,
 >John Boyer
 >Team Leader, Software Development
 >Distributed Processing and XML
 >PureEdge Solutions Inc.
 >Creating Binding E-Commerce
 >v: 250-479-8334, ext. 143  f: 250-479-3772
 >1-888-517-2675

Joseph Reagle Jr.
W3C Policy Analyst      
IETF/W3C XML-Signature Co-Chair

Received on Tuesday, 12 December 2000 13:46:48 UTC