W3C home > Mailing lists > Public > w3c-ietf-xmldsig@w3.org > October to December 2000

FYI: RE: Character Encoding Question

From: Joseph M. Reagle Jr. <reagle@w3.org>
Date: Tue, 12 Dec 2000 13:46:43 -0500
Message-Id: <4.3.2.7.2.20001212134552.02a390d8@rpcp.mit.edu>
To: "IETF/W3C XML-DSig WG" <w3c-ietf-xmldsig@w3.org>
Older message that wasn't sent to this list, but forwarding it on for issues 
list completeness.

-----Original Message-----
From: Martin J. Duerst [mailto:duerst@w3.org]
Sent: Thursday, November 30, 2000 8:23 AM
To: John Boyer; Tom Gindin
Cc: w3c-archive@w3.org
Subject: RE: Character Encoding Question


At 00/11/29 10:12 -0800, John Boyer wrote:
 >Hi Tom and Martin,
 >
 >Actually, it appears that the info on p. 20 of Unicode Standard 3.0 was
 >slightly misleading.  They talk of using UTF-8 as an encoding format.
 >However, while I think of UTF-8 as encoding all of UCS-4, they appear to be
 >only using UTF-8 to encode the portion of UCS-4 that Unicode represents,
 >which is the 16 x 64k character regions that compose the BMP.
 >
 >So, the prior sentence was still sufficient.  The following would appear to
 >do the trick:
 >
 >"use Normalization Form C [NFC] when converting an XML document to the UCS
 >character domain from an encoding other than UCS-4, UTF-8, UTF-16,
UTF-16BE,
 >or UTF-16LE."

I think you are close with the above, but I think you should change it to

"use Normalization Form C [NFC] when converting an XML document to the UCS
character domain from any encoding that is not UCS-based (currently,
UCS-based
encodings include UTF-8, UTF-16, UTF-16BE, and UTF-16LE, UCS-2, and UCS-4)."

Why my change:

- There are also others in the IANA registry
    (look e.g. for 'unicode' or 'iso10646').
- There are things we know apply but we don't want to mention (UTF-7).
- We don't know what other might come up (hopefully none :-).
- UCS-2 is mentioned because it's not the same as UTF-16.

Please feel free to send this to the involved lists for further
checking.

Regards,   Martin.


 >Note, that UCS-2 and Unicode seem to be equal, and seem to be encoded in
any
 >of the above UTF-16 formats (i.e. UTF-16 *is* Unicode).  This is why I did
 >not mention them in the list.  OK?
 >
 >Thanks again,
 >John Boyer
 >Team Leader, Software Development
 >Distributed Processing and XML
 >PureEdge Solutions Inc.
 >Creating Binding E-Commerce
 >v: 250-479-8334, ext. 143  f: 250-479-3772
 >1-888-517-2675   http://www.PureEdge.com <http://www.pureedge.com/>

__
Joseph Reagle Jr.
W3C Policy Analyst                mailto:reagle@w3.org
IETF/W3C XML-Signature Co-Chair   http://www.w3.org/People/Reagle/
Received on Tuesday, 12 December 2000 13:46:48 GMT

This archive was generated by hypermail 2.2.0 + w3c-0.29 : Thursday, 13 January 2005 12:10:11 GMT