RE: Character Encoding Question

At 12:18 PM -0800 11/29/00, John Boyer wrote:
>Still your question is valid because UCS-4 contains code points 
>outside of the BMP, and UTF-8 is capable of encoding them, while 
>Unicode/UCS-2/UTF-16x is not.

That is an incorrect statement. UTF-16 is able to encode things 
outside the BMP just fine; the method for doing so is specified 
in the Unicode Standard and in RFC 2781.
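
As a rough illustration of that method, here is a minimal Python sketch 
of the surrogate-pair arithmetic. The function name utf16_surrogates is 
made up for this example and does not come from either document.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point outside the BMP into a UTF-16 surrogate pair.

    utf16_surrogates is a hypothetical helper name used for illustration.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in planes 1-16")
    offset = cp - 0x10000               # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low


# U+1D11E (MUSICAL SYMBOL G CLEF) lies in plane 1, outside the BMP.
print([hex(unit) for unit in utf16_surrogates(0x1D11E)])
# -> ['0xd834', '0xdd1e']
```

Python's built-in "utf-16-be" codec produces the same two 16-bit units 
for that character, which is an easy way to cross-check the arithmetic.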

>   While nothing currently exists out there,

This is also not true: there are private use areas allocated in 
planes 15 and 16.
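
One way to check this, sketched in Python under the assumption that the 
standard unicodedata module reflects the allocation: it reports general 
category "Co" (private use) for code points in those two planes. The 
helper names plane and is_private_use are invented for this example.

```python
import unicodedata


def plane(cp: int) -> int:
    """Return the plane number of a code point (hypothetical helper)."""
    return cp >> 16


def is_private_use(cp: int) -> bool:
    """True if the code point has general category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"


# First code point of plane 15 and of plane 16.
for cp in (0xF0000, 0x100000):
    print(f"U+{cp:06X}", "plane", plane(cp), "private use:", is_private_use(cp))
```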

>  I think ISO/IEC 10646-2 is supposed to change that fact, so it 
>would be helpful for us to change our sentence about the conditions 
>under which we expect the application of Normalization Form C to 
>occur.

This all started with a statement:

"REQUIRED to use Normalization Form C [NFC] when converting an XML 
document to the UCS character domain from a non-Unicode encoding".

This was a bit of shorthand on the part of whoever wrote it. Simply 
change "a non-Unicode encoding" to "any non-UCS encoding" or "any 
local encoding".
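
A minimal Python sketch of what that requirement amounts to, assuming 
the local encoding is one the standard codecs know about. The function 
name decode_to_nfc is hypothetical; the point is only that NFC is 
applied at the moment of transcoding from the local encoding.

```python
import unicodedata


def decode_to_nfc(raw: bytes, encoding: str) -> str:
    """Transcode bytes from a local (non-UCS) encoding, then apply NFC.

    decode_to_nfc is a hypothetical helper name used for illustration.
    """
    text = raw.decode(encoding)                 # local encoding -> UCS
    return unicodedata.normalize("NFC", text)   # compose where possible


# windows-1258 stores some accents as separate combining marks, so the
# decoded text can be decomposed until NFC is applied.
raw = "e\u0301".encode("cp1258")   # "e" followed by a combining acute
print(len(raw.decode("cp1258")), len(decode_to_nfc(raw, "cp1258")))
```

For an encoding such as ISO 8859-1, which only has precomposed letters, 
the normalization step leaves the decoded text unchanged.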


>In conclusion, it would be helpful to know whether anyone thinks 
>UTF-7 
>(http://www.ietf.org/rfc/rfc2152.txt) 
>should be included since it does claim to be a format for encoding 
>Unicode characters.

Oh God no. UTF-7 was a mistake and has, thankfully, never been widely 
adopted. The only real use of UTF-7 is in IMAP and everyone there 
deeply regrets it. Pretend that you never heard of UTF-7.

--Paul Hoffman, Director
--Internet Mail Consortium

Received on Wednesday, 29 November 2000 18:15:28 UTC