
Re: Comments on draft-yergeau-rfc2279bis-00.txt

From: Markus Scherer <markus.scherer@jtcsv.com>
Date: Thu, 17 Oct 2002 09:12:09 -0700
To: charsets <ietf-charsets@iana.org>
Message-id: <3DAEE159.2010508@jtcsv.com>

Patrik Fältström wrote:

> What I hear on this list is that the consensus is that the BOM SHOULD NOT be 
> used. I would like it to be MUST NOT be used in Internet protocols, 
> which would make tagged UTF-8 text illegal if the BOM exists in the text.


That would violate the Unicode standard. If UTF-8 is clearly indicated by a charset label, then an initial byte sequence EF BB BF must be interpreted as the character U+FEFF ZERO WIDTH NO-BREAK SPACE (ZWNBSP). Since that character is not very useful at the beginning of a text, it can usually be ignored.
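As a rough illustration of that behavior, here is a minimal Python sketch. It assumes the bytes have already been identified as UTF-8 by a charset label; the helper name `decode_labeled_utf8` is hypothetical. Decoding with the plain `utf-8` codec keeps EF BB BF as the character U+FEFF, and the helper then drops that single leading character rather than rejecting the text.

```python
import unicodedata


def decode_labeled_utf8(data: bytes) -> str:
    """Decode bytes that a charset label says are UTF-8 (hypothetical helper).

    An initial EF BB BF decodes to the character U+FEFF (ZWNBSP). Because it
    carries nothing useful at the start of a text, one leading U+FEFF is
    dropped; any U+FEFF elsewhere in the text is left alone.
    """
    text = data.decode("utf-8")  # the plain codec keeps U+FEFF as a character
    if text.startswith("\ufeff"):
        text = text[1:]
    return text


raw = b"\xef\xbb\xbfHello"
decoded = raw.decode("utf-8")
print(hex(ord(decoded[0])))          # 0xfeff
print(unicodedata.name(decoded[0]))  # ZERO WIDTH NO-BREAK SPACE
print(decode_labeled_utf8(raw))      # Hello
```

Python's built-in `utf-8-sig` codec performs the same stripping in one step (`raw.decode("utf-8-sig")`); the explicit version above just makes visible that the sequence is first a character and only afterwards ignored.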

Personally, I find François' text very clear. It acknowledges existing, reasonable and useful practice.

Best regards,
markus


-- 
Opinions expressed here may not reflect my company's positions unless otherwise noted.
Received on Thursday, 17 October 2002 12:14:11 GMT
