- From: <Misha.Wolf@reuters.com>
- Date: Wed, 17 Apr 2002 22:16:57 +0100
- To: Francois Yergeau <FYergeau@alis.com>
- Cc: ietf-charsets@iana.org
On 17/04/2002 21:51:19 Francois Yergeau wrote:

> Martin Duerst wrote:
[...]
> > 5. Byte order mark (BOM)
> >
> > This section needs more work. The 'change log' says that it's
> > mostly taken from the UTF-16 RFC. But the BOM for UTF-8 is
> > much less necessary, and much more of a problem, than for UTF-16.
> > We should clearly say that with IETF protocols, character encodings
> > are always either labeled or fixed, and therefore the BOM SHOULD
> > (and MUST at least for small segments) never be used for UTF-8.
> > And we should clearly give the main argument, namely that it
> > breaks US-ASCII compatibility (US-ASCII encoded as UTF-8
> > (without a BOM) stays exactly the same, but US-ASCII encoded
> > as UTF-8 with a BOM is different).
>
> I don't quite see your point. A US-ASCII string, with or without a BOM, is
> always a valid UTF-8 string, I don't see where compatibility is broken. I
> can see that protocols shouldn't *require* a BOM, because then a strict
> (BOM-less) ASCII string wouldn't meet the requirement. But that's not what
> you're saying, right?

The point Martin may be making is that some tools insert a BOM at the
start of a resource they consider to be encoded using UTF-8, but do not
do so for a resource they consider to be encoded using US-ASCII.

I have just carried out the following test. I opened Notepad under
Win2K and typed the letter "a". I then saved the file, leaving the
default encoding of "ANSI". I then saved the file again, under a
different name, specifying "UTF-8" as the encoding. I then checked
the file sizes using Properties. The first file is 1 byte long; the
second 4 bytes.

Misha

[...]

----------------------------------------------------------------
Visit our Internet site at http://www.reuters.com

Any views expressed in this message are those of the individual
sender, except where the sender specifically states them to be
the views of Reuters Ltd.
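The Notepad test described in this message can be approximated with a minimal Python sketch. It assumes that the 3 extra bytes in the second file are the UTF-8 signature EF BB BF, which is consistent with the 4-byte size reported but is not stated in the message. It also uses the `ascii` codec in place of Notepad's "ANSI" code page, since the two give the same single byte for the letter "a". The variable names are illustrative only.

```python
import codecs

text = "a"

# Plain US-ASCII and BOM-less UTF-8 produce the same single byte.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# "utf-8-sig" is Python's codec for UTF-8 preceded by the signature
# EF BB BF (assumed here to be what Notepad's "UTF-8" option writes).
utf8_bom_bytes = text.encode("utf-8-sig")

print(len(ascii_bytes), len(utf8_bytes), len(utf8_bom_bytes))
print(utf8_bom_bytes.hex())

# A strict US-ASCII consumer cannot read the BOM-prefixed bytes.
try:
    utf8_bom_bytes.decode("ascii")
    ascii_reader_ok = True
except UnicodeDecodeError:
    ascii_reader_ok = False
```

Under these assumptions the BOM-less UTF-8 bytes are identical to the US-ASCII bytes, while the BOM-prefixed bytes are three bytes longer and are rejected by a strict US-ASCII decoder.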
Received on Wednesday, 17 April 2002 17:14:15 UTC