- From: Christian Smith <csmith@barebones.com>
- Date: Fri, 30 Jun 2000 10:26:14 -0400
- To: Ian Graham <ian.graham@utoronto.ca>
- cc: www-html@w3.org, Chris Croome <chris@webarchitects.co.uk>
On Friday, June 30, 2000 at 09:29, igraham@smaug.java.utoronto.ca (Ian Graham) wrote:

> I think you mean UTF-16 (the two-byte encoding). UTF-8 doesn't use /
> require a byte order mark, as all characters are encoded as a stream of
> one, two, or more bytes, and the encoding rules uniquely define the
> ordering of the bytes (a byte stream).

No, I do mean UTF-8. While UTF-8 does not require a BOM (neither does UTF-16), there is a defined BOM for UTF-8, and it is convenient to have one. Without it, it can be difficult to determine that a file is UTF-8 (as opposed to some other binary format) absent some other specific designator.

That UTF-8 doesn't have a BOM seems to be a common misconception, but the Unicode FAQ is pretty clear on this:

http://www.unicode.org/unicode/faq/#BOM

Part of the problem is that the RFC for Unicode is almost (but not quite)[1] completely useless, and the ISO specification is no better. Neither of these documents can be read and understood by us mere mortals.

Of course, it is perhaps a bit misleading to call this a BOM (yet that is what it is called), as UTF-8 doesn't have little/big-endian forms, so there is no "order" to mark.

[1] Is this a TLM? BNQ = "but not quite", or should we have ABNQ = "almost but not quite" ;-?

--
Christian Smith | csmith@barebones.com | http://web.barebones.com
He who dies with the most friends...
Is still dead!
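A minimal Python sketch of the BOM-as-signature idea discussed in this message, assuming only Python's standard `codecs` module. The "BOM" is the character U+FEFF; in UTF-8 it always encodes to the same three bytes (EF BB BF), which is why it marks the encoding rather than any byte order. The `sniff_bom` helper is hypothetical and purely illustrative: it only recognizes a leading signature, and a file without one may still be UTF-8.

```python
import codecs
from typing import Optional

# U+FEFF encoded as UTF-8 is always EF BB BF: there are no endian variants,
# so the sequence works only as an "this is UTF-8" signature.
UTF8_BOM = "\ufeff".encode("utf-8")
assert UTF8_BOM == codecs.BOM_UTF8 == b"\xef\xbb\xbf"


def sniff_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess the encoding from a leading BOM, else None."""
    # Longer signatures are checked first, because the UTF-32-LE BOM
    # (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
    signatures = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in signatures:
        if data.startswith(bom):
            return name
    return None  # No signature: the data could still be UTF-8 (or anything else).
```

For example, `sniff_bom(b"\xef\xbb\xbfhello")` returns `"utf-8"`, while plain `b"hello"` returns `None`. Python's own `utf-8-sig` codec writes this signature on encoding and strips it on decoding.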
Received on Friday, 30 June 2000 10:26:15 UTC