- From: Marco Cimarosti <marco.cimarosti@essetre.it>
- Date: Wed, 16 May 2001 11:15:26 +0200
- To: 'Keld Jørn Simonsen' <keld@dkuug.dk>, duerst@w3.org
- Cc: www-international@w3.org, Unicode List <unicode@unicode.org>
Keld Jørn Simonsen wrote:

> For UTF-8 there is no need to have a BOM, as there is only one
> way of serializing octets in UTF-8. There is no little-endian
> or big-endian. A BOM is superfluous and will be ignored.

Not so. In plain text, it is a useful signature to distinguish UTF-8 from other things. See the 3rd question in <http://www.unicode.org/unicode/faq/utf_bom.html>.

The three-byte sequence EF BB BF is unlikely to be confused with a meaningful sequence in existing encodings. The only (unlikely) example I know is a couple of Hangul syllables in UTF-16.

However, as we are talking about text whose encoding is already identified (e-mail, web), it is in fact quite superfluous to have a signature at all.

But then it is superfluous for the other UTFs as well: what is the purpose of using an endianness-ambiguous MIME specification (e.g. "UTF-16") and a BOM to disambiguate it? Isn't it simpler to use an unambiguous specification in the first place (e.g. "UTF-16BE" or "UTF-16LE")?

BTW, I understand that BOM is just a nickname now: the character has been renamed as "ZERO WIDTH NO-BREAK SPACE".

_ Marco
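As an illustration of the signature idea discussed in this message, here is a minimal Python sketch of checking unlabelled bytes for a leading EF BB BF (or a UTF-16 byte order mark). The function name `sniff_signature` and the `SIGNATURES` table are hypothetical names made up for this example, and UTF-32 is deliberately left out; this is one possible approach under those assumptions, not a complete detector.

```python
import codecs

# Signatures checked in this sketch, as (byte prefix, encoding label) pairs.
# UTF-32 is intentionally omitted to keep the example small.
SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
]


def sniff_signature(data: bytes):
    """Return (encoding, payload) when data starts with a known signature.

    When no signature is present, return (None, data) unchanged, so the
    caller has to rely on an external label such as a MIME charset.
    """
    for bom, name in SIGNATURES:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


# Plain text carrying the UTF-8 signature is recognised and stripped.
print(sniff_signature(b"\xef\xbb\xbfabc"))

# An unambiguous label needs no signature: Python's "utf-16-be" codec
# writes no BOM, whereas the bare "utf-16" codec prepends one.
print(sniff_signature("hi".encode("utf-16-be")))
print("hi".encode("utf-16")[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
```

The last two calls mirror the point about labels: with "UTF-16BE" or "UTF-16LE" declared up front there is nothing left for a signature to disambiguate.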
Received on Wednesday, 16 May 2001 05:15:42 UTC