- From: John Cowan <cowan@mercury.ccil.org>
- Date: Wed, 21 Nov 2012 16:27:27 -0500
- To: Anne van Kesteren <annevk@annevk.nl>
- Cc: www-international@w3.org
Anne van Kesteren scripsit:

> * Per my reading of the HTML specification you can use utf-16le and
> utf-16be without a BOM. It does not even require it for utf-16,
> although I suppose Unicode might (though Unicode is not very correct
> here with respect to what implementations do).

Per Unicode, in UTF-16LE and UTF-16BE documents, there is no such thing
as a BOM. If a UTF-16LE document begins FF FE, that means the first
character is U+FEFF, ZERO WIDTH NO-BREAK SPACE; likewise if a UTF-16BE
document begins FE FF.

In the UTF-16 encoding, a leading FF FE or FE FF is a BOM rather than a
character, and all following pairs of bytes are interpreted little-endian
or big-endian respectively. If the first two bytes are neither of these,
a higher-level protocol must decide whether to interpret the pairs of
bytes as big- or little-endian. If no higher-level protocol exists, the
interpretation is big-endian by default.

-- 
Unless it was by accident that I had            John Cowan
offended someone, I never apologized.           cowan@ccil.org
        --Quentin Crisp                         http://www.ccil.org/~cowan
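
The rules described in the message can be sketched in Python roughly as follows. This is only an illustration of the Unicode-side behaviour as stated above, not of what HTML or any browser does. The function name `decode_utf16` and the `higher_level_order` parameter are made up for this sketch; the only library calls used are Python's built-in `utf-16-le` and `utf-16-be` codecs, which do not strip a leading U+FEFF.

```python
from typing import Optional


def decode_utf16(data: bytes, label: str = "utf-16",
                 higher_level_order: Optional[str] = None) -> str:
    """Decode bytes following the rules described in the message above.

    `higher_level_order` is a hypothetical stand-in for "a higher-level
    protocol": pass "le" or "be" if one exists, or None if it does not.
    """
    label = label.lower()

    # UTF-16LE / UTF-16BE: there is no BOM. A leading FF FE (LE) or
    # FE FF (BE) is the character U+FEFF and stays in the output.
    if label == "utf-16le":
        return data.decode("utf-16-le")
    if label == "utf-16be":
        return data.decode("utf-16-be")
    if label != "utf-16":
        raise ValueError("unsupported label: " + label)

    # UTF-16: a leading FF FE or FE FF is a BOM, not a character.
    # It selects the byte order and is dropped from the output.
    if data[:2] == b"\xff\xfe":
        return data[2:].decode("utf-16-le")
    if data[:2] == b"\xfe\xff":
        return data[2:].decode("utf-16-be")

    # No BOM: defer to the higher-level protocol if there is one,
    # otherwise interpret the byte pairs as big-endian by default.
    if higher_level_order == "le":
        return data.decode("utf-16-le")
    return data.decode("utf-16-be")
```

For example, `decode_utf16(b"\xff\xfeA\x00", "utf-16le")` keeps U+FEFF as the first character, while the same bytes labelled `"utf-16"` decode to just `"A"`.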
Received on Wednesday, 21 November 2012 21:27:49 UTC