- From: Geoffrey Sneddon <foolistbar@googlemail.com>
- Date: Fri, 29 Feb 2008 16:09:42 +0000
- To: Ian Hickson <ian@hixie.ch>
- Cc: whatwg@whatwg.org, HTML WG <public-html@w3.org>, public-i18n-core@w3.org
On 29 Feb 2008, at 01:21, Ian Hickson wrote:

>> - Again there, shouldn't we be given unicode codepoints for that (as
>> it'll be a unicode string)?
>
> Not sure what you mean.

This is just me being incredibly dumb. Ignore it.

> On Sat, 26 May 2007, Henri Sivonen wrote:
>>
>> The draft says:
>> "A leading U+FEFF BYTE ORDER MARK (BOM) must be dropped if present."
>>
>> That's reasonable for UTF-8 when the encoding has been established by
>> other means.
>>
>> However, when the encoding is UTF-16LE or UTF-16BE (i.e. supposed to be
>> signatureless), do we really want to drop the BOM silently? Shouldn't it
>> count as a character that is in error?
>
> Do the UTF-16LE and UTF-16BE specs make a leading BOM an error?
>
> If yes, then we don't have to say anything, it's already an error.
>
> If not, what's the advantage of complaining about the BOM in this
> case?

I don't see anything making a BOM illegal in UTF-16LE/UTF-16BE; in fact, the only mention I find of it with regard to either in Unicode 5.0 is "In UTF-16(BE|LE), an initial byte sequence <(FE FF|FF FE)> is interpreted as U+FEFF zero width no-break space."

I suppose the rationale given for removing it is the section that follows D101 (e.g., "When converting between different encoding schemes…UTF-8 byte sequences is not recommended by the Unicode Standard.").

--
Geoffrey Sneddon
<http://gsnedders.com/>
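
To make the UTF-16LE case concrete, here is a minimal sketch, assuming Python's built-in codecs as a stand-in for a decoder (nothing in the draft or in Unicode 5.0 mandates this behaviour for HTML). It shows the signatureless "utf-16-le" codec passing an initial FF FE through as U+FEFF, the "utf-16" codec consuming it as a signature, and a hypothetical helper, drop_leading_bom, that does what the draft's "must be dropped if present" wording asks for.

```python
BOM = "\ufeff"  # U+FEFF, ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK

# The bytes FF FE followed by "hi" encoded as little-endian UTF-16.
data_le = b"\xff\xfe" + "hi".encode("utf-16-le")

# Signatureless codec: the initial FF FE is decoded as an ordinary U+FEFF.
as_utf16le = data_le.decode("utf-16-le")

# Codec with signature detection: the initial FF FE is consumed as a BOM.
as_utf16 = data_le.decode("utf-16")


def drop_leading_bom(text):
    """Hypothetical helper: silently drop one leading U+FEFF, if present."""
    return text[1:] if text.startswith(BOM) else text


print(repr(as_utf16le))                    # leading U+FEFF is still there
print(repr(as_utf16))                      # signature already removed
print(repr(drop_leading_bom(as_utf16le)))  # dropped after decoding
```

Dropping the character after decoding, as the helper does, treats the BOM the same way whichever encoding label was used, which is the behaviour under discussion above.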
Received on Friday, 29 February 2008 16:09:58 UTC