- From: Anne van Kesteren <annevk@opera.com>
- Date: Wed, 28 Dec 2011 10:05:48 +0100
On Wed, 28 Dec 2011 03:20:26 +0100, Leif Halvard Silli <xn--mlform-iua at målform.no> wrote:
> By "default" you supposedly mean "default, before error
> handling/heuristic detection". Relevance: On the "real" Web, no browser
> fails to display utf-16 as often as Webkit - its defaulting behavior
> notwithstanding - it can't be a goal to replicate that, for instance.

Do you mean heuristics when it comes to the decoding layer? Or before that? I do think any heuristics ought to be defined.

>> utf-16le becomes a label for utf-16.
>
> * Logically, utf-16be should become a label for utf-16 then, as well.

That's not logical.

> Is that what you suggest? Because, if the BOM can change the meaning of
> utf-16be, then it makes sense to treat the utf-16be label as well as
> the utf-16le label as synonymous with utf-16. (Thus, effectively
> utf-16le and utf-16be become defunct/unreliable on the Web.)

No, because utf-16be actually has different behavior in the absence of a BOM. It does mean they can share some common algorithm(s), but they have to stay different encodings.

> SECONDLY: You effectively say that, for the UTF-16 BOM, then the BOM
> should override the HTTP level charset info. OK. But then you should go
> the full way, and give the BOM the same, overriding authority when it
> comes to the UTF-8 BOM. For instance, if the HTTP server's Content-Type
> header specifies ISO-8859-1 (or 'utf-8' or 'utf-16'), but the file
> itself contains a BOM (that contradicts the HTTP info), then the BOM
> "wins" - in IE and Webkit. (And, btw, w.r.t. IE, then the
> X-Content-Type: header has no effect w.r.t. treating the HTTP's charset
> info as authoritative - the BOM wins even then.)

No, I don't see why we have to go there at all. All this suggests is that within the two utf-16 encodings the first four bytes have special meaning. That does not at all suggest we should do the same for numerous other encodings unrelated to utf-16.

-- 
Anne van Kesteren
http://annevankesteren.nl/
Received on Wednesday, 28 December 2011 01:05:48 UTC
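
The distinction argued for in the message (a BOM settles the byte order in both utf-16 encodings, while utf-16be still behaves differently when no BOM is present) can be illustrated with a rough Python sketch. This is not text from any specification: the function name `decode_utf16` is hypothetical, the sketch checks only the standard two-byte UTF-16 BOMs, and the little-endian fallback for "utf-16" and "utf-16le" is an assumption drawn from the proposal that utf-16le become a label for utf-16.

```python
import codecs


def decode_utf16(label: str, data: bytes) -> str:
    """Hypothetical helper: decode bytes labelled utf-16, utf-16le or utf-16be."""
    label = label.strip().lower()
    if label not in ("utf-16", "utf-16le", "utf-16be"):
        raise ValueError(f"not a utf-16 label: {label!r}")

    # Shared step: a BOM, if present, decides the byte order
    # regardless of which of the labels was declared.
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return data[2:].decode("utf-16-le", errors="replace")
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return data[2:].decode("utf-16-be", errors="replace")

    # No BOM: this is where the two encodings differ.
    # Assumption: "utf-16" and "utf-16le" both fall back to little-endian.
    fallback = "utf-16-be" if label == "utf-16be" else "utf-16-le"
    return data.decode(fallback, errors="replace")
```

With this sketch, `decode_utf16("utf-16be", b"\xff\xfeh\x00i\x00")` follows the little-endian BOM despite the label, while BOM-less input such as `b"\x00h\x00i"` decodes to "hi" only under the utf-16be label. That difference is the reason the two remain separate encodings even though they share the BOM check.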