- From: Anne van Kesteren <annevk@opera.com>
- Date: Wed, 28 Dec 2011 17:11:01 +0100
On Wed, 28 Dec 2011 12:31:12 +0100, Leif Halvard Silli
<xn--mlform-iua at målform.no> wrote:
> Anne van Kesteren Wed Dec 28 01:05:48 PST 2011:
>> On Wed, 28 Dec 2011 03:20:26 +0100, Leif Halvard Silli wrote:
>>> By "default" you supposedly mean "default, before error
>>> handling/heuristic detection". Relevance: On the "real" Web, no browser
>>> fails to display utf-16 as often as Webkit - its defaulting behavior
>>> not withstanding - it can't be a goal to replicate that, for instance.
>>
>> Do you mean heuristics when it comes to the decoding layer? Or before
>> that? I do think any heuristics ought to be defined.
>
> Meant: While UAs may prepare for little-endian when seeing the 'utf-16'
> label, they should also be prepared for detecting it as big-endian.
>
> As for Mozilla, if HTTP content-type says 'utf-16', then it is prepared
> to handle BOM-less little-endian as well as bom-less big-endian.
> Whereas if you send 'utf-16le' via HTTP, then it only accepts
> 'utf-16le'. The same also goes for Opera. But not for Webkit and IE.

Right. I think we should do it like Trident.

>>>> utf-16le becomes a label for utf-16.
>>>
>>> * Logically, utf-16be should become a label for utf-16 then, as well.
>>
>> That's not logical.
>
> Care to elaborate?
>
> To not make 'utf-16be' a de-facto label for 'utf-16', only makes sense
> if you plan to make it non-conforming to send files with the 'utf-16'
> label unless they are little-endian encoded.

I personally think everything but UTF-8 should be non-conforming, because
of the large number of gotchas embedded in the platform if you don't use
UTF-8. Anyway, it's not logical because I suggested to follow Trident
which has different behavior for utf-16 and utf-16be.

> Meaning: The "BOM" should not, for UTF-16be/le, be removed. Thus, if
> the ZWNBSP character at the beginning of a 'utf-16be' labelled file is
> treated as the BOM, then we do not speak about the 'utf-16be' encoding,
> but about a mislabelled 'utf-16' file.

I never spoke of any existing standard. The Unicode standard is wrong here
for all implementations.

>> the first four bytes have special meaning.
>> That does not all suggest we should do the same for numerous other
>> encodings unrelated to utf-16.
>
> Why not? I see absolutely no difference here. When would you like to
> render a page with a BOM as anything other than what the BOM specifies?

Interesting, it does seem like Trident/WebKit look at the specific byte
sequences the BOM has in utf-8 and utf-16 before paying attention to the
"actual" encoding.

--
Anne van Kesteren
http://annevankesteren.nl/
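
For illustration only, here is a minimal Python sketch of the "BOM first, label second" behavior observed above for Trident/WebKit. The function names `sniff_bom` and `decode_with_bom_priority` are hypothetical. Treating a BOM-less 'utf-16' label as little-endian is an assumption drawn from this thread, not a description of any browser's actual code.

```python
import codecs


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a UTF-8 or
    UTF-16 BOM, otherwise None. Only these three byte sequences are checked."""
    if data.startswith(codecs.BOM_UTF8):        # EF BB BF
        return "utf-8", 3
    if data.startswith(codecs.BOM_UTF16_LE):    # FF FE
        return "utf-16-le", 2
    if data.startswith(codecs.BOM_UTF16_BE):    # FE FF
        return "utf-16-be", 2
    return None


def decode_with_bom_priority(data: bytes, label: str) -> str:
    """Decode data, letting a leading BOM override the declared label."""
    sniffed = sniff_bom(data)
    if sniffed is not None:
        encoding, bom_length = sniffed
        # The BOM wins over the label and is not part of the decoded text.
        return data[bom_length:].decode(encoding, errors="replace")

    # No BOM: fall back to the label.
    # Assumption: 'utf-16' and 'utf-16le' both mean little-endian,
    # while 'utf-16be' keeps its own big-endian behavior.
    fallback = {
        "utf-16": "utf-16-le",
        "utf-16le": "utf-16-le",
        "utf-16be": "utf-16-be",
    }
    normalized = label.strip().lower()
    return data.decode(fallback.get(normalized, normalized), errors="replace")
```

With this sketch, a file that starts with FE FF but is labelled 'utf-16le' is decoded as big-endian, so it is handled as a mislabelled file rather than as text beginning with ZWNBSP.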
Received on Wednesday, 28 December 2011 08:11:01 UTC