- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 22 Dec 2011 09:59:43 +0100
Henri Sivonen on Tue Dec 20 01:13:45 PST 2011:
> On Mon, Dec 19, 2011 at 9:44 PM, L. David Baron wrote:
>>> > I discovered that "UNICODE" is
>>> > used as alias for "UTF-16" in IE and Webkit.
>>> ...
>>> > Seemingly, this has not affected Firefox users too much.
>>>
>>> It surprises me greatly that Gecko doesn't treat "unicode" as an alias
>>> for "utf-16".
>>
>> Why?
>
> From playing with IE, I thought it was known that "unicode" is an
> alias for "utf-16" and it had never occurred to me to check if that
> was true in Gecko.

MS 'unicode' is only to a 50% degree (sic) an alias for 'utf-16', namely for the *little-endian* "half" of *UTF-16*. (Thus: It is not UTF-16LE, since MS 'unicode' usually includes the BOM.) There is also MS 'unicodeFFFE', which represents big-endian UTF-16. See: http://mail.apps.ietf.org/ietf/charsets/msg02030.html

>> If it's not needed, why shouldn't WebKit and IE drop it?

Actually, UTF-16 fails in Webkit much, much more often than in any other browser. E.g. this page is (not that it is related, though) labelled as MS 'unicode': http://sacredheartbayhead.com/. Firefox, Opera and IE all display it. But Chrome/Safari fail to detect the encoding. So even though Webkit aligns with IE by understanding MS 'unicode' and MS 'unicodeFFFE', it does other things wrong when it comes to UTF-16. So, you should only look at Webkit if you want to see how well a browser can do in the market when it has below-average UTF-16 support ... (Chrome may be a bit better than Safari, though - Chrome at least allows me to *select* UTF-16, whereas Safari does not offer UTF-16 in its encoding menu. Chrome also uses character set detection more actively.)

> Needed is relative. So far, I haven't seen data about how much
> existing content there is out there that depends on this. It could be
> that some users somewhere have rejected Firefox or Opera for this and
> there just isn't enough of a feedback loop.

Feedback loop for you: UTF-16LE or UTF-16BE pages without any other encoding info. (The HTML5 encoding sniffing tells UAs to *do* read the meta @charset *if* all other tests fail.) And, voila, I just now found one such page: <http://www.hughesrenier.be/actualites.html>. This page works fine in IE - and IE only. (That it fails in Webkit is because of some bug in its encoding sniffing - see below.) Offline, on my computer, when I switched the value of the meta @charset for that page to 'UTF-16', Firefox and Opera would also pick up the encoding.

Other pages of the same kind:

<http://www.sunsetridgebusinesspark.com/BusinessListing.html>
<http://www.rpmcmillen.com/taxes.html>
<http://www.hughesrenier.be/illustration.html>
<http://memphismitchellathletics.com/pages/2010football.html>

There are also pages like these, which work fine in IE, but which in Firefox, if I manually select UTF-16, display broken-character signs - I don't know if the UTF-16 code is buggy?:

<http://www.casamobile.org/BoardMembersStaff.html>
<http://comfortablerentals.com/Our%20Services.html>
<http://lergp.cce.cornell.edu/IPM/Home.htm>
<http://www.belpaese2000.narod.ru/Teca/Nove/Deledda/nov/regina.htm>
<http://www.belpaese2000.narod.ru/Teca/Nove/Deledda/nov/macchie.htm>
<http://web.tiscali.it/marcokiller/Mappa_del_sito.htm>
<http://familienlundorff.dk/familienLundorff.dk/genealogi/Andreas_1769/Niels_1813_Johanne_1854.html>
<http://www.prcflow.com/orifice_meter_runs_plates.htm>
<http://healthactioncenter.com/aboutus.htm>
<http://www.belpaese2000.narod.ru/Teca/Nove/Deledda/nov/mago.htm>
<http://www.trascaucristian.3x.ro/> (shows BOM sign)
<http://www.casamobile.org/history.html>
<http://www.hawkpages.com/> (See 'embedded' code on right page side)

I found them via Google, which for certain UTF-16 pages renders the source code as search result (which makes Google Search very similar to how Webkit handles UTF-16, btw):
<http://www.google.com/search?q=%22%3Cmeta+content%3D%27text/html%3B+charset%3Dunicode%27%22>

Not the same thing, but speaking about necessity: This page declares "UTF-8" 3 times, and it also includes the BOM. However, the HTTP charset says ISO-8859-1, and hence ... the page fails in Firefox and Opera, but not in Webkit and IE: <http://www.bozze.1.vg/>.

> Maybe it isn't needed, but it seems that from the WebKit or IE point
> of view, the potential upside from dropping this alias is about
> non-existent while there could be a downside. I'd expect it to be hard
> to get IE and WebKit to drop the alias.

Btw, one thing: A big source of Google findings for the search string "<meta content='text/html; charset=unicode'" seems to be HTML attachments (from MS Word users) in e-mail messages to mailing lists. Example: http://stsk.no/pipermail/drill-aspiranter_stsk.no/attachments/20101230/8335fbe4/attachment-0001.html

--
Leif Halvard Silli
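
To make the label handling discussed above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how any of the browsers mentioned actually implement it: the table `LEGACY_LABELS` and the function `decode_with_label` are hypothetical names, the table covers only the two MS labels from this message, and the rule that a BOM overrides the declared label is an assumption modelled on the IE/Webkit behaviour described for the bozze.1.vg page.

```python
import codecs

# Hypothetical label table: only the two MS labels discussed in this message,
# mapped to the byte order the message attributes to them.
LEGACY_LABELS = {
    "unicode": "utf-16-le",      # MS 'unicode': little-endian UTF-16
    "unicodefffe": "utf-16-be",  # MS 'unicodeFFFE': big-endian UTF-16
}


def decode_with_label(data: bytes, label: str) -> str:
    """Decode data, letting a BOM (if present) override the declared label.

    BOM precedence is an assumption of this sketch, not a statement about
    any particular browser.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: fall back to the declared label, resolving the legacy aliases.
    codec = LEGACY_LABELS.get(label.strip().lower(), label)
    return data.decode(codec)
```

With this sketch, a BOM-less little-endian page labelled 'unicode' decodes as UTF-16LE, and a page that starts with a UTF-8 BOM decodes as UTF-8 even when the declared label says ISO-8859-1.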
Received on Thursday, 22 December 2011 00:59:43 UTC