- From: John Cowan <cowan@mercury.ccil.org>
- Date: Thu, 19 Dec 2013 14:14:41 -0500
- To: Henri Sivonen <hsivonen@hsivonen.fi>
- Cc: www-international@w3.org
Henri Sivonen scripsit:

> Chrome seems to have content-based detection for a broader range of
> encodings. (Why?)

Presumably because they believe it improves the user experience; I don't
know for sure. What I do know is that Google search attempts to convert
every page it spiders to UTF-8, and that they rely on encoding detection
rather than (or in addition to) declared encodings. In particular,
certain declared encodings, such as US-ASCII, 8859-1, and Windows-1252,
are considered to provide *no* encoding information.

Before modifying existing encoding-detection schemes, I would ask
someone at Google (or another company that spiders the Web extensively)
to find out just how much better the revised scheme would be when
applied to the existing Web, rather than trusting to _a priori_
arguments.

> * The domain name is a country TLD whose legacy encoding affiliation
>   I couldn't figure out: .ba, .cy, .my. (Should .il be here in case
>   there's windows-1256 legacy in addition to windows-1255 legacy?)

1256 is Arabic and 1255 is Hebrew, so I assume you meant the other way around.

-- 
John Cowan          cowan@ccil.org          http://ccil.org/~cowan
If he has seen farther than others, it is because he is standing on
a stack of dwarves.  --Mike Champion, describing Tim Berners-Lee (adapted)
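A minimal sketch of the policy described in the message above (treat certain declared labels as saying nothing and fall back to sniffing the bytes), assuming Python with the third-party chardet package as a stand-in detector; the label set, helper name, and fallback are illustrative, not a description of what Google actually runs:

    import chardet

    # Declared labels treated as carrying no encoding information,
    # per the message above (alias handling is deliberately simplified).
    UNINFORMATIVE_LABELS = {"us-ascii", "ascii", "iso-8859-1", "latin1", "windows-1252"}

    def choose_encoding(declared_label, raw_bytes):
        """Trust the declared label unless it is one of the 'says nothing'
        labels; otherwise guess from the bytes themselves."""
        if declared_label and declared_label.lower() not in UNINFORMATIVE_LABELS:
            return declared_label
        guess = chardet.detect(raw_bytes)
        # Arbitrary fallback when detection fails on short or ambiguous input.
        return guess["encoding"] or "windows-1252"

    # Example: a page declared as ISO-8859-1 whose bytes are actually UTF-8.
    data = "naïve café".encode("utf-8")
    print(choose_encoding("ISO-8859-1", data))

On short inputs the detector's guess may be unreliable, which is one reason a large-scale crawler would combine detection with the declared label rather than discarding it entirely.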
Received on Thursday, 19 December 2013 19:15:04 UTC