- From: 신정식 <jshin1987@gmail.com>
- Date: Fri, 20 Dec 2013 12:58:14 -0800
- To: www-international@w3.org
- Message-ID: <CAE1ONj_qGr+6wVSt3JgjpoQeRZv+DVRp+5vVw7D44OUKjSh=YQ@mail.gmail.com>
Sorry, I meant to reply to all.

---------- Forwarded message ----------
From: "Jungshik SHIN (신정식)" <jshin1987@gmail.com>
Date: Dec 19, 2013 2:18 PM
Subject: Re: Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization
To: "John Cowan" <cowan@mercury.ccil.org>
Cc:

On Dec 19, 2013 11:16 AM, "John Cowan" <cowan@mercury.ccil.org> wrote:
>
> Henri Sivonen scripsit:
>
> > Chrome seems to have content-based detection for a broader range of
> > encodings. (Why?)
>
> Presumably because they believe it improves the user experience;

It is off by default. Even when it is on, it is used only in the absence of an explicit declaration, whether via the HTTP Content-Type header or a meta tag. It never overrides the declared encoding.

What Google search does has little to do with what Blink does.

> I don't know for sure. What I do know is that Google search attempts to
> convert every page it spiders to UTF-8, and that they rely on encoding
> detection rather than (or in addition to) declared encodings. In
> particular, certain declared encodings such as US-ASCII, 8859-1, and
> Windows-1252, are considered to provide *no* encoding information.
>
> Before modifying existing encoding-detection schemes, I would ask
> someone at Google (or another company that spiders the Web extensively)
> to find out just how much superior the revised scheme would be when
> applied to the existing Web, rather than trusting to _a priori_
> arguments.
>
> > * The domain name is a country TLD whose legacy encoding affiliation
> > I couldn't figure out: .ba, .cy, .my. (Should .il be here in case
> > there's windows-1256 legacy in addition to windows-1255 legacy?)
>
> 1256 is Arabic, 1255 is Hebrew, so I assume you meant the other way around.
>
> --
> John Cowan cowan@ccil.org http://ccil.org/~cowan
> If he has seen farther than others,
> it is because he is standing on a stack of dwarves.
> --Mike Champion, describing Tim Berners-Lee (adapted)
>
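To make the precedence described above concrete, here is a minimal sketch in Python. It is not Blink's code; the function names, the `detection_enabled` flag, the `detector` callback, and the `windows-1252` fallback are all assumptions made for illustration. It only shows the ordering: a declared encoding (HTTP Content-Type header, then meta tag) always wins, and content-based detection is consulted only when nothing is declared and the feature is switched on.

```python
import re
from typing import Callable, Optional


def charset_from_content_type(header: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value, if any."""
    if not header:
        return None
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', header, re.IGNORECASE)
    return match.group(1).lower() if match else None


def choose_encoding(
    content_type: Optional[str],
    meta_charset: Optional[str],
    body: bytes,
    detector: Optional[Callable[[bytes], Optional[str]]] = None,
    detection_enabled: bool = False,  # assumption: models "off by default"
    fallback: str = "windows-1252",   # assumption: placeholder fallback
) -> str:
    # 1. An explicit declaration is never overridden by detection.
    declared = charset_from_content_type(content_type)
    if declared is None and meta_charset:
        declared = meta_charset.lower()
    if declared:
        return declared

    # 2. Content-based detection runs only when enabled and nothing is declared.
    if detection_enabled and detector is not None:
        guess = detector(body)
        if guess:
            return guess

    # 3. Otherwise use the fallback (locale- or TLD-based in a real browser).
    return fallback
```

With a hypothetical detector that always answers "utf-8", a page served as `text/html; charset=EUC-KR` still decodes as EUC-KR, while an undeclared page decodes as UTF-8 only if detection has been enabled.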
Received on Friday, 20 December 2013 20:58:44 UTC