
Fwd: Re: Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization

From: 신정식 <jshin1987@gmail.com>
Date: Fri, 20 Dec 2013 12:58:14 -0800
Message-ID: <CAE1ONj_qGr+6wVSt3JgjpoQeRZv+DVRp+5vVw7D44OUKjSh=YQ@mail.gmail.com>
To: www-international@w3.org
Sorry, I meant to reply to all.
---------- Forwarded message ----------
From: "Jungshik SHIN (신정식)" <jshin1987@gmail.com>
Date: Dec 19, 2013 2:18 PM
Subject: Re: Guessing the fallback encoding from the top-level domain name
before trying to guess from the browser localization
To: "John Cowan" <cowan@mercury.ccil.org>
Cc:


On Dec 19, 2013 11:16 AM, "John Cowan" <cowan@mercury.ccil.org> wrote:
>
> Henri Sivonen scripsit:
>
> > Chrome seems to have content-based detection for a broader range of
> > encodings. (Why?)
>
> Presumably because they believe it improves the user experience;

It is off by default. Even when it is on, it is used only in the absence of
an explicit declaration, via either the HTTP Content-Type header or a meta
tag. It never overrides the declared encoding.
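
Roughly, the precedence looks like the sketch below. This is only an
illustration, not Blink's code: the function name, the parameters, the
header-before-meta ordering and the windows-1252 default are all assumptions
made for the example.

```python
from typing import Callable, Optional


def choose_encoding(
    http_charset: Optional[str],
    meta_charset: Optional[str],
    detector_enabled: bool,
    detect: Callable[[bytes], Optional[str]],
    body: bytes,
    fallback: str = "windows-1252",  # assumed default, for illustration only
) -> str:
    """Hypothetical helper: pick an encoding for a page."""
    # An explicit declaration always wins; the detector is never consulted.
    # (Checking the HTTP header before the meta tag is an assumption here.)
    if http_charset:
        return http_charset
    if meta_charset:
        return meta_charset
    # Only undeclared pages reach the detector, and only when it is on.
    if detector_enabled:
        guessed = detect(body)
        if guessed:
            return guessed
    # Detector off, or it had no answer: use the fallback encoding.
    return fallback
```

With the detector switched off (the default), an undeclared page goes
straight to the fallback.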

What Google search does has little to do with what Blink does.

> I don't know for sure.  What I do know is that Google search attempts to
> convert every page it spiders to UTF-8, and that they rely on encoding
> detection rather than (or in addition to) declared encodings.  In
> particular, certain declared encodings such as US-ASCII, 8859-1, and
> Windows-1252, are considered to provide *no* encoding information.
>
> Before modifying existing encoding-detection schemes, I would ask
> someone at Google (or another company that spiders the Web extensively)
> to find out just how much superior the revised scheme would be when
> applied to the existing Web, rather than trusting to _a priori_
> arguments.
>
> >  * The domain name is a country TLD whose legacy encoding affiliation
> > I couldn't figure out: .ba, .cy, .my. (Should .il be here in case
> > there's windows-1256 legacy in addition to windows-1255 legacy?)
>
> 1256 is Arabic, 1255 is Hebrew, so I assume you meant the other way
> around.
>
> --
> John Cowan  cowan@ccil.org  http://ccil.org/~cowan
> If he has seen farther than others,
>         it is because he is standing on a stack of dwarves.
>                 --Mike Champion, describing Tim Berners-Lee (adapted)
>
Received on Friday, 20 December 2013 20:58:44 UTC
