
Re: Guessing the fallback encoding from the top-level domain name before trying to guess from the browser localization

From: John Cowan <cowan@mercury.ccil.org>
Date: Mon, 24 Feb 2014 14:31:50 -0500
To: Henri Sivonen <hsivonen@hsivonen.fi>
Cc: "www-international@w3.org" <www-international@w3.org>
Message-ID: <20140224193150.GA25075@mercury.ccil.org>
John Cowan scripsit:

> That works only if you can get 100% of the documents on the legacy Web
> correctly labeled or correctly guessable.  Until recently, there was a
> spectacular bug in Chrome whereby any table-of-contents page generated by
> latex2html, e.g. <http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-2.html>,
> would show up as Chinese mojibake.  Without the ability to change the
> interpretation of the encoding, the only alternative would have been to
> load another browser just to read those pages (I used IETab on that site
> for a while).
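
(As a rough illustration only: a short Python 3 sketch of one way to check
whether a page such as the r6rs.org one above is labeled at all, i.e.
whether a charset shows up in the HTTP Content-Type header or in a meta
declaration near the top of the markup. Only the standard library is used;
the 1024-byte window is the region where HTML5 expects the declaration to
appear.)

    import re
    import urllib.request

    URL = "http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-2.html"

    with urllib.request.urlopen(URL) as resp:
        # Charset from the Content-Type header, or None if none was sent.
        header_charset = resp.headers.get_content_charset()
        body = resp.read()

    # Look for <meta charset="..."> or the older http-equiv form in the
    # first 1024 bytes of the markup.
    head = body[:1024].decode("ascii", errors="replace")
    meta = re.search(r"charset\s*=\s*[\"']?([\w-]+)", head, re.IGNORECASE)

    print("HTTP charset:", header_charset)
    print("meta charset:", meta.group(1) if meta else None)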

I found a page with the same or a similar problem just today:
<http://www.gnu.org/software/freefont/coverage.html> is pure ASCII
and declares no character encoding (there is no charset in its
Content-Type header), yet Chrome renders it as mojibake if you set
the encoding to ISO-8859-1 before loading the page. It does display
correctly if you then change the encoding to UTF-8.
That shouldn't happen.
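
(As a quick sanity check of that claim, a few lines of Python make the
point: genuinely ASCII-only bytes decode to the same text under
ISO-8859-1 and UTF-8, so the manually chosen fallback encoding should
make no visible difference for such a page. "coverage.html" here stands
for a hypothetical locally saved copy of the page.)

    with open("coverage.html", "rb") as f:
        data = f.read()

    if all(b < 0x80 for b in data):
        # Pure ASCII: both interpretations must agree byte for byte.
        assert data.decode("iso-8859-1") == data.decode("utf-8")
        print("pure ASCII: ISO-8859-1 and UTF-8 interpretations are identical")
    else:
        print("the page contains non-ASCII bytes, so the chosen encoding matters")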

-- 
One art / There is                      John Cowan <cowan@ccil.org>
No less / No more                       http://www.ccil.org/~cowan
All things / To do
With sparks / Galore                     --Douglas Hofstadter
Received on Monday, 24 February 2014 19:32:14 UTC
