Re: guessing character encoding (was HTML WG)

From: Andrew Fedoniouk <news@terrainformatica.com>
Date: Fri, 13 Jul 2007 12:00:25 -0700
Message-ID: <000a01c7c588$85442bd0$f502000a@internal.toppro.net>
To: <public-html@w3.org>, "Sander Tekelenburg" <st@isoc.nl>

----- Original Message ----- 
From: "Sander Tekelenburg" <st@isoc.nl>
To: <public-html@w3.org>
Sent: Friday, July 13, 2007 9:22 AM
Subject: Re: guessing character encoding (was HTML WG)

> At 08:19 +0300 UTC, on 2007-07-13, Dmitry Turin wrote:
>> Good day, Robert.
>> RB> I was wondering what character encoding you use to serve up this page:
>> RB> <http://html60.chat.ru/site/html60/ru/index_ru.htm>
>> RB> We're trying to conduct some tests on current UAs and this page might
>> RB> be helpful. Do you know what charset it uses?
>> All pages in the Russian language are encoded in WIN-1251.
>> These documents are displayed correctly both in IE and Opera.
> Only because they happen to guess what you intend. They're not presented as
> you intend in iCab3.0.3, Firefox2.0.0.4, Safari2.0.4 (because neither the
> server nor the document itself say what character repertoire the document is
> in).
> Is there any particular reason why you're relying on UAs to guess what
> character repertoire the document is in? (I believe HTML5 aims to define a
> perfect guessing algorithm, but AFAIK the idea is 'just' to unify UA
> behaviour. I don't believe the intention is that authors rely on that --
> they're still expected to provide the proper Content-Type header, or a
> <meta charset="value">:
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/section-document.html#charset0>)
> Now I'm aware that apparently there is some practical problem with authoring
> cyrillic, in that 4 or 5 different encodings are commonly used. Russian
> Apache deals with that through content-negotiation:
> <http://apache.lexa.ru/english/>. But I see no reason for authors to rely on
> UAs to just magically guess the correct character repertoire. Or is there?

Sander, that is just a bug.

HTML documents in Russian must indicate their encoding.
This particular page will work in IE, and only on a Russian version
of Windows, because when the encoding is unknown IE falls back to the
current system encoding setting (the so-called "current ANSI code page").
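
A rough sketch of why that fallback matters, in Python. The function
decode_with_fallback and its parameters are hypothetical names used only
for illustration; this is not IE's actual logic, just the same idea of
"use the declared charset if there is one, otherwise the system code page".
The code page names cp1251 and cp1252 stand in for a Russian and a Western
European Windows installation respectively.

```python
from typing import Optional

# Bytes as they would arrive for a page saved in windows-1251
# with no charset declared anywhere.
raw = "Привет".encode("cp1251")


def decode_with_fallback(data: bytes,
                         declared: Optional[str],
                         system_codepage: str) -> str:
    """Hypothetical model of a UA: prefer the declared charset,
    otherwise assume the system's current ANSI code page."""
    encoding = declared or system_codepage
    return data.decode(encoding, errors="replace")


# No declaration, Russian Windows: the guess happens to be right.
russian_windows = decode_with_fallback(raw, None, "cp1251")

# No declaration, Western European Windows: same bytes, wrong letters.
western_windows = decode_with_fallback(raw, None, "cp1252")

# Declared encoding: the system setting no longer matters.
declared = decode_with_fallback(raw, "windows-1251", "cp1252")

print(russian_windows)
print(western_windows)
print(declared)
```

With the encoding declared, the result is the same on every system,
which is the reason the page has to state it.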

Andrew Fedoniouk.
Received on Friday, 13 July 2007 20:01:17 UTC