- From: Andrew Fedoniouk <news@terrainformatica.com>
- Date: Fri, 13 Jul 2007 12:00:25 -0700
- To: <public-html@w3.org>, "Sander Tekelenburg" <st@isoc.nl>
----- Original Message -----
From: "Sander Tekelenburg" <st@isoc.nl>
To: <public-html@w3.org>
Sent: Friday, July 13, 2007 9:22 AM
Subject: Re: guessing character encoding (was HTML WG)

> At 08:19 +0300 UTC, on 2007-07-13, Dmitry Turin wrote:
>
>> Good day, Robert.
>>
>> RB> I was wondering what character encoding you use to serve up this page:
>> RB> <http://html60.chat.ru/site/html60/ru/index_ru.htm>
>> RB> We're trying to conduct some tests on current UAs and this page might
>> RB> be helpful. Do you know what charset it uses?
>>
>> All pages in russian language are coded in WIN-1251.
>> These documents are displayed truely both in IE and Opera.
>
> Only because they happen to guess what you intend. They're not presented as
> you intend in iCab3.0.3, Firefox2.0.0.4, Safari2.0.4 (because neither the
> server nor the document itself say what character repertoire the document
> is in).
>
> Is there any particular reason why you're relying on UAs to guess what
> character repertoire the document is in? (I believe HTML5 aims to define a
> perfect guessing algorithm, but AFAIK the idea is 'just' to unify UA
> behaviour. I don't believe the intention is that authors rely on that --
> they're still expected to provide the proper Content-Type header, or a
> <meta charset="value">:
> <http://www.whatwg.org/specs/web-apps/current-work/multipage/section-document.html#charset0>)
>
> Now I'm aware that apparently there is some practical problem with authoring
> cyrillic, in that 4 or 5 different encodings are commonly used. Russian
> Apache deals with that through content-negotiation:
> <http://apache.lexa.ru/english/>. But I see no reason for authors to rely on
> UAs to just magically guess the correct character repertoire. Or is there?
>

Sander, that is just a bug. HTML documents in Russian must indicate encoding.
This particular page will work in IE, and only on a Russian version of Windows, because when the encoding is unknown IE falls back to the current system encoding setting (the so-called "current ANSI code page").

Andrew Fedoniouk.
http://terrainformatica.com
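To illustrate the fallback described above, here is a minimal Python sketch. It is not IE's actual code: the function `decode_page`, its parameters, and the sample word are hypothetical, and the "system default" is simply passed in as an argument to stand in for the current ANSI code page.

```python
from typing import Optional

# Hypothetical sample text, not taken from the page under discussion.
SAMPLE = "Привет"
RAW = SAMPLE.encode("cp1251")  # bytes as a WIN-1251 page would send them


def decode_page(raw: bytes, declared: Optional[str], system_default: str) -> str:
    """Decode with the declared charset; if none is declared, fall back
    to the system default (an assumption modelling the ANSI code page)."""
    encoding = declared or system_default
    return raw.decode(encoding, errors="replace")


# No declared charset, Russian system default: the text comes out right.
russian_system = decode_page(RAW, None, "cp1251")

# No declared charset, Western European system default: mojibake.
western_system = decode_page(RAW, None, "cp1252")

# Declared charset: correct regardless of the system default.
declared_charset = decode_page(RAW, "cp1251", "cp1252")
```

With the charset declared, the result no longer depends on which machine opens the page, which is the reason the document must indicate its encoding.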
Received on Friday, 13 July 2007 20:01:17 UTC