Re: The problem with the encoding koi8-r from Jukka K. Korpela on 2013-12-06 (www-validator@w3.org from December 2013)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Fri, 06 Dec 2013 17:50:10 +0200
To: "www-validator@w3.org" <www-validator@w3.org>, "NIKA TOUR Co. Ltd." <info@nika-tour.org>
Message-ID: <52A1F232.5000100@cs.tut.fi>

2013-12-06 17:00, Michael[tm] Smith wrote:

> "Jukka K. Korpela" <jkorpela@cs.tut.fi>, 2013-12-03 00:42 +0200:
>
>> 2013-12-01 14:10, NIKA TOUR Co. Ltd. wrote:
>>> Here is a link - /http://nika-tour.org/Excursions/ysupov_palace_ru.html
>>
>> It is a windows-1251 encoded page, properly declared as windows-1251 when
>> viewed in a browser.
>>
>> But it seems that the server has been (mis)configured to declare koi8-r when
>> requested by the validator. This is something that you need to take to your
>> server admin.
>
> I get "Content-Type: text/html; charset=koi8-r" for it in Firefox and
> Chromium. If you're seeing windows-1251 in the header, I wonder whether it
> might be trying to send something different based on user locale setting or
> IP address. Or something.

Indeed; the HTTP headers also say
Vary: accept-charset, user-agent
Using http://web-sniffer.net with different settings for "User agent", I
get both windows-1251 and koi8-r results.
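
For anyone who wants to repeat this check without a web service, here is
a minimal sketch in Python. The function names and the User-Agent strings
are my own illustrative assumptions, not anything taken from the server or
from web-sniffer.net; the sketch only reads the Content-Type and Vary
response headers for a given User-Agent.

```python
from email.message import Message
from urllib.request import Request, urlopen


def charset_from_content_type(value):
    """Return the lowercased charset of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def declared_charset(url, user_agent):
    """Fetch url with the given User-Agent (hypothetical helper).

    Returns (charset declared in the Content-Type header, Vary header).
    """
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req) as resp:
        return resp.headers.get_content_charset(), resp.headers.get("Vary")


if __name__ == "__main__":
    page = "http://nika-tour.org/Excursions/ysupov_palace_ru.html"
    # Illustrative User-Agent strings only; real clients send longer ones.
    for ua in ("Mozilla/5.0", "W3C_Validator/1.3"):
        print(ua, declared_charset(page, ua))
```

If the two requests report different charsets while Vary lists
user-agent, the server is choosing the declared encoding per client.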

I suppose the problem has been mostly fixed now; the original poster
sent me personal mail, saying "I've already solved that problem." I
guess the problem was that the server did not consistently send the data
in the declared encoding.

It isn't quite consistent even now: when the response declares koi8-r
in the HTTP header and the content is koi8-r encoded, the content still
contains <meta charset="windows-1251">. This mostly works because the
<meta ...> tag is ignored when the encoding is declared in an HTTP
header. However, if a user saves a page locally and later opens it from
disk, it will be displayed wrongly, because there won't be any HTTP
headers and the browser will then rely on the <meta ...> tag.
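
The saved-copy failure can be sketched in a few lines of Python. The
sample word and the function name are hypothetical; the point is only
that koi8-r bytes interpreted as windows-1251 decode without any error
but yield different letters.

```python
def decode_saved_copy(raw, meta_charset):
    """Decode page bytes the way a browser would without HTTP headers,
    i.e. trusting the charset named in the <meta> tag (hypothetical helper)."""
    return raw.decode(meta_charset)


original = "Дворец"                    # illustrative sample text
raw = original.encode("koi8_r")        # what the server actually sends

# With the HTTP header (koi8-r) honoured, the text round-trips.
served = raw.decode("koi8_r")

# From disk, only <meta charset="windows-1251"> is left to go by.
from_disk = decode_saved_copy(raw, "cp1251")

print(served)
print(from_disk)   # still Cyrillic letters, but the wrong ones
```

Both encodings put Cyrillic letters in the upper half of the byte range,
so the mismatch produces no decoding error, just unreadable text.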

Yucca

P.S. The validator's warning "Legacy encoding koi8-r used. Documents
should use UTF-8" might generally be regarded as excessive UTF-8
evangelism, but here it might be useful. It's difficult to imagine a
reason to alternate between windows-1251 and koi8-r (is there a client
that knows one of them but not the other?), apparently based on the
User-Agent string rather than Accept-Charset. Using just one of them in
all responses should work fine. The benefits of UTF-8 are not very
tangible here, and there is the obvious problem of data size increase
(two bytes per Cyrillic letter vs. one byte).
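
The size difference is easy to check. A small Python sketch, with a
sample phrase of my own choosing (not taken from the page):

```python
sample = "Юсуповский дворец"   # illustrative: 16 Cyrillic letters, 1 space

# Encoded length in bytes for each encoding discussed here.
sizes = {enc: len(sample.encode(enc)) for enc in ("cp1251", "koi8_r", "utf-8")}

for enc, size in sizes.items():
    print(f"{enc}: {size} bytes")
```

ASCII characters such as the space, and all HTML markup, remain one byte
each in UTF-8, so the increase for a whole page is less than a factor of
two.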

Received on Friday, 6 December 2013 15:50:43 UTC