- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Fri, 06 Dec 2013 17:50:10 +0200
- To: "www-validator@w3.org" <www-validator@w3.org>, NIKATOURCo.Ltd.info@nika-tour.org
2013-12-06 17:00, Michael[tm] Smith wrote:
> "Jukka K. Korpela" <jkorpela@cs.tut.fi>, 2013-12-03 00:42 +0200:
>
>> 2013-12-01 14:10, NIKA TOUR Co. Ltd. wrote:
>>> Here is a link - http://nika-tour.org/Excursions/ysupov_palace_ru.html
>>
>> It is a windows-1251 encoded page, properly declared as windows-1251 when
>> viewed in a browser.
>>
>> But it seems that the server has been (mis)configured to declare koi8-r when
>> requested by the validator. This is something that you need to take to your
>> server admin.
>
> I get "Content-Type: text/html; charset=koi8-r" for it in Firefox and
> Chromium. If you're seeing windows-1251 in the header, I wonder whether it
> might be trying to send something different based on user locale setting or
> IP address. Or something.

Indeed; the HTTP headers also say

Vary: accept-charset, user-agent

Using http://web-sniffer.net with different settings for "User agent",
I get both windows-1251 and koi8-r results.

I suppose the problem has been mostly fixed now; the original poster
sent me personal mail, saying "I've already solved that problem."
I guess the problem was that the server did not consistently send the
data in the declared encoding.

It isn't quite consistent even now: when the response has koi8-r
declared and used in the content, the content still contains
<meta charset="windows-1251">. This mostly works because the <meta ...>
tag is ignored when the encoding is declared in an HTTP header. It
breaks, however, if a user saves the page locally and later opens it
from disk: the page will then be displayed wrongly, because there are
no HTTP headers to override the <meta> tag.

Yucca

P.S. The validator's warning "Legacy encoding koi8-r used. Documents
should use UTF-8" might generally be regarded as excessive UTF-8
evangelism, but here it might be useful. It's difficult to imagine a
reason to alternate between windows-1251 and koi8-r (is there a client
that knows one of them but not the other?), apparently based on the
User-Agent string rather than Accept-Charset.
Using just one of them consistently in all responses should work fine.
The benefits of UTF-8 are not very tangible here, and there is the
obvious problem of data size increase (two bytes per Cyrillic letter
vs. one byte).
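The two technical points above (the HTTP header overriding <meta>, and the size of Cyrillic text in each encoding) can be checked with a minimal Python sketch. It rests on stated assumptions: `effective_encoding` is a hypothetical, heavily simplified stand-in for the browser's encoding-determination rules (real browsers also consider a byte order mark, user overrides and sniffing), the sample string is made up, and the fallback value is chosen only for illustration.

```python
from typing import Optional


def effective_encoding(http_charset: Optional[str],
                       meta_charset: Optional[str],
                       fallback: str = "windows-1252") -> str:
    """Hypothetical, simplified rule: an HTTP Content-Type charset wins
    over <meta charset>; with neither, use a fallback. (Real browsers
    check a BOM first and use a locale-dependent fallback; windows-1252
    here is only an illustrative assumption.)"""
    if http_charset:
        return http_charset
    if meta_charset:
        return meta_charset
    return fallback


text = "Юсуповский дворец"        # made-up sample text ("Yusupov Palace")
body = text.encode("koi8-r")      # the server sends koi8-r bytes...
meta = "windows-1251"             # ...but the markup still declares windows-1251

# Served over HTTP with "charset=koi8-r": the header wins, text is intact.
served = body.decode(effective_encoding("koi8-r", meta))

# Opened from disk: no HTTP header, so the (wrong) meta declaration is used.
from_disk = body.decode(effective_encoding(None, meta))

print(served == text)     # True
print(from_disk == text)  # False: the letters come out scrambled

# Size: one byte per Cyrillic letter in the legacy encodings, two in UTF-8.
letters = "дворец"
for enc in ("windows-1251", "koi8-r", "utf-8"):
    print(enc, len(letters.encode(enc)))
```

For this sample the wrong decode raises no error: every koi8-r byte used is also a valid Cyrillic letter in windows-1251, which is why such a mismatch shows up as silently garbled text rather than a failure.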
Received on Friday, 6 December 2013 15:50:43 UTC