
Re: Validation of Russian pages

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Fri, 26 Oct 2007 07:29:37 +0300
Message-ID: <004401c81788$d55b7130$0400000a@DOCENDO>
To: <www-validator@w3.org>, "Jacob Palme" <jpalme@dsv.su.se>

Jacob Palme wrote:

> The most common character set for Russian web pages is
> "windows-1521" and not "ISO-8859-5".

As you mention in another message, you meant windows-1251. I'm not sure 
whether it's the most common encoding for Russian pages (KOI8-R is quite 
common too), but that's not important right now.

> I would like the
> validator to be able to handle this character set.

As far as I can see, the W3C validator handles it just fine. It is also 
included in the list of encodings shown in the extended user interface, 
which you get by clicking on "More Options" or by going directly to
http://validator.w3.org/#validate_by_uri+with_options
The drop-down menu "Character Encoding" has windows-1250 through 
windows-1256 at its end. (I'm not sure I understand the logic behind the 
ordering of encodings there. Maybe alphabetical order would be better?)

What kind of problem did you encounter when trying to validate a 
windows-1251 encoded page?

(I can see a minor problem, but it's really just a detail in the report. I 
intentionally used a page containing octets that are not defined in 
windows-1251. The validation report ends with the message
The error was: cp1251 "\x98" does not map to Unicode
which refers to windows-1251 by a different name than the one the validator 
uses elsewhere. Actually, cp1251 isn't even registered at IANA as an alias, 
though it is commonly used.)
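
For anyone who wants to check a page for such octets before submitting it, 
here is a minimal sketch in Python. It is not what the validator does 
internally; the function name undefined_octets and the sample bytes are my 
own, made up for illustration, and the approach only makes sense for 
single-byte encodings such as windows-1251. Python's codec machinery happens 
to use the same unregistered name, cp1251, as its canonical name for this 
encoding.

```python
import codecs


def undefined_octets(data, encoding="windows-1251"):
    """Return the offsets of octets that `encoding` cannot map to Unicode.

    Assumes a single-byte encoding, so each octet is decoded on its own.
    """
    bad = []
    for offset in range(len(data)):
        try:
            data[offset:offset + 1].decode(encoding)
        except UnicodeDecodeError:
            bad.append(offset)
    return bad


# Hypothetical test input: six Cyrillic letters followed by the octet 0x98,
# which windows-1251 leaves undefined.
sample = "Привет".encode("windows-1251") + b"\x98"

# The name Python reports for the codec looked up as "windows-1251".
canonical_name = codecs.lookup("windows-1251").name

if __name__ == "__main__":
    print(canonical_name)
    for offset in undefined_octets(sample):
        print("offset %d: octet 0x%02X is not defined" % (offset, sample[offset]))
```

Reporting offsets rather than stopping at the first failure makes it easier 
to see whether a page has one stray octet or was simply saved in a different 
encoding altogether.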

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/ 
Received on Friday, 26 October 2007 04:28:42 GMT
