
Re: gb2312 support broken?

From: Martin Duerst <duerst@w3.org>
Date: Tue, 29 Apr 2003 11:35:13 -0400
To: Bjoern Hoehrmann <derhoermi@gmx.net>, www-validator@w3.org

It looks like the document isn't correct.

From http://www.iana.org/assignments/character-sets:
Name: GB2312  (preferred MIME name)
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte,
         two byte set:
           20-7E = one byte ASCII
           A1-FE = two byte PRC Kanji
         See GB 2312-80
         PCL Symbol Set Id: 18C
Alias: csGB2312

The document contains two-byte sequences where the second
byte is in the 20-7E range. This isn't GB2312 as defined
above. I don't know what it is.
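
The registry definition quoted above can be turned into a small check.
The Python sketch below is only an illustration, not the validator's
code; the function name find_non_gb2312 is hypothetical. It assumes
that every byte below 0x80 stands alone (the registry entry only lists
20-7E, so accepting control characters such as newlines is an
assumption) and that both bytes of a two-byte sequence must be in
A1-FE. Anything else is reported with its byte offset.

```python
from typing import List, Tuple


def find_non_gb2312(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (offset, raw bytes) for each sequence outside GB2312's ranges."""
    problems = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            # Single-byte ASCII (control characters accepted by assumption).
            i += 1
        elif 0xA1 <= lead <= 0xFE:
            # A lead byte must be followed by a trail byte in A1-FE.
            trail = data[i + 1:i + 2]
            if not (trail and 0xA1 <= trail[0] <= 0xFE):
                problems.append((i, data[i:i + 2]))
            i += 2
        else:
            # 80-A0 and FF cannot start a GB2312 sequence.
            problems.append((i, data[i:i + 1]))
            i += 1
    return problems


if __name__ == "__main__":
    # D6 D0 fits the definition; AA 40 has a trail byte in the 20-7E range.
    print(find_non_gb2312(b"ok \xd6\xd0 \xaa\x40"))
```

Run against the raw bytes of a page, the offsets show where the
encoding label and the content disagree.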

When I changed the charset to GB18030, things worked
(the document was hopelessly invalid, though).
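
The same effect can be reproduced outside the validator. The sketch
below uses Python's built-in "gb2312" and "gb18030" codecs, which are
not the iconv() implementation the validator uses and may differ from
it in detail. The helper name decodes_as and the sample bytes are made
up for illustration: D6 D0 is an ordinary GB2312 pair, while AA 40 has
its second byte in the 20-7E range.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly under the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample: one GB2312 pair followed by a pair whose
# trail byte (0x40) lies outside A1-FE.
sample = b"\xd6\xd0\xaa\x40"

if __name__ == "__main__":
    for name in ("gb2312", "gb18030"):
        print(name, decodes_as(sample, name))
```

A strict GB2312 decoder rejects the second pair, while GB18030, which
allows a wider range of second bytes, accepts it.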

What really should be changed is that on

there is no information at all about the charset, and no
way to change it. There should always be a chance to change
the charset if there is an encoding-related problem.

Regards,    Martin.

At 20:15 03/04/28 +0200, Bjoern Hoehrmann wrote:
>   On [1] the validator tells:
>   Sorry, I am unable to validate this document because on lines 23-26,
>   31-32, 35-36, 39-40, 43 it contained one or more bytes that I cannot
>   interpret as gb2312 (in other words, the bytes found are not valid
>   values in the specified Character Encoding). Please check both the
>   content of the file and the character encoding indication.
>I don't think this is accurate. In fact, all tested browsers render the
>document as expected, so this is likely to be a bug in either the
>validator or iconv().
Received on Tuesday, 29 April 2003 11:38:02 UTC
