Re: gb2312 support broken? from Martin Duerst on 2003-04-29 (www-validator@w3.org from April 2003)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 29 Apr 2003 11:35:13 -0400
To: Bjoern Hoehrmann <derhoermi@gmx.net>, www-validator@w3.org
Message-Id: <4.2.0.58.J.20030429112414.059decd0@localhost>

It looks like the document isn't correct.

From http://www.iana.org/assignments/character-sets:
Name: GB2312  (preferred MIME name)
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte,
         two byte set:
           20-7E = one byte ASCII
           A1-FE = two byte PRC Kanji
         See GB 2312-80
         PCL Symbol Set Id: 18C
Alias: csGB2312

The document contains two-byte sequences where the second
byte is in the 20-7E range. This isn't GB2312 as defined
above. I don't know what it is.
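
To make the byte-range argument concrete, here is a minimal Python sketch
that scans raw bytes against the IANA description quoted above. It is not
the validator's code. The function name find_invalid_gb2312 is hypothetical,
and treating every byte at or above 0x80 as the start of a two-byte
sequence is a simplifying assumption.

```python
from typing import List, Optional, Tuple


def find_invalid_gb2312(data: bytes) -> List[Tuple[int, int, Optional[int]]]:
    """Return (offset, lead, trail) for each suspect two-byte sequence.

    Per the IANA entry, both bytes of a two-byte GB2312 character must
    fall in A1-FE. Bytes below 0x80 are treated as single-byte characters.
    """
    problems = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            i += 1  # single-byte character
            continue
        # Assumption: any byte >= 0x80 starts a two-byte sequence.
        trail = data[i + 1] if i + 1 < len(data) else None
        lead_ok = 0xA1 <= lead <= 0xFE
        trail_ok = trail is not None and 0xA1 <= trail <= 0xFE
        if not (lead_ok and trail_ok):
            problems.append((i, lead, trail))
        i += 2 if trail is not None else 1
    return problems


if __name__ == "__main__":
    # C4 E3 is a well-formed pair; C4 41 has a second byte in 20-7E.
    for offset, lead, trail in find_invalid_gb2312(b"A\xc4\xe3\xc4\x41"):
        print(offset, hex(lead), None if trail is None else hex(trail))
```

Run against the document's bytes, a scan like this would list the offsets
where the second byte falls outside A1-FE.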

When I changed the charset to GB18030, the validator could read the
document (which turned out to be hopelessly invalid, though).
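
The difference between the two labels can be reproduced with Python's
built-in codecs. This is only a rough analogue, not the validator's own
conversion path. The helper decodes_as is hypothetical, and the sample
bytes 81 40 are an illustrative value, not taken from the document in
question.

```python
def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if data decodes without error under the given charset."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    sample = b"\x81\x40"  # second byte lies in the 20-7E range
    for charset in ("gb2312", "gb18030"):
        print(charset, decodes_as(sample, charset))
```

A strict GB2312 decoder rejects such a pair, while GB18030 defines a
character for it.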

What really should be changed is that on
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.ops.dti.ne.jp%2F%7Emuttley%2Ftest%2Ftest-2-gb2312.html&ss=1

there is no information at all about the charset, and no
way to change it. There should always be a chance to change
the charset if there is an encoding-related problem.

Regards,    Martin.


At 20:15 03/04/28 +0200, Bjoern Hoehrmann wrote:
>Hi,
>
>   On [1] the validator reports:
>
>[...]
>   Sorry, I am unable to validate this document because on lines 23-26,
>   31-32, 35-36, 39-40, 43 it contained one or more bytes that I cannot
>   interpret as gb2312 (in other words, the bytes found are not valid
>   values in the specified Character Encoding). Please check both the
>   content of the file and the character encoding indication.
>[...]
>
>I don't think this is accurate. In fact, all tested browsers render the
>document as expected, so this is likely to be a bug in either the
>validator or iconv().
>
>[1]<http://validator.w3.org/check?uri=http%3A%2F%2Fwww.ops.dti.ne.jp%2F%7Emuttley%2Ftest%2Ftest-2-gb2312.html>

Received on Tuesday, 29 April 2003 11:38:02 UTC