- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 29 Apr 2003 11:35:13 -0400
- To: Bjoern Hoehrmann <derhoermi@gmx.net>, www-validator@w3.org
It looks like the document itself isn't correct: it is not valid GB2312.
From http://www.iana.org/assignments/character-sets:
Name: GB2312 (preferred MIME name)
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte,
two byte set:
20-7E = one byte ASCII
A1-FE = two byte PRC Kanji
See GB 2312-80
PCL Symbol Set Id: 18C
Alias: csGB2312
The document contains two-byte sequences where the second
byte is in the 20-7E range. This isn't GB2312 as defined
above. I don't know what it is.
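The check described above can be sketched in a few lines of Python. This is only an illustration based on the byte ranges quoted from the IANA entry. The function name find_non_gb2312 is made up here, and accepting bytes below 0x20 (tabs, line breaks) as single bytes is an assumption, since the registry text only lists 20-7E.

```python
def find_non_gb2312(data: bytes) -> list:
    """Return offsets of bytes that do not fit the GB2312 structure quoted above."""
    bad = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7E:
            # One-byte ASCII (20-7E). Control bytes below 0x20 are
            # accepted here as an assumption, to allow line breaks.
            i += 1
        elif 0xA1 <= b <= 0xFE:
            # Lead byte of a two-byte character: the trail byte
            # must also be in A1-FE.
            if i + 1 < len(data) and 0xA1 <= data[i + 1] <= 0xFE:
                i += 2
            else:
                bad.append(i)
                i += 1
        else:
            # 7F-A0 and FF never occur in this scheme.
            bad.append(i)
            i += 1
    return bad
```

For example, find_non_gb2312(b"ab\xb0\x40c") returns [2]: the lead byte B0 is followed by 40, which falls in the one-byte range rather than A1-FE.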
When I changed the charset to GB18030, the validator could read the document
(the document was hopelessly invalid, though).
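The same effect can be reproduced outside the validator. The sketch below uses Python's built-in gb2312 and gb18030 codecs, not the validator's own conversion code, and the sample bytes 81 40 are chosen for illustration; they are not taken from the test document. The helper name decodes_as is hypothetical.

```python
def decodes_as(data: bytes, charset: str) -> bool:
    """Return True if data decodes cleanly under the given charset label."""
    try:
        data.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


# 0x81 0x40 is a two-byte sequence whose second byte is in the 20-7E range.
sample = b"\x81\x40"

for label in ("gb2312", "gb18030"):
    print(label, decodes_as(sample, label))
```

With these sample bytes the gb2312 codec rejects the input while gb18030 accepts it, which matches the behaviour described above.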
What really should be changed is that on
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.ops.dti.ne.jp%2F%7Emuttley%2Ftest%2Ftest-2-gb2312.html&ss=1
there is no information at all about the charset, and no
way to change it. There should always be a chance to change
the charset if there is an encoding-related problem.
Regards, Martin.
At 20:15 03/04/28 +0200, Bjoern Hoehrmann wrote:
>Hi,
>
> On [1] the validator tells:
>
>[...]
> Sorry, I am unable to validate this document because on lines 23-26,
> 31-32, 35-36, 39-40, 43 it contained one or more bytes that I cannot
> interpret as gb2312 (in other words, the bytes found are not valid
> values in the specified Character Encoding). Please check both the
> content of the file and the character encoding indication.
>[...]
>
>I don't think this is accurate, in fact, all tested browsers render the
>document as expected, so this is likely to be a bug in either the
>validator or iconv().
>
>[1] <http://validator.w3.org/check?uri=http%3A%2F%2Fwww.ops.dti.ne.jp%2F%7Emuttley%2Ftest%2Ftest-2-gb2312.html>
Received on Tuesday, 29 April 2003 11:38:02 UTC