- From: Martin Duerst <duerst@w3.org>
- Date: Tue, 29 Apr 2003 11:35:13 -0400
- To: Bjoern Hoehrmann <derhoermi@gmx.net>, www-validator@w3.org
It looks like the document isn't correct. From
http://www.iana.org/assignments/character-sets:

Name: GB2312  (preferred MIME name)
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte,
        two byte set:
          20-7E = one byte ASCII
          A1-FE = two byte PRC Kanji
        See GB 2312-80
        PCL Symbol Set Id: 18C
Alias: csGB2312

The document contains two-byte sequences where the second byte
is in the 20-7E range. This isn't GB2312 as defined above.
I don't know what it is. When changing the charset to GB-18030,
things worked (the document was hopelessly invalid, though).

What really should be changed is that on
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.ops.dti.ne.jp%2F%7Emuttley%2Ftest%2Ftest-2-gb2312.html&ss=1
there is no information at all about the charset, and no way to
change it. There should always be a chance to change the charset
if there is an encoding-related problem.

Regards, Martin.

At 20:15 03/04/28 +0200, Bjoern Hoehrmann wrote:
>Hi,
>
>  On [1] the validator tells:
>
>[...]
>  Sorry, I am unable to validate this document because on lines 23-26,
>  31-32, 35-36, 39-40, 43 it contained one or more bytes that I cannot
>  interpret as gb2312 (in other words, the bytes found are not valid
>  values in the specified Character Encoding). Please check both the
>  content of the file and the character encoding indication.
>[...]
>
>I don't think this is accurate, in fact, all tested browsers render the
>document as expected, so this is likely to be a bug in either the
>validator or iconv().
>
>[1]<http://validator.w3.org/check?uri=http%3A%2F%2Fwww.ops.dti.ne.jp%2F%7Emuttley%2Ftest%2Ftest-2-gb2312.html>
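
For illustration, the byte layout in the IANA entry quoted in this message can be checked mechanically. The following is only a rough sketch, not what the validator or iconv() actually does: the function name invalid_gb2312_lines is hypothetical, all bytes below 0x80 are accepted as single bytes (so that line breaks and tabs pass, which is looser than the 20-7E range in the registry), and only the byte ranges are tested, not whether a code point is actually assigned in GB 2312-80.

```python
from typing import List


def invalid_gb2312_lines(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines that break the GB2312 byte layout.

    Assumed layout (from the IANA entry, loosened for control characters):
      - a byte below 0x80 stands alone;
      - a byte in A1-FE must be followed by a second byte in A1-FE.
    """
    bad = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        i = 0
        while i < len(line):
            lead = line[i]
            if lead < 0x80:
                # Single-byte character.
                i += 1
            elif (0xA1 <= lead <= 0xFE
                  and i + 1 < len(line)
                  and 0xA1 <= line[i + 1] <= 0xFE):
                # Well-formed two-byte sequence.
                i += 2
            else:
                # Lead byte outside A1-FE, or a second byte outside A1-FE
                # (for example one in the 20-7E range), or a truncated pair.
                bad.append(lineno)
                break
    return bad


if __name__ == "__main__":
    # Hypothetical sample: line 2 is a well-formed pair of two-byte
    # sequences, line 3 has a second byte of 0x40.
    sample = b"ok\n\xd6\xd0\xce\xc4\n\xaa\x40\n"
    print(invalid_gb2312_lines(sample))
```

A line is reported once, at the first offending byte, which roughly mirrors the per-line report in the validator message quoted above.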
Received on Tuesday, 29 April 2003 11:38:02 UTC