
[Bug 4867] non UTF-8 pages cause XML error although it doesn't have

From: <bugzilla@wiggum.w3.org>
Date: Thu, 19 Jul 2007 01:00:13 +0000
CC:
To: www-validator-cvs@w3.org
Message-Id: <E1IBKNJ-0001bz-8u@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4867

           Summary: non UTF-8 pages cause XML error although it doesn't have
           Product: Validator
           Version: 0.8.0b2
          Platform: PC
               URL: http://www.mitsue.co.jp/
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Parser
        AssignedTo: dave.null@w3.org
        ReportedBy: yakura-masataka@mitsue.co.jp
         QAContact: www-validator-cvs@w3.org


There seems to be a bug in the new XML parser. It doesn't recognize some
non-UTF Japanese encodings, such as Shift_JIS and EUC-JP.

Try validating http://www.mitsue.co.jp/ and you'll see some XML errors. However,
when I saved the page as an XML file (mitsue.xml) and opened it in Firefox and
Internet Explorer, I got no such errors. Rewrite the source substituting
"shift_jis" for "UTF-8" and it will validate. The validator therefore seems to
have problems with encoding detection and handling.
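
The following is a minimal Python sketch of the kind of failure I suspect, not
the validator's actual code. It assumes the parser decodes the input as UTF-8
regardless of the declared encoding, which this report does not confirm. The
helper declared_encoding and the sample document are hypothetical stand-ins,
not the real source of the page above.

```python
import re

def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    m = re.match(rb'<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']', raw)
    return m.group(1).decode("ascii") if m else default

# Hypothetical stand-in for a Shift_JIS page (not the real page source).
sample = '<?xml version="1.0" encoding="Shift_JIS"?><p>日本語</p>'.encode("shift_jis")

# Decoding as UTF-8 while ignoring the declaration fails on the Japanese bytes.
try:
    sample.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Honoring the declared encoding decodes the same bytes cleanly.
text = sample.decode(declared_encoding(sample))
```

Here utf8_ok ends up False while text holds the decoded document, which would
match the symptom: the same bytes are accepted or rejected depending only on
which encoding the parser assumes.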

Many web pages are encoded in Shift_JIS, EUC-JP, or other non-UTF encodings. I'm
afraid that launching the new validator without fixing this issue would cause
serious confusion in the Japanese market.
Received on Thursday, 19 July 2007 01:00:17 GMT
