[Bug 4867] non UTF-8 pages cause XML error although it doesn't have

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4867

           Summary: non UTF-8 pages cause XML error although it doesn't have
           Product: Validator
           Version: 0.8.0b2
          Platform: PC
               URL: http://www.mitsue.co.jp/
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Parser
        AssignedTo: dave.null@w3.org
        ReportedBy: yakura-masataka@mitsue.co.jp
         QAContact: www-validator-cvs@w3.org


There seems to be a bug in the new XML parser. It doesn't recognize some
Japanese encodings other than UTF-8, such as Shift_JIS and EUC-JP.

Try validating http://www.mitsue.co.jp/ and you'll see some XML errors. But
when I saved the page as an XML file (mitsue.xml) and opened it in Firefox and
Internet Explorer, I got no such errors. Rewrite the source substituting
"shift_jis" for "UTF-8" and it will validate. Thus, the validator seems to have
some encoding detection and handling issues.
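
To illustrate the behaviour I would expect, here is a minimal sketch in Python
(not the validator's actual code). It reads the encoding named in the XML
declaration, decodes the bytes with it, and only then parses. The helper names
declared_encoding and parse_with_declared_encoding are made up for this
example, and the regular expression assumes an ASCII-compatible encoding and
no byte order mark.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical helper pattern: captures the encoding pseudo-attribute of an
# XML declaration at the very start of the document.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding=["\']([A-Za-z][A-Za-z0-9._-]*)["\'][^>]*\?>'
)


def declared_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or the default."""
    match = _XML_DECL.match(raw)
    return match.group(1).decode("ascii") if match else default


def parse_with_declared_encoding(raw: bytes) -> ET.Element:
    """Decode with the declared encoding first, then parse the text."""
    text = raw.decode(declared_encoding(raw))
    # The text is already decoded, so the declaration is no longer needed.
    text = re.sub(r"^<\?xml[^>]*\?>", "", text, count=1)
    return ET.fromstring(text)


# A tiny Shift_JIS document standing in for a real page.
sample = '<?xml version="1.0" encoding="Shift_JIS"?>\n<p>日本語</p>'.encode("shift_jis")
root = parse_with_declared_encoding(sample)
```

Decoding the same bytes as UTF-8 instead raises UnicodeDecodeError, which is
roughly the kind of failure I would expect if the declared encoding is ignored.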

There are so many web pages encoded in Shift_JIS, EUC-JP, or other non-UTF-8
encodings. I'm afraid that launching the new validator without fixing this
issue would cause serious confusion in the Japanese market.

Received on Thursday, 19 July 2007 01:00:17 UTC