[Bug 5336] Non-XML characters from input copied to XML output making it ill-formed

http://www.w3.org/Bugs/Public/show_bug.cgi?id=5336

           Summary: Non-XML characters from input copied to XML output
                    making it ill-formed
           Product: Validator
           Version: 0.8.2
          Platform: All
               URL: http://validator.w3.org/check?uri=http%3A%2F%2Fphilip.html5.org%2Fmisc%2Fchars.html&charset=iso-8859-1&output=soap12
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: Templates
        AssignedTo: dave.null@w3.org
        ReportedBy: hsivonen@iki.fi
         QAContact: www-validator-cvs@w3.org


Steps to reproduce:
1) Load
http://validator.w3.org/check?uri=http%3A%2F%2Fphilip.html5.org%2Fmisc%2Fchars.html&charset=iso-8859-1&output=soap12
2) Examine the result or try parsing it as XML

Actual results:
At line 30, column 147, there's U+0000, which is forbidden in XML.
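
One way to locate such characters without relying on a particular XML
parser's error message is to scan the response text against the XML 1.0
Char production. The following is a minimal Python sketch, not taken from
the validator's code; the name find_non_xml_chars and the 1-based
line/column convention are assumptions made for illustration.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NON_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_non_xml_chars(text):
    """Yield (line, column, code point) for each character XML 1.0 forbids.

    Lines and columns are 1-based (an assumption; tools differ on this).
    """
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in _NON_XML_CHAR.finditer(line):
            yield line_no, match.start() + 1, f"U+{ord(match.group()):04X}"
```

Running this over the decoded response body lists every offending position,
which may differ slightly from a parser's report depending on how it counts
columns.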

Expected results:
Characters that XML prohibits should be replaced with U+FFFD REPLACEMENT
CHARACTER wherever a normal character would be copied to the output.
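
As a rough illustration of the requested behaviour, here is a minimal
Python sketch that substitutes U+FFFD for every character outside the
XML 1.0 Char production before text is written to the output. It is not
the validator's implementation; sanitize_for_xml is a hypothetical helper
name, and markup escaping of &, < and > is assumed to happen separately.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NON_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def sanitize_for_xml(text: str) -> str:
    """Replace each character forbidden by XML 1.0 with U+FFFD."""
    return _NON_XML_CHARS.sub("\uFFFD", text)
```

Applying the substitution at the point where source text is copied into
the template keeps the rest of the output pipeline unchanged.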

Received on Wednesday, 2 January 2008 14:00:44 UTC