[Bug 978] systematic xml preparse mode triggers wrong parse mode for xml documents with broken xml declaration

http://www.w3.org/Bugs/Public/show_bug.cgi?id=978


ot@w3.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |ASSIGNED
          Component|Parser                      |check
            Summary|errors in XMLPI make openSP |systematic xml preparse mode
                   |output errors beyond        |triggers wrong parse mode
                   |document boundaries         |for xml documents with
                   |                            |broken xml declaration




------- Comment #5 from ot@w3.org  2007-03-22 09:12 -------
http://qa-dev.w3.org/wmvs/HEAD/check?uri=http%3A%2F%2Fqa-dev.w3.org%2Fwmvs%2FHEAD%2Fdev%2Ftests%2Fbogus-xmlpi.html;debug
is useful in understanding what's happening.

* an XHTML document is sent as text/html (curse the day text/html was said to
be OK for XHTML...)
* the parse mode is set to TBD
* the pre-parser looks at the document
  - by default, HTML::Parser was set to XML mode
  - pre-parsing cannot find the end of the XML declaration, and thus parses the
whole doc as if...
  - the doctype cannot be found
* as a result, XML mode is NOT triggered
* openSP is launched in SGML mode
* openSP parses the XML DTD as an SGML DTD, whines
* errors are reported against the DTD (which is why it looks as though it
reports errors in the document, but at odd line numbers).
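
The pre-parse step above can be reproduced in isolation with something like
the following. It is only a sketch: first_declaration() and the sample
document are made up for illustration and are not taken from the validator
code. It relies on the documented HTML::Parser behaviour that, in xml_mode,
a processing instruction ends at "?>" rather than at ">".

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the text of the first declaration (e.g. the
# doctype) that HTML::Parser reports for $markup, or undef if there is none.
sub first_declaration {
    my ($markup, $xml_mode) = @_;
    my $found;
    my $p = HTML::Parser->new(api_version => 3);
    $p->xml_mode($xml_mode);
    # Remember only the first declaration event.
    $p->handler(declaration => sub { $found = $_[0] unless defined $found },
                'text');
    $p->parse($markup);
    $p->eof;
    return $found;
}

# Made-up test document: the XML declaration ends in ">" instead of "?>".
my $doc = join "\n",
    '<?xml version="1.0" encoding="utf-8">',
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"',
    '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>',
    '<body><p>test</p></body></html>';

# Compare the default (HTML/SGML-style) pre-parse with the XML-mode one.
for my $xml_mode (0, 1) {
    my $decl = first_declaration($doc, $xml_mode);
    printf "xml_mode=%d: %s\n", $xml_mode,
        defined $decl ? 'doctype found' : 'no doctype found';
}
```

Running both modes side by side on a test case such as bogus-xmlpi.html
should make it easy to see whether the doctype survives the pre-parse.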

FIX: run the pre-parser in XML mode only if the content-type has unambiguously
shown that we should do so.
In the case of text/html, cautiously use SGML pre-parsing. Finding an XHTML
document type will later trigger XML mode in the actual parser and validator.

[[
my $p = HTML::Parser->new(api_version => 3);

- $p->xml_mode(TRUE);

+ # if content-type has shown we should pre-parse with XML mode, use that
+ # otherwise (mostly text/html cases) use default mode
+ $p->xml_mode(TRUE) if ($File->{Mode} eq 'XML');
]]
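
For illustration only, here is one way the mode could be derived from the
content-type before the test in the patch is reached. preparse_mode_for()
is a hypothetical helper, and the list of media types treated as
unambiguously XML is an assumption, not what the validator actually does.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the validator source): map a Content-Type
# header value to a pre-parse mode. Which media types count as XML is an
# assumption made for this sketch.
sub preparse_mode_for {
    my ($content_type) = @_;
    # Drop parameters such as "; charset=utf-8" and normalise case.
    my ($type) = lc($content_type // '') =~ m{^\s*([^;\s]+)};
    return 'TBD' unless defined $type;
    return 'XML' if $type eq 'text/xml'
                 or $type eq 'application/xml'
                 or $type =~ m{\+xml\z};
    # text/html and anything else: keep the default (non-XML) pre-parsing.
    return 'TBD';
}

for my $ct ('text/html; charset=utf-8', 'application/xhtml+xml', 'text/xml') {
    my %File = (Mode => preparse_mode_for($ct));
    # Same condition as in the patch, shown here as a plain flag.
    my $xml = $File{Mode} eq 'XML' ? 1 : 0;
    print "$ct -> Mode=$File{Mode}, xml_mode=$xml\n";
}
```

With a mapping like this, text/html stays in the TBD mode, so only a doctype
found later can switch the real parse to XML.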

I have to test this patch against a number of other test cases, but I'm hopeful
it should be the solution to this problem, as well as Bug #14.

Received on Thursday, 22 March 2007 14:05:25 UTC