[Bug 24] HTML::Parser in XML mode doesn't work with lowercase doctypes from bugzilla@wiggum.w3.org on 2006-08-30 (www-validator-cvs@w3.org from August 2006)

From: <bugzilla@wiggum.w3.org>
Date: Wed, 30 Aug 2006 07:26:23 +0000
To: www-validator-cvs@w3.org
CC:
Message-Id: <E1GIKSt-0007xY-Hi@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=24


ot@w3.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |1500




------- Comment #8 from ot@w3.org  2006-08-30 07:26 -------
Discussions on Bug #1500 and others seem to point to one solution:
- if the content-type is text/html, then run HTML::Parser in HTML mode, without
xml_mode()

- if the content-type is anything else (as far as I can tell, all other
document types supported by the validator are XML-based), then run HTML::Parser
with xml_mode()
The current code in check goes:
1) set a temporary parse mode to SGML, XML or TBD based on the content-type
( check v 1.432.2.11; lines 188-193 )
2) run $File = &preparse_doctype($File); always in XML mode
( check v 1.432.2.11; line 526 )
3) if the parse mode is still TBD, set it to SGML or XML based on the doctype
found
( check v 1.432.2.11; lines 530-560 )

If I understand the proposal in Bug #1500 correctly, the new behavior would be:
1) set the parse mode to SGML or XML based on the content-type; text/html always
means SGML mode
2) run $File = &preparse_doctype($File); using the detected parse mode as the
mode for HTML::Parser and for the subsequent doctype detection.
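
To make the two proposed steps concrete, here is a rough sketch of that flow.
It is not the code from check: find_doctype() is a hypothetical stand-in for
preparse_doctype(), it takes a content-type and a markup string rather than
the $File structure, and it only records the first declaration that
HTML::Parser reports.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical stand-in for preparse_doctype(); names and arguments are
# assumptions made for illustration only.
sub find_doctype {
    my ($content_type, $markup) = @_;

    # Step 1: pick the parse mode from the content-type alone.
    my $is_html = defined $content_type
        && $content_type =~ m{^\s*text/html\s*(?:;|$)}i;
    my $mode = $is_html ? 'SGML' : 'XML';

    # Step 2: run HTML::Parser in the mode detected above, instead of
    # always enabling xml_mode().
    my $doctype;
    my $parser = HTML::Parser->new(api_version => 3);
    $parser->xml_mode($mode eq 'XML' ? 1 : 0);
    $parser->handler(
        declaration => sub {
            my ($text) = @_;
            # Keep only the first declaration seen.
            $doctype = $text unless defined $doctype;
        },
        'text',
    );
    $parser->parse($markup);
    $parser->eof;

    return ($mode, $doctype);
}

my ($mode, $doctype) =
    find_doctype('text/html', "<!doctype html>\n<title>t</title>\n");
print "mode: $mode\n";
print "doctype: ", (defined $doctype ? $doctype : '(none found)'), "\n";
```

With this shape, a text/html document never reaches the parser in XML mode,
which is the case this bug is about.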

If the above is correct, then the fix for this bug (Bug #24) should follow
almost immediately once Bug #1500 is resolved.

Am I missing something?

Received on Wednesday, 30 August 2006 07:26:37 UTC