- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Thu, 18 Feb 2010 16:47:13 +0000
- To: Boris Zbarsky <bzbarsky@MIT.EDU>
- CC: "public-html@w3.org" <public-html@w3.org>
Boris Zbarsky wrote:
> On 2/17/10 4:29 AM, Philip Taylor wrote:
>> Yes, but in pre-HTML5 browsers (IE, Firefox 3.6 without html5.enable,
>> etc) doctypes will still only be parsed up to the *first* ">", so you
>> will get the characters "]>" inserted as text into the body of the
>> document
>
> That's the case with the HTML5 parser as well, no?
Yes - that aspect of the parsing hasn't changed.
(I think the only browser that attempts to parse this differently is
Opera, which seems to ignore any ">" unless it has previously seen an
equal number of "[" and "]" characters (in any order).)
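To make the difference concrete, here is a rough Python sketch of the two ways of finding the end of a doctype. The function names and the sample markup are made up for illustration, and the second function only approximates the Opera behaviour as I described it above (it ignores quoting, for instance); it is not taken from any browser's source.

```python
def doctype_end_first_gt(markup: str, start: int = 0) -> int:
    """Return the index just past the first '>' at or after start, or -1."""
    i = markup.find(">", start)
    return -1 if i == -1 else i + 1


def doctype_end_balanced(markup: str, start: int = 0) -> int:
    """Approximate the Opera-like rule: accept a '>' only once the number
    of '[' seen so far equals the number of ']' seen so far (any order)."""
    opens = closes = 0
    for i in range(start, len(markup)):
        ch = markup[i]
        if ch == "[":
            opens += 1
        elif ch == "]":
            closes += 1
        elif ch == ">" and opens == closes:
            return i + 1
    return -1


# Hypothetical document with an internal subset in the doctype.
sample = '<!DOCTYPE html [<!ENTITY foo "bar">]><p>x'

# With the first-'>' rule, "]>" is left over and ends up as body text.
leftover_first_gt = sample[doctype_end_first_gt(sample):]

# With the balanced rule, the whole internal subset is consumed.
leftover_balanced = sample[doctype_end_balanced(sample):]
```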
> I agree with Julian's concern: going from treating a doctype as
> standards to treating a doctype as quirks seems like a bad idea to me.
As a first approximation, changes are bad. As a second approximation,
changes are bad if they break existing content. It's not clear what
behaviour here will break least.
The specific case is "[" after the public identifier, and before the
system identifier. This can't happen in well-formed XML (the system
identifier is required, and the internal subset comes after it), though
I've heard that SGML allows it. It's handled in HTML5
(http://whatwg.org/html#between-doctype-public-and-system-identifiers-state)
exactly like any other bogus character (i.e. forcing quirks mode), but
Firefox appears to have a special case for "[" in this location
(preventing quirks).
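As a simplified model of that one decision (not the real tokenizer), the following Python sketch looks only at the first non-whitespace character after the public identifier. The function name and the special_case_bracket flag are hypothetical; the flag stands in for the Firefox behaviour I observed, and the real quirks-mode decision also depends on the identifiers themselves.

```python
def forces_quirks_after_public_id(next_char: str,
                                  special_case_bracket: bool = False) -> bool:
    """Model the force-quirks flag for the character that follows the
    public identifier (after any whitespace).

    special_case_bracket=False models the HTML5 rule described above;
    special_case_bracket=True models a parser that tolerates "[" here.
    """
    if next_char in ('"', "'"):
        return False  # a quoted system identifier starts normally
    if next_char == ">":
        return False  # the doctype simply ends here
    if next_char == "[" and special_case_bracket:
        return False  # assumed legacy special case: "[" does not force quirks
    return True       # any other character is bogus and forces quirks


html5_result = forces_quirks_after_public_id("[")
legacy_result = forces_quirks_after_public_id("[", special_case_bracket=True)
```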
Searching half a million pages for the pattern
(?i)<!doctype\s+html\s+public\s+"[^"]+"\s*\[
turns up two sites:
http://www.freemanforman.co.uk/
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
[url=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
http://symptomresearch.nih.gov/
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" []>
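For anyone who wants to repeat the search on their own data, a minimal Python sketch of the check follows. The regular expression is the one quoted above; the dictionary of samples is only for illustration (the two doctypes found, joined onto one line, plus an ordinary XHTML 1.0 Strict doctype as a control), and find_bracket_doctypes is a made-up helper name.

```python
import re

# The pattern from the search: "[" directly after the public identifier.
PATTERN = re.compile(r'(?i)<!doctype\s+html\s+public\s+"[^"]+"\s*\[')

samples = {
    "http://www.freemanforman.co.uk/":
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '[url=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    "http://symptomresearch.nih.gov/":
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" []>',
    # Control case: a normal doctype with a quoted system identifier.
    "control":
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
}


def find_bracket_doctypes(pages: dict) -> list:
    """Return the keys whose markup matches the pattern."""
    return [url for url, markup in pages.items() if PATTERN.search(markup)]


matches = find_bracket_doctypes(samples)
```

Only the two problem doctypes should match; the control does not, because a quote rather than "[" follows its public identifier.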
Looking for interesting pages on those sites:
http://www.freemanforman.co.uk/content/001_Area_Search/ - in Firefox
3.6, the map renders incorrectly (it's positioned too far up/right and
clipped) if html5.enable is *on* (which triggers quirks mode).
http://symptomresearch.nih.gov/grantopportunities.htm - the menu items
are too widely spaced and the skip link underlines are visible when
html5.enable is *off*.
So something breaks in Firefox either way. Possible options:
* Ignore this, on the grounds that minor breakage of fewer than 0.001%
of sites (two in half a million here, both with bogus doctypes and
already broken in some browsers) is not worth spending more time on.
* Collect more data about whether special-casing "[" would cause more
breakage or less breakage, and adjust the spec accordingly. (This would
probably need a survey of tens or hundreds of millions of pages to give
a reliable answer, since the pattern is so rare.)
* Make additional changes to the doctype logic so both of these pages
can render correctly.
Filed as http://www.w3.org/Bugs/Public/show_bug.cgi?id=9071
> -Boris
--
Philip Taylor
pjt47@cam.ac.uk
Received on Thursday, 18 February 2010 16:47:41 UTC