Doctypes with "[" after public identifier

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Thu, 18 Feb 2010 16:47:13 +0000
Message-ID: <4B7D6F11.9090408@cam.ac.uk>
To: Boris Zbarsky <bzbarsky@MIT.EDU>
CC: "public-html@w3.org" <public-html@w3.org>
Boris Zbarsky wrote:
> On 2/17/10 4:29 AM, Philip Taylor wrote:
>> Yes, but in pre-HTML5 browsers (IE, Firefox 3.6 without html5.enable,
>> etc) doctypes will still only be parsed up to the *first* ">", so you
>> will get the characters "]>" inserted as text into the body of the
>> document
> That's the case with the HTML5 parser as well, no?

Yes - that aspect of the parsing hasn't changed.

(I think the only browser that attempts to parse this differently is 
Opera, which seems to ignore any ">" unless it has previously seen an 
equal number of "[" and "]" characters (in any order).)
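
To make the difference concrete, here is a rough Python sketch of the two ways of finding the end of the doctype described above. It is only an illustration: the function names and the sample markup are made up for this example, and the bracket-counting variant is just my reading of what Opera appears to do, not its actual code.

```python
def doctype_end_first_gt(markup: str, start: int = 0) -> int:
    """Return the index just past the first '>' at or after start, or -1."""
    pos = markup.find(">", start)
    return -1 if pos == -1 else pos + 1


def doctype_end_balanced(markup: str, start: int = 0) -> int:
    """Return the index just past the first '>' seen while the counts of
    '[' and ']' so far are equal (order is not checked), or -1."""
    opens = closes = 0
    for i in range(start, len(markup)):
        ch = markup[i]
        if ch == "[":
            opens += 1
        elif ch == "]":
            closes += 1
        elif ch == ">" and opens == closes:
            return i + 1
    return -1


# Hypothetical input with an internal subset; not taken from any real page.
sample = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
          '[<!ENTITY x "y">]><p>Hi</p>')

# First-'>' rule: the doctype ends inside the subset, so "]>" leaks out as text.
print(sample[doctype_end_first_gt(sample):])   # ]><p>Hi</p>

# Bracket-counting rule: the whole subset is swallowed by the doctype.
print(sample[doctype_end_balanced(sample):])   # <p>Hi</p>
```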

> I agree with Julian's concern: going from treating a doctype as 
> standards to treating a doctype as quirks seems like a bad idea to me.

As a first approximation, changes are bad. As a second approximation, 
changes are bad if they break existing content. It's not clear what 
behaviour here will break least.

The specific case is "[" after the public identifier, and before the 
system identifier. This can't happen in well-formed XML (the system 
identifier is required, and the internal subset comes after it), though 
I've heard that SGML allows it. It's handled in HTML5 
exactly like any other bogus character (i.e. forcing quirks mode), but 
Firefox appears to have a special case for "[" in this location 
(preventing quirks).
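
A minimal sketch of that single decision, in Python, assuming a doctype with a quoted public identifier. The function name and the special_case_bracket flag are hypothetical, and this only looks at the first character after the public identifier; the final rendering mode also depends on the identifiers themselves, which is not modelled here.

```python
import re

# Matches everything up to and including a quoted public identifier.
_PUBLIC_ID = re.compile(
    r'<!DOCTYPE\s+\S+\s+PUBLIC\s*("[^">]*"|\'[^\'>]*\')', re.IGNORECASE)


def forces_quirks_after_public_id(doctype: str,
                                  special_case_bracket: bool = False) -> bool:
    """Return True if the first non-space character after the public
    identifier is treated as bogus, i.e. would force quirks mode.

    special_case_bracket=False: "[" is handled like any other bogus character.
    special_case_bracket=True:  "[" is exempted, as Firefox appears to do.
    """
    m = _PUBLIC_ID.match(doctype)
    if m is None:
        raise ValueError("no quoted public identifier found")
    rest = doctype[m.end():].lstrip(" \t\n\f\r")
    if not rest:
        return True   # input ended inside the doctype
    ch = rest[0]
    if ch in "\"'>":
        return False  # a system identifier starts, or the doctype ends
    if ch == "[" and special_case_bracket:
        return False  # the special case for "[" in this location
    return True       # any other character is bogus


doctype = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" []>'
print(forces_quirks_after_public_id(doctype))                             # True
print(forces_quirks_after_public_id(doctype, special_case_bracket=True))  # False
```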

Looking through half a million pages for the pattern


results in two sites:

     <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 

     <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" []>

Looking for interesting pages on those sites:

http://www.freemanforman.co.uk/content/001_Area_Search/ - in Firefox 
3.6, the map renders incorrectly (it's positioned too far up/right and 
clipped) if html5.enable is *on* (which triggers quirks mode).

http://symptomresearch.nih.gov/grantopportunities.htm - the menu items 
are too widely spaced and the skip link underlines are visible when 
html5.enable is *off*.

So something breaks in Firefox either way. Possible options:

  * Ignore this, under the belief that minor breakage of 0.001% of sites 
(which have bogus doctypes and are already broken in some browsers) is 
not worth spending more time on.

  * Collect more data about whether special-casing "[" would cause more 
breakage or less breakage, and adjust the spec accordingly. (We would 
probably need to look at tens or hundreds of millions of pages to get a 
good idea, since the pattern is so rare.)

  * Make additional changes to the doctype logic so both of these pages 
can render correctly.

Filed as http://www.w3.org/Bugs/Public/show_bug.cgi?id=9071

> -Boris

Philip Taylor
Received on Thursday, 18 February 2010 16:47:41 UTC
