- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Fri, 19 Feb 2010 05:29:15 +0100
- To: Philip Taylor <pjt47@cam.ac.uk>
- Cc: Boris Zbarsky <bzbarsky@MIT.EDU>, "public-html@w3.org" <public-html@w3.org>
Philip Taylor, Thu, 18 Feb 2010 16:47:13 +0000:
> Boris Zbarsky wrote:
>> On 2/17/10 4:29 AM, Philip Taylor wrote:
>>> Yes, but in pre-HTML5 browsers (IE, Firefox 3.6 without html5.enable,
>>> etc) doctypes will still only be parsed up to the *first* ">", so you
>>> will get the characters "]>" inserted as text into the body of the
>>> document
>>
>> That's the case with the HTML5 parser as well, no?
>
> Yes - that aspect of the parsing hasn't changed.
>
> (I think the only browser that attempts to parse this differently is
> Opera, which seems to ignore any ">" unless it has previously seen an
> equal number of "[" and "]" characters (in any order).)

Konqueror also has zero problems with the superfluous "]>".

A positive thing with the HTML5 parsing model is that it becomes simpler
to hide the "]>", via "comment tricks". For example, I tried to replicate
inside an XHTML1 doctype what I managed to do inside the HTML4 doctype.
And it was quite easy - except in Firefox (due to stricter comment rules
in XHTML, which allow fewer tricks). But as soon as I turned on
HTML5.enable, it worked nicely in Firefox as well - the "]>" became
hidden. (Of course, it would be better if it disappeared completely ...)

>> I agree with Julian's concern: going from treating a doctype as
>> standards to treating a doctype as quirks seems like a bad idea to
>> me.
>
> As a first approximation, changes are bad. As a second approximation,
> changes are bad if they break existing content. It's not clear what
> behaviour here will break least.

You pointed to an example which you claimed looked better in QuirksMode.
OTOH, you said that there were so few such examples that they hardly
count. It seems better to operate with a principle. And: Those who are
using [] inside the DOCTYPE probably do not do so because they want
quirksmode.

> The specific case is "[" after the public identifier, and before the
> system identifier.
Perhaps only technical terms from your part, but my example page had a
doctype _without_ any system identifier:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" [<!ATTLIST P myattr CDATA #implied >]>

This is only allowed for HTML4 - not XHTML.

> This can't happen in well-formed XML (the system
> identifier is required, and the internal subset comes after it),
> though I've heard that SGML allows it.

I don't think the system identifier can come after the internal subset.
Can you show a valid such doctype in the validator? However, _this_ is a
valid HTML4 doctype:

<!DOCTYPE HTML PUBLIC --comment-- "-//W3C//DTD HTML 4.01//EN" --comment-- [
<!ATTLIST P myattr CDATA #implied --comment-- >
] --comment-- >

In Safari 4 it triggers quirks mode - probably due to HTML5 preparation.
In Firefox with HTML5.enable too. And in Opera 10.5beta as well. But not
in IE. Not in Firefox without HTML5.enable. Not in Opera 10.10. None of
the legacy/current browsers - except Safari. We should revert the HTML5
behavior as soon as possible!

> It's handled in HTML5
> (http://whatwg.org/html#between-doctype-public-and-system-identifiers-state)
> exactly like any other bogus character (i.e. forcing quirks mode),
> but Firefox appears to have a special case for "[" in this location
> (preventing quirks).

When you say "Firefox appears to have", then you mean Firefox' HTML5
implementation, I suppose?

> Looking through half a million pages for the pattern
>
> (?i)<!doctype\s+html\s+public\s+"[^"]+"\s*\[
>
> results in two sites:
>
> http://www.freemanforman.co.uk/
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> [url=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
>
> http://symptomresearch.nih.gov/
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" []>

Mister Taylor: That is a transitional doctype. The reason it triggers
quirks has to do with that, and is _not_ related to the "[]".
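
[Illustration] For readers who want to try the quoted search pattern against the doctypes mentioned in this message, here is a minimal Python sketch. The pattern itself is copied from the quoted text; the names `PATTERN`, `has_bracket_after_public_id` and `SAMPLES` are made up for this illustration, and the last sample (a plain HTML 4.01 Strict doctype with a system identifier) is added only as a contrast case.

```python
import re

# The pattern quoted above, unchanged: "[" directly after the public identifier.
PATTERN = re.compile(r'(?i)<!doctype\s+html\s+public\s+"[^"]+"\s*\[')


def has_bracket_after_public_id(markup: str) -> bool:
    """Return True if the markup contains a doctype matching the quoted pattern."""
    return PATTERN.search(markup) is not None


# Doctypes taken from this message, plus one ordinary doctype for contrast.
SAMPLES = {
    "freemanforman": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
                     '[url=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    "symptomresearch": '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" []>',
    "attlist": '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
               '[<!ATTLIST P myattr CDATA #implied >]>',
    "with_comments": '<!DOCTYPE HTML PUBLIC --comment-- "-//W3C//DTD HTML 4.01//EN" '
                     '--comment-- [ <!ATTLIST P myattr CDATA #implied --comment-- > ] '
                     '--comment-- >',
    "plain_strict": '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
                    '"http://www.w3.org/TR/html4/strict.dtd">',
}

if __name__ == "__main__":
    for name, doctype in SAMPLES.items():
        print(name, has_bracket_after_public_id(doctype))
```

Note that the pattern expects the quoted public identifier immediately after PUBLIC, so a doctype that uses SGML comments in that position would not be counted by such a search.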
> Looking for interesting pages on those sites:
>
> http://www.freemanforman.co.uk/content/001_Area_Search/ - in Firefox
> 3.6, the map renders incorrectly (it's positioned too far up/right
> and clipped) if html5.enable is *on* (which triggers quirks mode).

That doctype doesn't trigger quirks in Internet Explorer - at least.

> http://symptomresearch.nih.gov/grantopportunities.htm - the menu
> items are too widely spaced and the skip link underlines are visible
> when html5.enable is *off*.
>
> So something breaks in Firefox either way. Possible options:

Gee. Are you saying that we can stop making transitional doctypes
trigger quirks? ;-) (See above.)

> * Ignore this, under the belief that minor breakage of 0.001% of
> sites (which have bogus doctypes and are already broken in some
> browsers) is not worth spending more time on.

I see no advantage in that. Except convenience - for Safari and Opera,
which have attempted to implement the HTML5 spec more fully here.

> * Collect more data about whether special-casing "[" would cause
> more breakage or less breakage, and adjust the spec accordingly.
> (Probably need to look at tens or hundreds of millions of pages to
> get a good idea, since it's so rare.)

This is the wrong attitude: the "[" currently/historically doesn't have
any effect on the parsing mode. And so we should not investigate whether
we can get away with making it trigger quirks mode.

> * Make additional changes to the doctype logic so both of these
> pages can render correctly.

I clearly favour this option. There are likely too few pages to find out
whether it breaks more or less to do the one or the other thing.
However: We do know that current browsers _do not_ trigger quirksmode
because of the []. That alone should be enough.

Unless we choose that option, HTML5 will result in more doctypes
triggering quirks. And that would be a very funny result of the HTML5
effort ...
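
[Illustration] The two doctype-termination behaviours described in the quoted text near the top of this message (stop at the first ">", versus Opera apparently ignoring ">" until the counts of "[" and "]" are equal) can be contrasted with a rough Python sketch. This is a simplification built only on those two descriptions, not any browser's actual tokenizer; the function names and the sample markup with a trailing `<p>` are assumptions made for this illustration.

```python
def end_at_first_gt(markup: str) -> int:
    """Index just past the first '>' (the 'parsed up to the first >' rule)."""
    pos = markup.find(">")
    return len(markup) if pos == -1 else pos + 1


def end_when_brackets_balanced(markup: str) -> int:
    """Index just past the first '>' seen while '[' and ']' counts are equal.

    Models the behaviour attributed to Opera above; only the counts matter,
    not the order in which the brackets appeared.
    """
    opened = closed = 0
    for i, ch in enumerate(markup):
        if ch == "[":
            opened += 1
        elif ch == "]":
            closed += 1
        elif ch == ">" and opened == closed:
            return i + 1
    return len(markup)


# Doctype from this message, followed by a hypothetical body fragment.
PAGE = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '[<!ATTLIST P myattr CDATA #implied >]><p>text</p>')

if __name__ == "__main__":
    # What is left over after the doctype under each rule.
    print(repr(PAGE[end_at_first_gt(PAGE):]))
    print(repr(PAGE[end_when_brackets_balanced(PAGE):]))
```

Under the first rule the leftover begins with "]>", which is the stray text that ends up in the body; under the bracket-counting rule the whole internal subset is consumed as part of the doctype.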
> Filed as http://www.w3.org/Bugs/Public/show_bug.cgi?id=9071

Wonder why we needed two bug reports for this ...
-- 
leif halvard silli
Received on Friday, 19 February 2010 04:29:55 UTC