Re: Parsing: Trailing garbage in doctype FPI (was: Re: Doctype usage data)

On Friday 2008-05-23 03:19 +0000, Ian Hickson wrote:
> On Mon, 3 Mar 2008, Simon Pieters wrote:
> > > 
> > > I've got some data about doctypes at 
> > > http://philip.html5.org/data/doctypes.html (125K pages from dmoz.org) 
> > > and http://philip.html5.org/data/doctypes-alexa.html (about 400 from 
> > > Alexa's list). I'm not entirely sure what this could be useful for, 
> > > but I'll point out a couple of things here.
> > 
> > [...] This means that Opera would break about 0.05% of pages of this 
> > sample if we implemented HTML5 doctype switching, assuming that the 
> > remaining pages I didn't look at were the same.

It looks (from the limited context in the email) as though you're
talking about making quirks-mode detection handle pages where the
author has manually changed the "EN" in the FPI to match the language
of the page content, or similar.

Do the data you present show that pages with these broken DOCTYPEs
break if they're not in quirks mode, or simply that pages have these
broken DOCTYPEs?  It's a pretty significant difference.

> > I think this is pretty convincing that HTML5 needs to ignore what is in 
> > place of the "EN" at the end of the FPIs, that is instead of matching 
> > that the FPI is e.g. -//W3C//DTD HTML 3.2//EN, check that it starts with 
> > -//W3C//DTD HTML 3.2//.
> > 
> > For the FPIs that end in //EN//2.0 and the like, I'd suggest to just 
> > drop them from the list since there are equivalent FPIs that end in //EN 
> > and the //2.0 would be treated as trailing garbage.
> 
> Done.
> 
> This is quite a major change. I would like feedback from vendors about 
> this change.
> 
>    http://www.whatwg.org/specs/web-apps/current-work/#the-initial

I've heard occasional requests for this change in Mozilla, but they
never seemed strong enough to justify changing an algorithm whose
main virtue should, I think, be its stability.  At this point, I'd
think pages are as likely to break if we move them from standards
mode to quirks mode as vice versa.  Do you have evidence that this
isn't the case for the pages that this change would affect?
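
To make the difference concrete, here is a rough sketch of the two
matching rules being compared: exact matching of the whole FPI versus
matching only the part before the trailing "EN", as proposed in the
quoted text above.  Everything in it is an assumption for illustration:
the function names are made up, the two FPIs are a tiny sample rather
than the spec's actual list, and the case-insensitive comparison is my
guess at the intended behavior.

```python
# Illustrative only: a tiny sample of FPIs, not the real list.
SAMPLE_QUIRKS_FPIS = (
    "-//W3C//DTD HTML 3.2//EN",
    "-//IETF//DTD HTML 2.0//EN",
)

# Proposed rule: keep everything up to and including the final "//",
# e.g. "-//W3C//DTD HTML 3.2//", and ignore whatever follows it.
SAMPLE_QUIRKS_PREFIXES = tuple(
    fpi[: -len("EN")].upper() for fpi in SAMPLE_QUIRKS_FPIS
)


def is_quirks_exact(fpi: str) -> bool:
    """Old-style rule: the whole FPI must match (case ignored here)."""
    return fpi.upper() in {known.upper() for known in SAMPLE_QUIRKS_FPIS}


def is_quirks_prefix(fpi: str) -> bool:
    """Proposed rule: only the part before the language code must match."""
    return fpi.upper().startswith(SAMPLE_QUIRKS_PREFIXES)


if __name__ == "__main__":
    for fpi in (
        "-//W3C//DTD HTML 3.2//EN",        # unmodified FPI
        "-//W3C//DTD HTML 3.2//ES",        # "EN" replaced by the author
        "-//IETF//DTD HTML 2.0//EN//2.0",  # trailing "//2.0"
        "-//W3C//DTD HTML 4.01//EN",       # not in the sample list
    ):
        print(fpi, is_quirks_exact(fpi), is_quirks_prefix(fpi))
```

The pages my question is about are exactly those where the two
functions disagree: under exact matching they are not treated as
quirks, and under prefix matching they would be.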

-David

-- 
L. David Baron                                 http://dbaron.org/
Mozilla Corporation                       http://www.mozilla.com/

Received on Friday, 23 May 2008 03:28:59 UTC