
Re: Parsing: Trailing garbage in doctype FPI (was: Re: Doctype usage data)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 23 May 2008 03:19:24 +0000 (UTC)
To: Simon Pieters <simonp@opera.com>
Cc: Philip Taylor <pjt47@cam.ac.uk>, dbaron@dbaron.org, hyatt@apple.com, HTML WG <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0805230308120.12911@hixie.dreamhostps.com>

On Mon, 3 Mar 2008, Simon Pieters wrote:
> > 
> > I've got some data about doctypes at 
> > http://philip.html5.org/data/doctypes.html (125K pages from dmoz.org) 
> > and http://philip.html5.org/data/doctypes-alexa.html (about 400 from 
> > Alexa's list). I'm not entirely sure what this could be useful for, 
> > but I'll point out a couple of things here.
> 
> [...] This means that Opera would break about 0.05% of pages of this 
> sample if we implemented HTML5 doctype switching, assuming that the 
> remaining pages I didn't look at were the same.
> 
> I think this is pretty convincing that HTML5 needs to ignore what is in 
> place of the "EN" at the end of the FPIs, that is instead of matching 
> that the FPI is e.g. -//W3C//DTD HTML 3.2//EN, check that it starts with 
> -//W3C//DTD HTML 3.2//.
> 
> For the FPIs that end in //EN//2.0 and the like, I'd suggest to just 
> drop them from the list since there are equivalent FPIs that end in //EN 
> and the //2.0 would be treated as trailing garbage.

Done.

This is quite a major change, and I would like feedback from browser
vendors on it.

   http://www.whatwg.org/specs/web-apps/current-work/#the-initial
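The following is a minimal sketch of the matching change discussed above. It contrasts exact FPI matching with the prefix matching Simon proposes. The three prefixes in LEGACY_FPI_PREFIXES are an illustrative subset chosen for this example, not the full list in the spec section linked above. The ASCII case-insensitive comparison is an assumption of this sketch. The helper names are hypothetical.

```python
import string

# ASCII-only lowercasing (str.lower() would also fold non-ASCII letters).
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Hypothetical, illustrative subset of legacy FPI prefixes (not the spec's list).
LEGACY_FPI_PREFIXES = (
    "-//W3C//DTD HTML 3.2//",
    "-//W3C//DTD HTML 3.2 Final//",
    "-//IETF//DTD HTML 2.0//",
)


def _ascii_lower(s: str) -> str:
    return s.translate(_ASCII_LOWER)


def matches_exact(public_id: str) -> bool:
    """Old approach: the whole FPI, including the trailing "EN", must match."""
    wanted = {_ascii_lower(p + "EN") for p in LEGACY_FPI_PREFIXES}
    return _ascii_lower(public_id) in wanted


def matches_prefix(public_id: str) -> bool:
    """Proposed approach: ignore whatever follows the final "//" of a known FPI."""
    pid = _ascii_lower(public_id)
    return any(pid.startswith(_ascii_lower(p)) for p in LEGACY_FPI_PREFIXES)


if __name__ == "__main__":
    for fpi in ("-//W3C//DTD HTML 3.2//EN",
                "-//W3C//DTD HTML 3.2//EN//2.0",
                "-//W3C//DTD HTML 3.2//DE"):
        print(fpi, matches_exact(fpi), matches_prefix(fpi))
```

With exact matching, only the first identifier is recognised. Prefix matching also accepts the "//EN//2.0" and "//DE" variants. This is why the separate //EN//2.0 entries become redundant once equivalent //EN prefixes are in the list.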

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 23 May 2008 03:20:02 UTC
