- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Thu, 28 Feb 2008 01:58:53 +0000
- To: HTML WG <public-html@w3.org>
I've got some data about doctypes at http://philip.html5.org/data/doctypes.html (125K pages from dmoz.org) and http://philip.html5.org/data/doctypes-alexa.html (about 400 from Alexa's list). I'm not entirely sure what this could be useful for, but I'll point out a couple of things here.

Summary of some of the dmoz.org data:

  48% of the pages have no doctype at all.
  24% have a doctype that is quirks mode in HTML5. ("HTML5" could equivalently be "Firefox", since they have almost identical mode selection.)
  23% are almost-standards (limited-quirks) mode.
  5% are standards mode.

Also, 4% are Strict. 18% are XHTML 1.0; 24% are HTML4. Only 0.2% use single quotes.

The data includes a comparison of the standards/quirks mode decisions that IE7 and HTML5 would make. There is mostly good agreement; the main difference is the ~1% that are treated as standards mode in IE7 and as quirks mode in HTML5, and half of those are from

  <!doctype html public "-//w3c//dtd html 4.0 transitional//en" "http://www.w3.org/tr/rec-html40/loose.dtd">

It would be interesting to see if those pages would work better if treated as HTML5 standards mode instead (i.e. being more compatible with IE, less with Firefox).

http://www.thermaglaze.com/ has

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [ <!ATTLIST a target CDATA #IMPLIED> ]>

-- some people really want to use <a target="_blank">, and will do anything to make it work while still having the validator claim their page is okay.

0.1% replaced the "...//EN" with their own language code, e.g. http://www.edelweiss-reizen.nl has

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//NL" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

Lots of people end up with incorrect doctypes due to typos (e.g. about 0.05% wrote "-//WC3/..."), escaping quotes with backslashes, globally search-and-replacing 'html' with 'php', and various other issues.
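For anyone who wants to look for these mistakes in their own pages, here is a rough Python sketch that flags the ones listed above. It is not the code used to produce the data; the function name, the regexes and the problem labels are all made up for illustration, and the doctype matching is deliberately naive (it stops at the first ">", so an internal subset gets cut short).

```python
import re

# Hypothetical helpers, not the script behind the published data.
# Naive match: stops at the first ">", so an internal subset is truncated.
DOCTYPE_RE = re.compile(r'<!doctype\b[^>]*>', re.IGNORECASE)
# Public identifier in double or single quotes, directly after PUBLIC.
PUBLIC_ID_RE = re.compile(r'''public\s+(["'])(.*?)\1''', re.IGNORECASE)


def doctype_problems(page: str) -> list[str]:
    """Return labels for the doctype mistakes found in a page's source."""
    match = DOCTYPE_RE.search(page)
    if match is None:
        return ["no-doctype"]
    doctype = match.group(0)
    problems = []

    # Quotes escaped with backslashes, e.g. PUBLIC \"-//W3C//...\"
    if '\\"' in doctype or "\\'" in doctype:
        problems.append("escaped-quotes")

    # "W3C" mistyped as "WC3"
    if re.search(r'//WC3//', doctype, re.IGNORECASE):
        problems.append("wc3-typo")

    # Global search-and-replace of 'html' with 'php'
    if 'php' in doctype.lower():
        problems.append("html-replaced-with-php")

    # Language code other than EN at the end of the public identifier
    public_id = PUBLIC_ID_RE.search(doctype)
    if public_id is not None:
        lang = public_id.group(2).rsplit('//', 1)[-1]
        if len(lang) == 2 and lang.isalpha() and lang.upper() != "EN":
            problems.append("language-code:" + lang.upper())

    return problems


if __name__ == "__main__":
    sample = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//NL" '
              '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')
    print(doctype_problems(sample))
```

The checks are independent of each other, so one doctype can collect several labels. When the quotes are backslash-escaped the public identifier is not extracted at all, so the language-code check is skipped for those pages.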
It's nice that "<!doctype html>" is easy to write, since people clearly aren't great at copying boilerplate code.

--
Philip Taylor
pjt47@cam.ac.uk
Received on Thursday, 28 February 2008 01:59:01 UTC