- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Thu, 28 Feb 2008 01:58:53 +0000
- To: HTML WG <public-html@w3.org>
I've got some data about doctypes at
http://philip.html5.org/data/doctypes.html (125K pages from dmoz.org)
and http://philip.html5.org/data/doctypes-alexa.html (about 400 pages from
Alexa's list). I'm not entirely sure what this could be useful for, but
I'll point out a couple of things here.
Summary of some of the dmoz.org data:
48% of the pages have no doctype at all.
24% have a doctype that triggers quirks mode in HTML5. ("HTML5" could
equivalently be "Firefox", since the two have almost identical mode selection.)
23% trigger almost-standards (limited-quirks) mode.
5% trigger standards mode.
Also, 4% use a Strict doctype, 18% use XHTML 1.0, and 24% use HTML 4.
Only 0.2% quote the identifiers with single quotes.
The data includes a comparison of the standards/quirks mode decisions
that IE7 and HTML5 would make. They mostly agree; the main difference is
the ~1% of pages that IE7 treats as standards mode and HTML5 treats as
quirks mode, and half of those use <!doctype html public
"-//w3c//dtd html 4.0 transitional//en"
"http://www.w3.org/tr/rec-html40/loose.dtd">. It would be interesting to
see whether those pages work better when HTML5 treats them as standards
mode instead (i.e. being more compatible with IE and less with Firefox).
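To make the mode categories above a bit more concrete, here is a minimal Python sketch of doctype-based mode selection. It loosely follows the rules in the current WHATWG HTML specification (the 2008 draft discussed here may differ in detail), includes only a handful of the spec's quirks-mode prefixes, and uses a simplified regex instead of the real HTML5 tokenizer. The names `document_mode`, `DOCTYPE_RE` and the prefix tables are hypothetical. It does reproduce the case above: the lowercase HTML 4.0 Transitional doctype lands in quirks mode because public identifiers are compared case-insensitively.

```python
import re

# Simplified DOCTYPE matcher (an assumption, not the HTML5 tokenizer): a name
# plus optional PUBLIC/SYSTEM identifiers in single or double quotes. Anything
# after the identifiers, such as an internal subset "[ ... ]", is ignored.
DOCTYPE_RE = re.compile(
    r"""<!doctype\s+(?P<name>[^\s>]+)
        (?:\s+public\s+(?P<pq>["'])(?P<public>.*?)(?P=pq)
           (?:\s+(?P<sq>["'])(?P<system>.*?)(?P=sq))?
         |\s+system\s+(?P<sq2>["'])(?P<system2>.*?)(?P=sq2))?""",
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)

# A small subset of the public-identifier prefixes that the spec maps to
# quirks mode; the real list is much longer.
QUIRKS_PREFIXES = (
    "-//W3C//DTD HTML 3.2//",
    "-//W3C//DTD HTML 4.0 Frameset//",
    "-//W3C//DTD HTML 4.0 Transitional//",
)
# HTML 4.01 loose/frameset: quirks without a system identifier,
# almost-standards with one.
HTML401_LOOSE_PREFIXES = (
    "-//W3C//DTD HTML 4.01 Frameset//",
    "-//W3C//DTD HTML 4.01 Transitional//",
)
# XHTML 1.0 loose/frameset: always almost-standards.
LIMITED_QUIRKS_PREFIXES = (
    "-//W3C//DTD XHTML 1.0 Frameset//",
    "-//W3C//DTD XHTML 1.0 Transitional//",
)


def _starts_with_any(identifier, prefixes):
    # Case-insensitive prefix match; str.lower() stands in for the spec's
    # ASCII case-insensitive comparison.
    return any(identifier.lower().startswith(p.lower()) for p in prefixes)


def document_mode(source):
    """Return "quirks", "almost-standards" or "standards" for a page."""
    # Simplification: search anywhere rather than only at the document start.
    m = DOCTYPE_RE.search(source)
    if m is None or m.group("name").lower() != "html":
        return "quirks"  # no doctype at all, or a name other than "html"
    public = m.group("public") or ""
    # A present-but-empty system identifier still counts as present.
    system = m.group("system") if m.group("system") is not None else m.group("system2")
    if _starts_with_any(public, QUIRKS_PREFIXES):
        return "quirks"
    if _starts_with_any(public, HTML401_LOOSE_PREFIXES):
        return "quirks" if system is None else "almost-standards"
    if _starts_with_any(public, LIMITED_QUIRKS_PREFIXES):
        return "almost-standards"
    return "standards"
```

For example, `document_mode("<!doctype html>")` gives "standards", while a page with no doctype at all (48% of the dmoz.org sample) gives "quirks".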
http://www.thermaglaze.com/ has <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML
1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
<!ATTLIST a target CDATA #IMPLIED> ]>, whose internal subset declares the
target attribute that Strict lacks. Some people really want to use
<a target="_blank">, and will do anything to make it work while still
having the validator say their page is valid.
0.1% replaced the "...//EN" with their own language code, e.g.
http://www.edelweiss-reizen.nl has <!DOCTYPE html PUBLIC "-//W3C//DTD
XHTML 1.0 Strict//NL"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
Lots of people end up with incorrect doctypes through typos (e.g. about
0.05% wrote "-//WC3/..."), backslash-escaped quotes, global
search-and-replace of 'html' with 'php', and various other mistakes. It's
nice that "<!doctype html>" is easy to write, since people clearly
aren't great at copying boilerplate code.
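Most of these copying mistakes are easy to spot mechanically. Below is a rough Python lint sketch; the function `doctype_warnings` and its heuristics (expecting a `-//W3C//` owner and an `EN` language code, and treating any "php" inside the doctype as a likely search-and-replace casualty) are assumptions for illustration, not checks that were used to produce the data above.

```python
import re
from typing import List

# Public identifier in matching single or double quotes after PUBLIC.
PUBLIC_ID_RE = re.compile(r"""public\s+(["'])(.*?)\1""", re.IGNORECASE | re.DOTALL)


def doctype_warnings(doctype: str) -> List[str]:
    """Return heuristic warnings for a DOCTYPE string (hypothetical checks)."""
    warnings = []
    if '\\"' in doctype or "\\'" in doctype:
        warnings.append("backslash-escaped quote")
    if "php" in doctype.lower():
        warnings.append("'php' in doctype: search-and-replace of 'html'?")
    m = PUBLIC_ID_RE.search(doctype)
    if m:
        public = m.group(2).upper()
        # Heuristic: only W3C identifiers are expected here, so any other
        # owner (such as "-//WC3//") is reported as a probable typo.
        if not public.startswith("-//W3C//"):
            warnings.append("unexpected owner (typo for -//W3C//?)")
        elif not public.endswith("//EN"):
            warnings.append("language code is not EN")
    return warnings
```

Run on the edelweiss-reizen.nl doctype quoted above, this reports only the language code; on a correctly copied XHTML 1.0 Strict doctype it returns an empty list.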
--
Philip Taylor
pjt47@cam.ac.uk
Received on Thursday, 28 February 2008 01:59:01 UTC