Doctype usage data

I've got some data about doctypes from two samples (one of 125K pages, 
and one of about 400 pages from Alexa's list). I'm not entirely sure 
what this could be useful for, but I'll point out a couple of things here.

Summary of some of the data:
48% of the pages have no doctype at all.
24% have a doctype that is quirks mode in HTML5. ("HTML5" could 
equivalently be "Firefox", since they have almost identical mode selection.)
23% are almost-standards (limited-quirks) mode.
5% are standards mode.
Also, 4% are Strict. 18% are XHTML 1.0; 24% are HTML4. Only 0.2% use 
single quotes.
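
For readers unfamiliar with how a doctype maps onto the three modes above, here is a rough sketch of the selection logic. It is only an approximation based on the HTML specification's mode-selection rules as I understand them: the prefix lists are a small illustrative subset of the real ones, and the function name and return strings are made up for this example.

```python
from typing import Optional

# Illustrative subset only; the specification's list of public identifier
# prefixes that trigger quirks mode is much longer than this.
QUIRKS_PREFIXES = (
    "-//w3c//dtd html 4.0 transitional//",
    "-//w3c//dtd html 4.0 frameset//",
    "-//w3c//dtd html 3.2 final//",
    "-//ietf//dtd html 2.0//",
)

# These give quirks mode without a system identifier, limited-quirks with one.
HTML401_LOOSE_PREFIXES = (
    "-//w3c//dtd html 4.01 transitional//",
    "-//w3c//dtd html 4.01 frameset//",
)

# These give limited-quirks (almost-standards) mode.
LIMITED_QUIRKS_PREFIXES = (
    "-//w3c//dtd xhtml 1.0 transitional//",
    "-//w3c//dtd xhtml 1.0 frameset//",
)


def classify_mode(name: Optional[str],
                  public_id: Optional[str],
                  system_id: Optional[str]) -> str:
    """Return 'quirks', 'limited-quirks' or 'standards' (hypothetical labels)."""
    if name is None:
        return "quirks"  # no doctype at all
    if name.lower() != "html":
        return "quirks"
    pub = (public_id or "").lower()  # identifiers are compared case-insensitively
    if pub.startswith(QUIRKS_PREFIXES):
        return "quirks"
    if pub.startswith(HTML401_LOOSE_PREFIXES):
        return "quirks" if system_id is None else "limited-quirks"
    if pub.startswith(LIMITED_QUIRKS_PREFIXES):
        return "limited-quirks"
    return "standards"
```

Under these assumptions an HTML 4.0 Transitional public identifier lands in quirks mode whether or not a system identifier is present, while `<!doctype html>` (name only) lands in standards mode.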

The data includes a comparison of the standards/quirks mode decisions 
that IE7 and HTML5 would make. There is mostly good agreement; the main 
difference is the ~1% that are treated as standards mode in IE7 and as 
quirks mode in HTML5, and half of those are from <!doctype html public 
"-//w3c//dtd html 4.0 transitional//en" 
"">. It would be interesting to 
see if those pages would work better if treated as HTML5 standards mode 
instead (i.e. being more compatible with IE, less with Firefox).

One page has <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 
1.0 Strict//EN" "" [ 
<!ATTLIST a target CDATA #IMPLIED> ]> -- some people really want to use 
<a target="_blank">, and will do anything to make it work while still 
having the validator claim their page is okay.

0.1% replaced the "...//EN" with their own language code, e.g. one page has <!DOCTYPE html PUBLIC "-//W3C//DTD 
XHTML 1.0 Strict//NL" 

Lots of people end up with incorrect doctypes due to typos (e.g. about 
0.05% wrote "-//WC3/..."), escaping quotes with backslashes, globally 
search-and-replacing 'html' with 'php', and various other issues. It's 
nice that "<!doctype html>" is easy to write, since people clearly 
aren't great at copying boilerplate code.
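
As an illustration of the kinds of mistakes described above, here is a minimal sketch of a checker that flags them in a raw doctype string. It is not the tool used to produce the data; the function name, the messages, and the exact heuristics are assumptions made for this example, and it only looks for the specific patterns mentioned in this message.

```python
import re
from typing import List


def find_doctype_problems(doctype: str) -> List[str]:
    """Flag a few common doctype mistakes (heuristic, hypothetical helper)."""
    problems = []
    lowered = doctype.lower()

    # Typo: "WC3" written instead of "W3C" in the public identifier.
    if "-//wc3/" in lowered:
        problems.append("typo: WC3 instead of W3C")

    # Quotes escaped with backslashes, e.g. copied out of a string literal.
    if '\\"' in doctype or "\\'" in doctype:
        problems.append("backslash-escaped quotes")

    # Global search-and-replace of 'html' with 'php'.
    if re.match(r"<!doctype\s+php\b", lowered):
        problems.append("'html' replaced with 'php'")

    # Internal subset, e.g. an added <!ATTLIST ...> declaration.
    if "[" in doctype:
        problems.append("internal subset")

    # Pull out the quoted public identifier, if one can be found.
    match = re.search(r"public\s+([\"'])(.*?)\1", doctype, re.IGNORECASE)
    if match:
        quote, public_id = match.groups()
        if quote == "'":
            problems.append("single-quoted identifier")
        if "//" in public_id:
            # The last "//"-separated field is the language code.
            language = public_id.rsplit("//", 1)[-1]
            if language.upper() != "EN":
                problems.append(f"language code {language!r} instead of 'EN'")

    return problems
```

For example, `find_doctype_problems('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//NL">')` reports the replaced language code, while `find_doctype_problems("<!doctype html>")` returns an empty list.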

Philip Taylor

Received on Thursday, 28 February 2008 01:59:01 UTC