Validation error frequencies

I ran an analysis on recent error messages from

The first number is the number of occurrences. The second number is
the total number of distinct URIs that were analyzed.

The analyzed pages were those that users of chose to
validate. Only errors for public Web pages were logged. Content POSTed
to is not covered. Pages whose URI contained "/test/" or
"/tests/" were excluded. URIs and IDs were replaced with "(redacted)"
before tallying the results. A given error was counted at most once
per URI, so duplicate errors on one page count only once. Other than
the "(redacted)" substitutions, messages were not intelligently
consolidated. Only (X)HTML5 errors were logged and analyzed. This does
not include data from the XHTML 1.0 / HTML 4.01 features of
The messages are not exactly in the decorated form shown in the UI:
even messages pertaining to text/html have the XHTML cruft in them.
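The counting rules above can be illustrated with a short sketch. This is not the script that produced the numbers; it assumes the log is a sequence of (URI, message) pairs, and the names tally_errors and redact_uris, the redaction pattern and the sample log entries are all hypothetical. It skips URIs containing "/test/" or "/tests/", redacts each message, counts each redacted message at most once per URI and reports the number of distinct URIs as the second number. Note that in this sketch the second number only counts URIs that appear in the log, and only quoted http(s) URIs are redacted, not IDs.

```python
import re
from collections import Counter
from typing import Callable, Iterable, Tuple

# URIs containing either substring are excluded from the analysis.
EXCLUDED_SUBSTRINGS = ("/test/", "/tests/")

# Hypothetical redaction rule, for illustration only: replace any
# double-quoted value that starts with http:// or https://.
# (This sketch does not attempt to redact IDs.)
_QUOTED_URI = re.compile(r'"https?://[^"]*"')


def redact_uris(message: str) -> str:
    """Replace quoted http(s) URIs in a message with "(redacted)"."""
    return _QUOTED_URI.sub('"(redacted)"', message)


def tally_errors(
    log: Iterable[Tuple[str, str]],
    redact: Callable[[str], str] = redact_uris,
) -> Tuple[Counter, int]:
    """Count each redacted message at most once per URI.

    Returns (message -> number of URIs it occurred on,
             number of distinct non-excluded URIs in the log).
    """
    counted = set()  # (uri, redacted message) pairs already tallied
    uris = set()     # distinct URIs that survived the exclusion filter
    counts: Counter = Counter()
    for uri, message in log:
        if any(s in uri for s in EXCLUDED_SUBSTRINGS):
            continue
        uris.add(uri)
        key = (uri, redact(message))
        if key not in counted:  # duplicate errors on one page count once
            counted.add(key)
            counts[key[1]] += 1
    return counts, len(uris)


# Made-up sample log entries (hypothetical URIs and messages).
log = [
    ("http://example.com/a", 'Bad value "http://x.test/a b" for attribute "href"'),
    ("http://example.com/a", 'Bad value "http://x.test/c d" for attribute "href"'),
    ("http://example.com/b", 'The "align" attribute on the "div" element is obsolete.'),
    ("http://example.com/tests/c", 'The "align" attribute on the "div" element is obsolete.'),
]

counts, total = tally_errors(log)
# Print occurrences, total distinct URIs, and the redacted message.
for message, occurrences in counts.most_common():
    print(occurrences, total, message)
```

In the sample, the two "Bad value" errors on the same page become identical after redaction and are therefore counted once, and the "/tests/" page is dropped, so the second number is 2.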

Currently people are mainly using the HTML5 features of  
to validate pre-HTML5 content as HTML5. doesn't support  
<font> but supports style='' on every element.

After the December content model change, element containment errors  
are no longer an issue for updating legacy templates. Now the most  
common errors pertain to attributes obsoleted by HTML5 and to spaces  
in IRIs (and to legacy doctypes, of course).

I hope the WG finds this data useful for spec development.

Henri Sivonen

Received on Thursday, 31 January 2008 12:26:57 UTC