Validation error frequencies

I ran an analysis on recent error messages from Validator.nu.
http://hsivonen.iki.fi/test/moz/analysis.txt

On each line, the first number is the number of occurrences of the
message. The second number is the total number of distinct URIs that
were analyzed.

Methodology:
The analyzed pages were those that users of Validator.nu chose to  
validate. Only errors for public Web pages were logged. Content POSTed  
to Validator.nu is not covered. Pages whose URI contained "/test/" or  
"/tests/" were excluded. URIs and IDs were replaced with "(redacted)"  
before tallying the results. A given error was counted at most once  
per URI, so duplicate errors on one page count only once. Other than  
the "(redacted)" bits, messages are not intelligently consolidated.  
Only (X)HTML5 errors were logged and analyzed. This does not
include data from the XHTML 1.0 / HTML 4.01 features of Validator.nu.
The messages are not exactly in the decorated form of the UI: even
messages pertaining to text/html have the XHTML cruft in them.
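
The tallying rules above could be sketched roughly as follows. This is
an illustration only, not the script actually used for the analysis.
The log format (an iterable of (uri, message) pairs), the names
redact() and tally(), and both regular expressions are assumptions:
the post does not say how URIs and IDs were recognized in messages.

```python
import re
from collections import Counter

# Hypothetical patterns; the real redaction rules are not described.
URI_RE = re.compile(r"https?://\S+")
ID_RE = re.compile(r'\bID "[^"]*"')


def redact(message):
    """Replace URIs and IDs in a message with "(redacted)"."""
    message = URI_RE.sub("(redacted)", message)
    return ID_RE.sub("ID (redacted)", message)


def tally(log):
    """Count each redacted message at most once per URI.

    log is an iterable of (uri, message) pairs (assumed format).
    Returns (Counter of messages, number of distinct URIs seen).
    """
    seen = set()      # (uri, redacted message) pairs already counted
    uris = set()      # distinct URIs that passed the exclusion filter
    counts = Counter()
    for uri, message in log:
        # Exclude test pages, as described in the methodology.
        if "/test/" in uri or "/tests/" in uri:
            continue
        uris.add(uri)
        key = (uri, redact(message))
        if key not in seen:
            seen.add(key)
            counts[key[1]] += 1
    return counts, len(uris)
```

Note that this sketch only sees URIs that appear in the error log, so
its URI total would leave out pages that validated without errors.
Whether the real total includes such pages is not stated.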

Note:
Currently people are mainly using the HTML5 features of Validator.nu  
to validate pre-HTML5 content as HTML5. Validator.nu doesn't support  
<font> but supports style='' on every element.

Observation:
After the December content model change, element containment errors  
are no longer an issue for updating legacy templates. Now the most  
common errors pertain to attributes obsoleted by HTML5 and to spaces  
in IRIs (and to legacy doctypes, of course).

I hope the WG finds this data useful for spec development.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Thursday, 31 January 2008 12:26:57 UTC