- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 28 Sep 2006 18:41:18 +0000 (UTC)
- To: Karl Dubost <karl@w3.org>
- Cc: www-qa@w3.org
On Thu, 28 Sep 2006, Karl Dubost wrote:
>
> 1. Is there any plans for releasing a new version of the survey made by
>    Google?

Google does not comment on future plans.

> 2. Could you give the approximate ratio of Web pages for this.
>      - Not valid but well-formed.
>      - Not valid and not well-formed.

Approximately 78% of pages have syntax errors more serious than missing
or incorrect DOCTYPEs and bogus trailing "/" characters in start tags.

The parser I used didn't check for validity (e.g. it didn't check that
<p> elements weren't inside <a> elements); it basically only tested for
syntactic correctness according to the HTML5 parser spec, ignoring the
DOCTYPE requirements and the trailing "/" error (as in "<foo/>").

Over 13% of pages had duplicate IDs (multiple elements with the same
value on the "id" attribute; I didn't check case-insensitively, nor did
I check for collisions with the "name" attribute, both of which would be
required for strict HTML4 compliance).

The average (median) page had fifteen syntax errors according to the
rules for finding syntax errors described in the HTML5 parser
specification.

The most common error (after DOCTYPE-related errors and bogus trailing
slash errors) was the use of "</" in CDATA sections. The next most
common error was incorrectly placed content in <table> elements. The
third most common error was misnesting of <form> elements.

The sample in question was very large (10 digits), so these are pretty
representative numbers.

> 3. Could you give the approximate ratio of declared HTML 4, XHTML 1.0,
>    XHTML 1.1 documents?

In my sample, the number of pages labelled as text/html outweighed the
number of pages marked application/xhtml+xml by a factor so large that
it is probably not statistically meaningful.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 28 September 2006 18:41:28 UTC