- From: Ben Meadowcroft <cee.plus@virgin.net>
- Date: Tue, 10 Jun 2003 09:53:23 +0100
- To: <public-evangelist@w3.org>
The document at http://www.w3.org/QA/2002/04/Web-Quality states:

"Most of the Web sites on the Web are not valid. We may assume that this is the case for 99% of the Web pages, but there are no statistics to support this. It would be interesting to run a survey to prove that this case is indeed true."

It is true that a large number of websites are invalid. A recent thesis, "How to cope with incorrect HTML", deals with the nature of errors in HTML documents and strategies for overcoming them. As part of the thesis, the author investigated the number of invalid documents and the types of errors they contain. The results are in the thesis itself ( urn:isbn:82-8088-088-7 ), available from http://www.ub.uib.no/elpub/2001/h/413001/

I have summarised the results in an entry on my weblog, available at http://www.benmeadowcroft.com/me/archive/2003/january.shtml#link25th

The sample size was 2,398,226 documents, of which 14,563 (about 0.61% of the full sample) were valid HTML documents. Taking factors such as unknown DTDs into account, the proportion of tested documents that were valid was 0.71%.

I hope this information is of some interest.

-- 
Ben Meadowcroft
http://www.benmeadowcroft.com/
Received on Tuesday, 10 June 2003 04:53:29 UTC