MAMA - a new tool and its results (Major study of the W3C validator)

I don't want to steal thunder from Philip's announcement yesterday
of his "By The Numbers - Fall 2008" study, but it is also time for me
to announce another validation study. I've written a tool called MAMA
("Metadata Analysis and Mining Application"), which analyzes a Web page
and tracks as many of its structures as possible (including markup, CSS
and scripting). As part of this process, all pages analyzed are also run
through the W3C markup validator. So far, ~3.5 million URLs have been
analyzed. I've been working on this project for quite some time now and
it is finally time to share some of its findings, starting with
validation.

Condensed validation highlights:
   http://dev.opera.com/articles/view/mama-markup-validation-report/
Full validation study (long):
   http://dev.opera.com/articles/view/mama-w3c-validator-research-2/

Here is a peek at the index of the full version:
   1.  About markup validation: an introduction
   2.  Previous validation studies
   3.  Sources and tools: The URL set and the validator
   4.  What use is markup validation to an author?
   5.  How many pages validated?
   6.  Interesting views of validation rates, part 1: W3C-Member companies
   7.  Interesting views of validation rates, part 2: Alexa Global Top 500
   8.  Validation badge/icons: An interesting diversion?
   9.  Doctypes
   10. Character sets
   11. Validator failures
   12. Validator warnings
   13. Validator errors
   14. Summing up ...
   15. Appendix: Validation methodology

MAMA's main analysis of the URLs occurred in November 2007 but the
validation portion occurred in January 2008. After completing all
that, doing a write-up of the validation findings was the first topic
I tackled. I figured that the section would be fairly brief. Boy,
was I wrong; it turned out to be the *longest* of any of MAMA's
topics. There is a lot to say about the process of validation!
The validation study was written specifically with the W3C validator
mailing list in mind, so it gets technical and long-winded at times.
The extra levels of detail should create added fun and mystery for
one and all.

The validation study is also the first of MAMA's main results being
released. The main index for the MAMA project results is here:
   http://dev.opera.com/articles/view/mama/
and provides a lot of additional information, including motivation, a
quick summary of some of the major results, and some of MAMA's
methodologies. The index will be where you can access the new articles
as they come out (about 2 dozen over the coming weeks on different Web
page topics). These won't be directly about validation but may still be
of interest.

For the future, the plan is for MAMA to repeat this mass-validation
process at regular intervals so as to provide additional data about
how Web page validation trends change over time.

Bringing things back to Philip's study, I think comparisons and
differences between our two studies can produce interesting points for
further discussion. Many thanks to Philip, Olivier and Karl for
discussions and input along the way on MAMA's validation study. I hope
you all find it worth the read.

Thanks,
-Brian

Brian Wilson --------------------------"Those aren't Sex muffins!   -Coach
bloo@blooberry.com ---------------------Those aren't Love muffins!
http://www.blooberry.com ---------------Those are just BLOOberry muffins!"
Creator of Index DOT Html/Css: http://www.blooberry.com/indexdot/

Received on Thursday, 16 October 2008 08:14:20 UTC