- From: Brian Wilson <bloo@blooberry.com>
- Date: Mon, 16 Mar 2009 08:33:11 -0700 (PDT)
- To: www-validator@w3.org
Hi all,

I've just had MAMA do a validation pass on about 4.8 million URLs as part of an updated study of URLs that it previously analyzed. Olivier and the rest of the kind W3C crew were able to help in this process and I just wanted to give a big thanks for that.

There is a lot more analysis and filtering of the results to do before I can speak to what was discovered, but there was one specific request that I can already say something about. Keep in mind that these results are pretty raw, and I can try and do some further correlation if needed.

The main validation errors that MAMA encountered in its last crawl were error 76 (element not defined) and 108 (no attribute X). The way I set up MAMA's storage last time, it didn't save the arguments for individual error messages. Since 76 and 108 were the most popular, it was interesting (especially for Olivier and company) to try and find out this time what elements and attributes were generating the most errors.

Here's a list of the top 50 "element not defined" error arguments:

Rank  Element               Quantity
----------------------------------------
   1  embed                   596216
   2  frame                   261478
   3  frameset                261414
   4  marquee                 119502
   5  script                  101868
   6  font                     98239
   7  meta                     97210
   8  nobr                     85973
   9  a                        82982
  10  img                      69357
  11  center                   67397
  12  iframe                   59825
  13  br                       59763
  14  td                       58999
  15  tr                       57505
  16  table                    56409
  17  o:p                      56238
  18  div                      43928
  19  p                        40632
  20  csscriptdict             28110
  21  span                     28060
  22  csactiondict             27004
  23  spacer                   26298
  24  noscript                 26142
  25  noindex                  24848
  26  b                        23482
  27  bgsound                  22625
  28  layer                    22304
  29  u                        22061
  30  blink                    20352
  31  link                     20092
  32  input                    20049
  33  title                    19783
  34  csobj                    19578
  35  ilayer                   18940
  36  tbody                    17637
  37  scr                      17237
  38  variable                 16058
  39  strong                   15946
  40  form                     14862
  41  body                     14527
  42  head                     13999
  43  noembed                  13139
  44  style                    12139
  45  st1:place                12094
  46  param                    12008
  47  csactions                11831
  48  csaction                 11787
  49  object                   11774
  50  html                     10918
----------------------------------------

And the list of the top 50 "No attribute X" error arguments:

Rank  Attribute             Quantity
----------------------------------------
   1  height                 1624934
   2  src                    1018458
   3  width                   926904
   4  topmargin               884663
   5  leftmargin              831174
   6  marginheight            792137
   7  background              791243
   8  marginwidth             786816
   9  name                    755187
  10  border                  745194
  11  type                    685526
  12  pluginspage             498477
  13  quality                 494275
  14  bordercolor             436465
  15  align                   384137
  16  frameborder             321235
  17  bgcolor                 318466
  18  target                  289435
  19  scrolling               253640
  20  framespacing            239515
  21  language                224452
  22  rows                    208679
  23  color                   197183
  24  id                      193811
  25  cols                    193689
  26  valign                  159971
  27  rightmargin             153635
  28  allowscriptaccess       151814
  29  style                   136092
  30  wmode                   132042
  31  alt                     127582
  32  href                    125676
  33  bottommargin            122285
  34  content                 116613
  35  onmouseover             111657
  36  onmouseout              103736
  37  onclick                 100550
  38  hspace                   99552
  39  size                     93957
  40  class                    93321
  41  loop                     92015
  42  vspace                   89939
  43  onload                   79416
  44  allowfullscreen          74648
  45  cellpadding              73775
  46  bordercolorlight         72975
  47  cellspacing              71222
  48  scrollamount             69989
  49  bordercolordark          69810
  50  face                     68687
----------------------------------------

(It might be interesting for the error message to also list the element it is hitting the attribute error with - that would help explain why height is occurring almost twice as much as width here).

Hope this is interesting and/or helpful,

-Brian
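
For anyone wanting to reproduce this kind of per-argument tally, here is a minimal sketch in Python. It is not MAMA's actual code. The (error_id, message) record shape, the message wording matched by the regular expressions, the lowercasing of names, and the function name tally_error_arguments are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed message wording for errors 76 and 108; adjust to the real validator output.
ARG_PATTERNS = {
    76: re.compile(r'element "([^"]+)" undefined'),
    108: re.compile(r'there is no attribute "([^"]+)"'),
}


def tally_error_arguments(errors):
    """Count the quoted argument of each error 76 / 108 message.

    `errors` is an iterable of (error_id, message) pairs. This record
    shape is hypothetical, not MAMA's storage format.
    """
    counts = {error_id: Counter() for error_id in ARG_PATTERNS}
    for error_id, message in errors:
        pattern = ARG_PATTERNS.get(error_id)
        if pattern is None:
            continue  # only errors 76 and 108 are tallied
        match = pattern.search(message)
        if match:
            # Lowercasing is an assumption, so that EMBED and embed count together.
            counts[error_id][match.group(1).lower()] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    (76, 'element "embed" undefined'),
    (76, 'element "EMBED" undefined'),
    (76, 'element "marquee" undefined'),
    (108, 'there is no attribute "height"'),
    (108, 'there is no attribute "height"'),
    (108, 'there is no attribute "width"'),
    (64, 'document type does not allow element "li" here'),
]

counts = tally_error_arguments(sample)
for error_id, counter in counts.items():
    # most_common(50) yields the top 50 arguments, highest count first.
    for rank, (name, quantity) in enumerate(counter.most_common(50), start=1):
        print(error_id, rank, name, quantity)
```

Recording the owning element alongside each attribute would need the validator message, or the stored record, to carry that element name, which the tallies above do not have.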
Received on Monday, 16 March 2009 15:33:51 UTC