Re: Bug 85/4494 (keeping track of validation statistics for various purposes)

On Wed, 6 Feb 2008, Olivier Thereaux wrote:

> * stats on the documents themselves. Doctype, mime type, charset.
> Ideally, whether charset is in HTTP, XML decl, meta. There are
> existing studies about these, but another study made on a different
> sample would bring more perspective.

That should be doable.
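
As a rough illustration of the "where is the charset declared" part, here
is a minimal sketch. The function name charset_sources and the regular
expressions are my own assumptions for illustration, not anything taken
from the validator, and a real crawl would need more careful parsing than
this.

```python
import re

def charset_sources(content_type_header, body_head):
    """Report where a charset is declared: HTTP header, XML decl, meta.

    Hypothetical helper: pass the Content-Type header value and the first
    chunk of the document body as text.
    """
    found = {}
    # charset parameter of the HTTP Content-Type header
    m = re.search(r'charset\s*=\s*"?([^\s;"]+)', content_type_header or "", re.I)
    if m:
        found["http"] = m.group(1).lower()
    # encoding pseudo-attribute of an XML declaration at the very start
    m = re.match(r'\s*<\?xml[^>]*\bencoding\s*=\s*["\']([^"\']+)["\']', body_head)
    if m:
        found["xml"] = m.group(1).lower()
    # charset inside a <meta> element (http-equiv style or bare attribute)
    m = re.search(r'<meta[^>]+charset\s*=\s*["\']?([^\s"\'/>;]+)', body_head, re.I)
    if m:
        found["meta"] = m.group(1).lower()
    return found
```

Counting the keys of the returned dict per document would give the
HTTP / XML decl / meta breakdown, including documents where the sources
disagree.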

> * precise values for the error messages. Knowing which type of error
> is "popular" will be very useful, but so would knowing what the
> offending attributes/element/construct. In other words, knowing that
> "unknown attribute" is the #1 error will be great; knowing that the
> top unknown attributes are frameborder or whatnot will be awesome.

I wanted to do this too, early on. I tried to customize the templating
system myself and intercept the arguments that were being passed via
error_messages.cfg, but I just did not understand the way things were
working. Specifically, in
http://dev.w3.org/cvsweb/validator/share/templates/en_US/error_messages.cfg?rev=1.32&content-type=text/x-cvsweb-markup
I wanted to preserve all the %1, %2, ... arguments (error #136 appears to
have the most, with 6). This may seem esoteric and pointless for just
about everyone else's needs, but adding an abbreviated message of this
type to the SOAP output might be interesting:

            <m:error>
                <m:line>596</m:line>
                <m:col>1169</m:col>
                <m:message>end tag for &quot;UL&quot; which is not finished</m:message>
                <m:messageid>73</m:messageid>
                <m:messagearg>&quot;UL&quot;</m:messagearg>
                <m:explanation>[stuff deleted]</m:explanation>
                <m:source>[stuff deleted]</m:source>
            </m:error>

where each successive m:messagearg element captures the variable arguments
used in the error message. This is much more compact than storing the
entire error message.
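
To show how a crawler might consume this, here is a minimal sketch that
tallies (messageid, arguments) pairs. It assumes the proposed
m:messagearg element exists, which it does not today. The namespace URI,
the m:errorlist wrapper, and the sample data are also assumptions made
for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace URI for the "m" prefix; use whatever the real
# SOAP response declares.
NS = {"m": "http://www.w3.org/2005/10/markup-validator"}

# Invented sample data, trimmed to the fields this sketch reads.
SAMPLE = """\
<m:errorlist xmlns:m="http://www.w3.org/2005/10/markup-validator">
  <m:error>
    <m:messageid>73</m:messageid>
    <m:messagearg>&quot;UL&quot;</m:messagearg>
  </m:error>
  <m:error>
    <m:messageid>73</m:messageid>
    <m:messagearg>&quot;LI&quot;</m:messagearg>
  </m:error>
  <m:error>
    <m:messageid>73</m:messageid>
    <m:messagearg>&quot;UL&quot;</m:messagearg>
  </m:error>
</m:errorlist>
"""

def tally_errors(xml_text):
    """Count (messageid, args) pairs across all m:error elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for err in root.iter("{%s}error" % NS["m"]):
        msg_id = err.findtext("m:messageid", default="", namespaces=NS)
        # One tuple entry per m:messagearg, in document order (%1, %2, ...)
        args = tuple(a.text or "" for a in err.findall("m:messagearg", NS))
        counts[(msg_id, args)] += 1
    return counts

counts = tally_errors(SAMPLE)
for (msg_id, args), n in counts.most_common():
    print(msg_id, args, n)
```

Keying on the id plus the argument tuple answers both questions at once:
which error is most common, and which element or attribute triggers it.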

Ignoring (for the moment) whether such a feature would be useful to
anyone else, would it be hard to add? If it could be added, I could
grab that information in a future crawl.

-Brian

Brian Wilson --------------------------"Those aren't Sex muffins!   -Coach
bloo@blooberry.com ---------------------Those aren't Love muffins!
http://www.blooberry.com ---------------Those are just BLOOberry muffins!"
Creator of Index DOT Html/Css: http://www.blooberry.com/indexdot/

Received on Wednesday, 6 February 2008 17:17:46 UTC