Re: Bug 85/4494 (keeping track of validation statistics for various purposes)

Hi Brian,

On Jan 30, 2008, at 08:09 , Brian Wilson wrote:
> I'd like to be able to help with these bugs, and I think that the  
> research I've been doing lately at Opera would probably be able to  
> close the issue.

Fantastic. From the discussions I've seen on IRC, I think your  
research would be very useful indeed.

With regard to bug 85, ideally the sample would have been precisely
what users submit to the validator, a sample which I think may be
fairly different from any sample of web pages in the wild. But the
latter is already very interesting.

[Bug85] http://www.w3.org/Bugs/Public/show_bug.cgi?id=85

> - What sort of questions would people like to see answered about  
> trends of validated documents, other than "how many pages validate",  
> or "what is the most frequent error type".

I know I'm answering a bit late, so feel free to disregard or keep for  
a later installment of your survey...

* stats on the documents themselves. Doctype, mime type, charset.  
Ideally, whether charset is in HTTP, XML decl, meta. There are  
existing studies about these, but another study made on a different  
sample would bring more perspective.

* precise values for the error messages. Knowing which type of error
is "popular" will be very useful, but so would knowing what the
offending attributes/elements/constructs are. In other words, knowing
that "unknown attribute" is the #1 error will be great – knowing that
the top unknown attributes are frameborder or whatnot will be awesome.
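
To make that second point a bit more concrete, here is a rough sketch
of the kind of two-level tally I have in mind. It assumes each error
has already been reduced to an (error type, offending value) pair;
that record shape, the function name and the sample data are all made
up for illustration, and are not what the validator actually logs.

```python
from collections import Counter, defaultdict

def tally_errors(records):
    """Count error types, and the offending values within each type.

    `records` is an iterable of (error_type, offending_value) pairs.
    This shape is hypothetical, not the validator's real log format.
    """
    by_type = Counter()
    by_value = defaultdict(Counter)
    for error_type, value in records:
        by_type[error_type] += 1
        by_value[error_type][value] += 1
    return by_type, by_value

# Made-up sample data, for illustration only.
sample = [
    ("unknown attribute", "frameborder"),
    ("unknown attribute", "frameborder"),
    ("unknown attribute", "marginwidth"),
    ("missing required attribute", "alt"),
]

by_type, by_value = tally_errors(sample)
top_type, _ = by_type.most_common(1)[0]
# Most frequent error type, then its most frequent offending values.
print(top_type, by_value[top_type].most_common(2))
```

The first counter answers "what is the most frequent error type", the
second answers "and which attributes/elements are causing it".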

> - How and to whom should the results be presented?
>   (to the list? other interested parties?

To this list, I think. I know of at least a few interested parties
who'd like to know the top error messages to prioritize a blossoming
translation effort, but they can be pointed to the message on the
list. Wherever the results are sent, I'll definitely feature them on
the W3C Questions&Answers blog, too.
http://www.w3.org/QA/

Cheers,
-- 
olivier

Received on Wednesday, 6 February 2008 04:25:33 UTC