
Some Markup Validator statistics

From: olivier Thereaux <ot@w3.org>
Date: Tue, 31 Jan 2006 11:18:23 +0900
Message-Id: <55b6bc8abdb647d73ddec6aca520ef15@w3.org>
To: QA Dev <public-qa-dev@w3.org>

After some discussion off-list with Antonio, I did some very rough log 
analysis for the markup validator, checking basically the relative 
weight of the different validation methods, as well as the relative 
frequency of docs and feedback reading versus validation requests.

My sample covers less than two days of logs from a single server, so 
it may not be representative of what the validator sees in a month. 
However, the sample contains around 300,000 requests to check, which 
is certainly enough to draw some conclusions.

Conclusion 1: the most requested resources are the CSS and 
header/footer images. If we wanted to reduce the load, I wonder to 
what extent merging the CSS into one file would help (at the cost of 
more hassle in maintaining the stylesheets).

Conclusion 2: in terms of "real" resources, the check script is the 
overwhelming #1. Against ~273,000 check requests, there were only 
~1,000 hits to the docs (probably with some caching going on; the 
homepage got ~36,000 hits in that period), 8,000 for the feedback 
form, and 6,000 for checklink.

The most interesting stat, however, may be the ratio between GET and 
POST requests (validation by URI and otherwise). Validation by URI 
accounts for 75% of the requests (about 8% being /check/referer 
requests, i.e. revalidations through the icons), and upload/direct 
input for approximately 25%. It is harder to know which of these are 
uploads and which are direct input validations, but referer info tells 
us that 17% of requests come with referer validator.w3.org, so the 
remaining 8% are Mozilla/Opera gizmos. I am almost certain that nearly 
all of these 17% are file upload requests, but have no way of knowing 
for sure.
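
A breakdown like the one above could be reproduced with a short script. 
The following is only a minimal sketch, not the analysis actually used 
here: it assumes the logs are in Apache "combined" format, and the 
names LOG_LINE, classify and breakdown, as well as the four category 
labels, are made up for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: Apache "combined" log format. The message does not say which
# log format or tools were used for the original analysis.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

def classify(line):
    """Return a (hypothetical) category label for a check request, else None."""
    m = LOG_LINE.search(line)
    if m is None:
        return None
    path = m.group("path").split("?", 1)[0]  # drop the query string
    # Only the check script counts; /checklink and static files are ignored.
    if path not in ("/check", "/check/referer"):
        return None
    method = m.group("method")
    if method == "GET":
        # GET means validation by URI; /check/referer is the icon link.
        return "referer_icon" if path == "/check/referer" else "by_uri"
    if method == "POST":
        # POST means upload or direct input; split on the Referer header.
        if "validator.w3.org" in m.group("referer"):
            return "post_from_validator"
        return "post_other"
    return None

def breakdown(lines):
    """Return each category's share of check requests, as percentages."""
    counts = Counter(c for c in map(classify, lines) if c is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}
```

Calling breakdown() on an open log file (it accepts any iterable of 
lines) gives the percentage split. Note that the Referer header alone 
still cannot distinguish file uploads from direct input, which is the 
same limitation described above.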

Received on Tuesday, 31 January 2006 02:18:31 UTC
