any idea why validator(-jp) would repeatedly fetch XHTML DTDs?

Hi all,

On Friday last week we (W3C systems) checked usage of the www and
validator web servers, trying to find heavy users. In the process,
we found a machine that was being used as a proxy to DoS
validator-jp, and had it closed.

Among the results was one surprising fact, though: validator.w3.org
seems to be one of these very heavy users of www.w3.org, repeatedly
fetching DTDs. Both the -us and -jp servers do so; it is more visible
for validator-jp because it "talks" only to one www.w3.org mirror
(the Japanese one), whereas validator-us talks to all of them.

Attached is a sample log from the www server. As you can see,
the rate of requests is much smaller than the rate at which
the validator server receives requests for validation (so it's
certainly not a massive failure of the sgml-lib catalogs), but it is
still rather heavy, especially for modular DTDs.
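
For anyone who wants to repeat this kind of tally on their own logs, here
is a rough sketch in Python. It assumes the access log is in Common Log
Format (an assumption; the attached sample may differ), and the function
name and the list of file extensions are only illustrative.

```python
import re
from collections import Counter

# Assumed Common Log Format:
#   host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)'
)

# Extensions treated as DTD-related (modular DTDs pull in .mod/.ent files).
DTD_SUFFIXES = (".dtd", ".mod", ".ent")


def count_dtd_fetches(lines):
    """Count requests for DTD-related files, keyed by (client, path)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        client, _when, _method, path, _status, _size = match.groups()
        if path.endswith(DTD_SUFFIXES):
            counts[(client, path)] += 1
    return counts
```

Calling count_dtd_fetches(open("access.log")) and then most_common(10) on
the result lists the heaviest (client, path) pairs first.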

My first hunch was that OpenSP would ignore its catalog for documents
whose doctype has only a SYSTEM identifier, and thus fetch the DTDs,
but a number of tests, including the "custom DTD" test case from
http://validator-jp.w3.org/check?uri=http%3A%2F%2Fqa-dev.w3.org%2Fwmvs%2FHEAD%2Fdev%2Ftests%2Fsgml_customdtd.html&charset=%28detect+automatically%29&doctype=Inline&ss=1
showed no sign of any DTD download.

I tried monitoring the logs of validator-jp and the Japanese
www.w3.org mirror at the same time, looking for a pattern in which
documents triggered DTD fetches, and found 1) no pattern and 2) that
revalidating these documents did not trigger any DTD download.
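
The same correlation could be done offline instead of by watching two
logs side by side. A minimal sketch, assuming both logs carry Common Log
Format timestamps and that a DTD fetch follows the validation request
that caused it within a few seconds (both are assumptions, and the
function names are made up for illustration):

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

# Timestamp layout assumed for both logs, e.g. "28/Apr/2006:10:00:00 +0000".
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_time(stamp):
    """Parse a Common Log Format timestamp into an aware datetime."""
    return datetime.strptime(stamp, TIME_FORMAT)


def candidate_documents(dtd_fetch_times, validation_requests, window_seconds=5):
    """Map each DTD fetch time to the URIs validated shortly before it.

    validation_requests is a list of (datetime, uri) tuples sorted by time.
    """
    times = [when for when, _uri in validation_requests]
    window = timedelta(seconds=window_seconds)
    result = {}
    for fetch in dtd_fetch_times:
        # Slice out validation requests in the interval [fetch - window, fetch].
        low = bisect_left(times, fetch - window)
        high = bisect_right(times, fetch)
        result[fetch] = [uri for _when, uri in validation_requests[low:high]]
    return result
```

Documents that keep showing up as candidates across many fetches would be
the ones worth revalidating by hand.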

Count me as puzzled on that one. Any idea whether this is normal or
abnormal libosp behavior?

If no one has any idea, I'll send a mail to the OpenSP list.
Thanks.
-- 
olivier

Received on Monday, 1 May 2006 03:59:12 UTC