- From: Terje Bless <link@pobox.com>
- Date: Sun, 23 May 2004 01:20:01 +0200
- To: QA-dev <public-qa-dev@w3.org>
Ville Skyttä <ville.skytta@iki.fi> wrote:

>It's a bit hard to find the interesting entries since validator is quite
>an errorlog-trasher (still, even though I managed to get some of the
>noisiest bugs fixed for 0.6.6).

And we should probably make an effort to reduce this problem even further
fairly quickly.

><http://www.w3.org/TR/query-semantics/> (~1.6MB) [170MB]
><http://www.w3.org/TR/2003/WD-xsl11-20031217/> (~1.8MB) [107MB]
><http://www.go-mono.com/[…].Windows.Forms.html> (~0.9MB) [141MB]
>
>Normal, smallish validation cases seem to take 10MB or so per
>"check" process on my box, so 100+ MB is pretty much... ideas?

Process size will balloon with input document size (and hence complexity),
since each element has a gazillion attributes that will show up in the ESIS
whether or not they appear in the physical markup. A normal document has a
very high content:markup ratio; the cited documents have inordinately much
markup compared to the amount of actual data in them. Well, or at least
that's my theory. :-) (A quick way to measure this is sketched below.)

BTW, Björn has (on IRC) just suggested some optimizations that can be used
to avoid some of this overhead in a number of cases. I'll have a look at
whether that can reasonably be done for 0.6.7. The bug on this has been
targeted for 0.7, IIRC.

>Running "top" on v.w.o suggests that it seems to kill the "check"
>process once its footprint reaches 100MB when validating any of the
>above URLs. I did not see any related configuration or limits in
>httpd.conf, and the box does not run out of memory or anything.

Which means these are probably either Apache compile-time limits or Debian
kernel ulimits. (The second sketch below shows how we could set such a
limit explicitly instead of relying on whatever the platform happens to do.)

>There is also one 500 which is apparently caused by someone repeatedly
>(7-ish times) clicking the referer badge in the lower right hand corner
>of the results page after having validated a pretty large document with
>the show source and show parse tree options on, obviously causing pretty
>heavy recursion and a URL with a length of about 2k... any ideas how we
>could prevent this?

Look for the User-Agent, or a similar distinguishing characteristic, of
the incoming request, and if it's ourselves we append an extra token
("recursive") to our User-Agent string. If a request comes in with
"recursive" already present, we throw a fatal error. Add in a configurable
permitted recursion level, perhaps... (the third sketch below illustrates
the idea).

--
"Temper Temper! Mr. Dre? Mr. NWA? Mr. AK, comin' straight outta Compton
and y'all better make way?" -- eminem
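A minimal sketch of the markup:content measurement mentioned above. This is
not validator code; it assumes HTML::Parser is available and just compares
the byte counts of tag/markup events against text events:

    #!/usr/bin/perl -w
    # Hypothetical sketch, not validator code: measure a document's
    # markup:content byte ratio to test the ESIS-bloat theory above.
    use strict;
    use HTML::Parser;

    my ($markup, $content) = (0, 0);
    my $p = HTML::Parser->new(
        # Text events count as content; everything else (tags,
        # comments, declarations) counts as markup.
        text_h    => [ sub { $content += length $_[0] }, 'text' ],
        default_h => [ sub { $markup  += length $_[0] }, 'text' ],
    );
    $p->parse_file($ARGV[0]) or die "Can't parse $ARGV[0]: $!\n";
    printf "markup:content = %.1f:1\n", $content ? $markup / $content : 0;

If the theory holds, saving one of the URLs above locally and running this
against it should show a ratio far beyond what a typical homepage gets.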
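On the limits question: rather than relying on whatever compile-time limit
or ulimit happens to be in effect, the check process could cap itself. A
sketch, assuming BSD::Resource is installed and picking 64MB arbitrarily
(Apache's RLimitMEM directive in httpd.conf would be the config-file
equivalent, without touching code):

    #!/usr/bin/perl -w
    # Hypothetical sketch, not validator code: cap our own address
    # space so a pathological document aborts this process cleanly
    # instead of growing to 100MB+.
    use strict;
    use BSD::Resource qw(setrlimit RLIMIT_AS);

    my $cap = 64 * 1024 * 1024;    # 64MB; the number is an assumption

    # Soft and hard limit, in bytes. Once exceeded, allocations fail
    # and Perl dies with "Out of memory!", which we can trap upstream.
    setrlimit(RLIMIT_AS, $cap, $cap)
        or die "setrlimit failed: $!\n";

    # ... go on to fetch and parse the document as usual ...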
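And a sketch of the recursion guard, with the token, depth limit, and
version string all made up for illustration:

    #!/usr/bin/perl -w
    # Hypothetical sketch, not validator code: refuse requests that
    # have already passed through us too many times.
    use strict;
    use CGI qw(header);
    use LWP::UserAgent;

    my $TOKEN     = 'recursive';
    my $MAX_DEPTH = 2;    # the permitted recursion level, configurable

    # Incoming: count our token in the client's User-Agent header.
    my $ua_header = $ENV{HTTP_USER_AGENT} || '';
    my $depth     = () = $ua_header =~ /\Q$TOKEN\E/g;
    if ($depth >= $MAX_DEPTH) {
        print header(-status => '403 Forbidden'),
              "Recursion limit exceeded.\n";
        exit;
    }

    # Outgoing: append one more token so the next hop can tell the
    # request ultimately originated with ourselves.
    my $ua = LWP::UserAgent->new(
        agent => 'W3C_Validator/0.6.7 ' . "$TOKEN " x ($depth + 1),
    );

Counting occurrences of the token, instead of just testing for its
presence, gives us the configurable permitted recursion level for free.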
Received on Saturday, 22 May 2004 19:20:05 UTC