- From: Ville Skyttä <ville.skytta@iki.fi>
- Date: 25 Jul 2003 21:39:05 +0300
- To: www-validator@w3.org
On Fri, 2003-07-25 at 11:16, Centaur zeus wrote:
[...]
> I found that it actually parsed two documents, one is the one I requested
> and another is the one of the html link.
[...]
> 1) Why the html link is parsed again ?

There are two situations where linked documents need to be parsed:

1) Recursive checking. Obviously, when recursing, the linked documents need
   to be fetched and parsed in full to extract further links from them.

2) The link contains a fragment, e.g. <http://foo/bar#quux>. To check
   whether there is an ID "quux" in the linked document, that document
   needs to be fetched and parsed.

> 2) is it appropriate to change if (being_processed) to if (0) and what's
> the impact ?

The ability to check fragments' "validity" would be gone.

> 3) How can I minimize the resource used by the LWP and HTTP package ?

Use nice(1) :)

If you're looking into optimizing the code, one possibility would be to
avoid repeated instantiation of W3C::UserAgent and W3C::CheckLink objects.
Instantiating these also means instantiating e.g. new HTML::Parser objects.
Since checklink doesn't operate in parallel, one UserAgent and one
CheckLink instance could be enough for one run of the script.

Checking links in parallel would probably also result in quicker completion
times, though most likely at the expense of somewhat higher resource usage.

If you roll up your sleeves and do some work, please don't hesitate to send
patches!

Cheers,
-- \/
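[Editor's note: the fragment check described above — fetching the target
document and looking for a matching anchor — can be sketched as follows.
checklink itself is Perl (built on HTML::Parser and LWP); this Python
sketch is illustrative only, and the names AnchorCollector and
fragment_exists are invented for the example.]

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect values a fragment identifier could resolve to:
    id attributes on any element, plus name attributes on <a>."""
    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key == "id" or (tag == "a" and key == "name"):
                self.anchors.add(value)

def fragment_exists(html, fragment):
    # Parse the fetched document once and test the fragment against
    # the collected anchor names.
    parser = AnchorCollector()
    parser.feed(html)
    return fragment in parser.anchors

doc = '<html><body><h1 id="quux">Target</h1><a name="old">x</a></body></html>'
print(fragment_exists(doc, "quux"))     # True
print(fragment_exists(doc, "missing"))  # False
```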
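[Editor's note: the parallel-checking idea mentioned above could look
roughly like the following. checklink is serial Perl; this is a
hypothetical Python sketch, and check_link here is a stand-in for a real
fetch-and-validate step rather than anything in checklink.]

```python
from concurrent.futures import ThreadPoolExecutor

def check_link(url):
    # Placeholder for a real fetch-and-validate step (LWP in Perl,
    # urllib here); returns (url, ok) without touching the network.
    return (url, True)

def check_all(urls, workers=8):
    # Network-bound fetches overlap well in threads; resource usage
    # grows with the number of in-flight requests, as noted above.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(check_link, urls))

results = check_all(["http://example.org/a", "http://example.org/b"])
print(results)
```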
Received on Friday, 25 July 2003 14:39:08 UTC