- From: Ville Skyttä <ville.skytta@iki.fi>
- Date: Mon, 24 Oct 2005 09:34:52 +0300
- To: QA-dev Dev <public-qa-dev@w3.org>
On Mon, 2005-10-24 at 08:18 +0900, olivier Thereaux wrote:

> I recall that the link checker is based on HTML::Parser, and that's
> the base object used to parse the documents and extract the links. I
> noticed recently that there were a couple of libs that we may want to
> use instead, such as HTML::LinkExtor (actually a subclass of
> HTML::Parser) or HTML::LinkExtractor.
>
> Does anyone remember if these have already been considered, and if
> yes, why we chose not to use them?

I dimly remember having a look at those some time ago. There are at least a couple of things worth noting: neither provides any line/column number locator information, and neither deals with anchors. Both of these could probably be taken care of through subclassing, but we'd need to do some parsing ourselves anyway, so the savings from the outsourcing might not be that big.

On a semi-related note, HTML::Parser 3.19_94 and later have "line" and "column" argspecs which I think could be used instead of counting lines ourselves.
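
To illustrate the "line" and "column" argspecs, here is a rough sketch of a start-tag handler that asks HTML::Parser for the position of each event instead of counting lines by hand. It is not taken from the link checker: the extract_links() helper, the structure of the returned records, and the choice to look only at href and name attributes on <a> elements are assumptions made for the example. Per the HTML::Parser documentation, lines are numbered from 1 and columns from 0.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper, not part of the link checker: collect the href
# and name attributes of <a> elements together with the position that
# HTML::Parser reports for the start tag.
sub extract_links {
    my ($html) = @_;
    my @found;

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tagname, $attr, $line, $column) = @_;
                return unless $tagname eq 'a';
                for my $name (qw(href name)) {
                    next unless defined $attr->{$name};
                    push @found, {
                        type   => $name,            # 'href' (link) or 'name' (anchor)
                        value  => $attr->{$name},
                        line   => $line,            # first line is 1
                        column => $column,          # first column is 0
                    };
                }
            },
            # The argspec string: the parser fills in line and column itself.
            'tagname, attr, line, column',
        ],
    );

    $parser->parse($html);
    $parser->eof;
    return @found;
}

my $doc = qq{<p>\n  <a href="http://www.w3.org/">W3C</a>\n<a name="top"></a>\n</p>\n};
printf "%s=%s at line %d, column %d\n", @{$_}{qw(type value line column)}
    for extract_links($doc);
```

Handling anchors here is just one more attribute lookup in the same handler, which is roughly the kind of parsing we would still have to do ourselves on top of HTML::LinkExtor or HTML::LinkExtractor.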
Received on Monday, 24 October 2005 06:34:56 UTC