- From: Ville Skyttä <ville.skytta@iki.fi>
- Date: Sat, 28 Aug 2004 12:16:58 +0300
- To: QA-dev <public-qa-dev@w3.org>
On Sat, 2004-08-28 at 04:30, Martin Duerst wrote:

> Hello Bjoern,
>
> At 07:48 04/08/27 +0200, Bjoern Hoehrmann wrote:
> >* Martin Duerst wrote:
> > >I'm planning to work a bit on the link checker in the next few days,
> > >to make it comply with the IRI spec.
> >
> >My understanding is that checklink only supports HTML and XHTML 1.x
> >documents,
>
> Yes. But I think it would be fairly easy to extend it to parse
> other things, such as SVG,...

I have actually already done some initial work on modularization and support for more document types in the link checker. I will post more details soon, but my basic idea is to provide an event-driven API: found links and fragments/anchors are reported as "events", akin to SAX. I have some crude but at least partially working implementations supporting XML Base, XLink, XInclude and xml:id, and some initial work on links in CSS. There are bits here and there that are of a general nature and would be (and are) best placed in generic CPAN modules.

The XML things are currently implemented as XML::SAX compliant filters, so it'll be trivial to plug them into the filter chain of whatever app is using XML::SAX.

There's at least one thing, though, that SAX filtering alone does not seem suitable for in the context of recursive link checking (nor does the current link checker code handle it). If we want to avoid fetching documents (possibly) multiple times and support "dynamic" stuff like XPointer (when used with anything more complex than #xpointer(id('foo'))), that would AFAICT require us to store the target document, or its DOM tree, or something similar for the duration of the recursive run. I have not thought about this too much yet, so comments and ideas are very much welcome.

Another thing that is somewhat of a dark area to me is parsing, for example, XML Schemas in order to find out what exactly constitutes a link or an anchor in "unknown" document types, and how that relates, in the link checker, to the Markup Validator.
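To make the "links as events" idea concrete, here is a minimal sketch of what such an XML::SAX filter could look like. This is not the code I mentioned above; the package name, the OnLink callback and the set of recognized attributes are all illustrative. It just shows the general shape: subclass XML::SAX::Base, inspect attributes in start_element, fire a callback for anything that looks like a link, and pass the event on down the chain.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative filter: reports href and xlink:href attributes as
# "link found" events via a caller-supplied OnLink callback.
package LinkEventFilter;
use base qw(XML::SAX::Base);

sub new {
    my ($class, %opts) = @_;
    my $self = $class->SUPER::new(%opts);
    $self->{OnLink} = $opts{OnLink} || sub {};
    return $self;
}

sub start_element {
    my ($self, $el) = @_;
    my $attrs = $el->{Attributes} || {};
    # Perl SAX keys attributes as "{namespace-uri}local-name";
    # a no-namespace href is "{}href".
    for my $key ('{}href', '{http://www.w3.org/1999/xlink}href') {
        $self->{OnLink}->($attrs->{$key}{Value}, $el->{Name})
            if exists $attrs->{$key};
    }
    # Forward the event to the next handler in the filter chain, if any.
    $self->SUPER::start_element($el);
}

package main;
use XML::SAX::ParserFactory;

my @links;
my $filter = LinkEventFilter->new(OnLink => sub { push @links, $_[0] });
my $parser = XML::SAX::ParserFactory->parser(Handler => $filter);
$parser->parse_string(
    '<doc xmlns:xlink="http://www.w3.org/1999/xlink">'
  . '<a href="foo.html"/><ref xlink:href="#bar"/></doc>');
print join(",", @links), "\n";
```

Because the filter is itself an XML::SAX handler, it can sit anywhere in an existing filter chain; a downstream Handler passed to new() would receive the unmodified event stream.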
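As a starting point for discussion on the fetch-once problem, the most naive approach I can think of is a per-run cache keyed by absolute URI, so that XPointer evaluation (or anchor checking) can reuse an already fetched document instead of hitting the network again. Everything here is hypothetical, including the function names; the real fetcher would be whatever HTTP layer the checker uses, and "document" might well be a parsed tree rather than a string.

```perl
use strict;
use warnings;

my %doc_cache;    # absolute URI string -> fetched document (for this run)

# get_document: fetch a URI at most once per recursive run.
# $fetch is a placeholder for the real HTTP-fetching code.
sub get_document {
    my ($uri, $fetch) = @_;
    $doc_cache{$uri} = $fetch->($uri) unless exists $doc_cache{$uri};
    return $doc_cache{$uri};
}

# The fetcher runs only once per URI, however often the URI recurs.
my $fetches = 0;
my $fetch   = sub { $fetches++; return "<doc/>" };
get_document("http://example.org/a", $fetch);
get_document("http://example.org/a", $fetch);
print "$fetches\n";
```

The obvious open questions are memory use for large recursive runs and whether to cache raw bytes or a DOM tree, which is exactly where I would welcome ideas.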
> >Is there any chance you could implement whatever you had in mind here
> >as new stand-alone Perl modules, either in the W3C::* namespace or
> >probably even better in the more general CPAN namespaces (HTML::, URI::,
> >etc.)? It seems these would be of a mostly more general nature and
> >likely to be re-used by other tools, that's quite difficult to do with
> >inline code, and checklink is already > 2000 lines of code, we should
> >try to avoid adding significantly more code to it.

+1000

> I was myself quite frightened of the checklink code up to a few days
> ago. I'm now quite a bit less after I have looked through it a few times
> on the bus. [I don't in any way claim I understand it yet.]
> For what I'm planning for the link checker at the moment, I'm not
> sure that will become a module. But it's possible to think about
> how to move that code, or similar code,

I am pretty much familiar with it already, but some parts of it never stop frightening me :) I think a somewhat thorough rewrite would be a good idea, but I also think that time spent getting familiar with the current code is not necessarily in vain.

Setting up a Wiki page for documenting the ideas and comments for $TNV could be a good idea.
Received on Saturday, 28 August 2004 09:17:03 UTC