Re: checklink: ideas for next version

Hi Ville, Hi all.

On 1-Jan-09, at 4:53 PM, Ville Skyttä wrote:
> Anyway, I'm cutting a lot of your plan here with just saying it sounds
> fine to me.  However, I think the overdue version 4.4 should really be
> released before going wild with implementing the new plans in CVS (or
> alternatively, do the new work on a branch).  I'm not aware of any
> release blockers at the moment, do you remember any?

No big blockers as far as I can remember, and since we don't yet have
anything in CVS for the new ideas, now seems like a good time for a
release. I'd like to take a couple of hours next week to go through some
testing and a run through Bugzilla, and perhaps make a couple of minor
UI tweaks (plus add a mention of the validators sponsorship/donation
program), and then we're good to go.

http://rt.cpan.org/Public/Dist/Display.html?Name=W3C-LinkChecker
http://www.w3.org/Bugs/Public/buglist.cgi?query_format=advanced&product=LinkChecker&order=bugs.bug_status

> Regarding link checker future, I'd personally actually prefer
> redesigning/rewriting much of the current code rather than continuing
> too long with the current implementation.  The script is quite a
> monster already and requires quite intimate knowledge to
> maintain/contribute to - cleaner codebase and proper separation of
> concerns would make many things much easier and could attract more
> contributors.  And perhaps while at it, consider changing the
> implementation to e.g. Java or Python.

Refactoring would obviously be good, and we have indeed talked before
about making better use of modular code (which would itself amount to
refactoring). Changing language is an interesting question.

On the one hand, Perl is not necessarily the most popular language on
the block today, and switching to PHP, Python, or Ruby might be a better
incentive for today's web developers to participate in the project.

On the other hand, the link checker does rely a fair bit on a number of
Perl libraries, so any change should make sure we wouldn't have to
reimplement all of those. I know Python would fare decently there, with
urllib(2) for fetching, BeautifulSoup or html5lib for parsing, and
robotparser (or Philip's rewrite -
http://nikitathespider.com/python/rerp/ ) for the robots.txt part.
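To give an idea of how little glue those pieces need, here is a rough
sketch using only the Python standard library. Note the hedges: I'm
using html.parser as a stand-in for BeautifulSoup/html5lib, and
urllib.robotparser (the current home of robotparser); the function
names are just for illustration, nothing here exists in the checker.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collect href/src attribute values, roughly what the
    link checker gathers from each page it processes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html):
    """Return the list of link targets found in an HTML document."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def is_allowed(robots_txt, agent, url):
    """Check a URL against robots.txt rules for the given user agent."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url)
```

For example, `extract_links('<a href="http://example.org/">x</a>')`
returns `["http://example.org/"]`, and `is_allowed` answers the same
question RobotUA answers for us today, but with the policy fully under
our control.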

On the other-other hand (if you are an octopus), we are also limited in
some ways by those Perl libraries: the RobotUA module is why we can't
have wait times under 1s between links, integration of parallelUA has
been a hurdle no one has passed, etc. Switching to another language
might get us out of these issues (and create others).
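For instance, both of those limitations go away in a Python rewrite,
since we'd be writing the throttling ourselves. The sketch below is
purely hypothetical (a thread pool with a per-host delay that can be
set below one second); the `fetch` callable and all names are made up
for illustration:

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a configurable per-host delay between requests,
    which may be well under one second if we so choose."""

    def __init__(self, delay=0.5):
        self.delay = delay
        self.lock = threading.Lock()
        self.next_ok = defaultdict(float)  # host -> earliest next request

    def wait(self, host):
        with self.lock:
            now = time.monotonic()
            start = max(now, self.next_ok[host])
            self.next_ok[host] = start + self.delay
        time.sleep(max(0.0, start - now))


def check_links(urls, fetch, delay=0.5, workers=4):
    """Fetch URLs in parallel, politely rate-limited per host.

    `fetch` is any callable taking a URL and returning a result
    (e.g. a status code); returns a dict of url -> result.
    """
    throttle = HostThrottle(delay)

    def task(url):
        throttle.wait(urlparse(url).netloc)
        return url, fetch(url)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(task, urls))
```

The point is only that parallelism and sub-second delays become a few
lines of application code rather than a fight against a library's
built-in policy.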

-- 
olivier

Received on Friday, 2 January 2009 21:28:21 UTC