Re: checklink: suppress expected errors to avoid false positive warnings

On Friday 17 October 2008, Michael Ernst wrote:
> Sometimes, a user expects that checklink will produce certain warnings.
> Some reasons include robot exclusion rules, password-protected content, and
> errors in automatically-generated content.
> A user would prefer checklink to show only the unexpected warnings, rather
> than hiding them in an avalanche of uninteresting output.
> This patch adds flags that suppress certain warnings.  These flags
> complement the existing --exclude and --exclude-docs flags.  (The patch
> also permits --exclude-docs to be supplied multiple times instead of just
> once.)

Thanks for the patch!  Some comments follow.  (I don't mind discussing these 
things here on the www-validator mailing list, but I think a better suited 
place would be either the public-qa-dev mailing list or W3C Bugzilla).

Because the patch contains two different things (modification of existing 
exclude-docs functionality, and addition of new options), could you split it 
into two patches?  I hope that's the way it'd also be eventually committed to 
CVS - it's easier to track changes that way.  We can e.g. first get the 
exclude-docs change in, then the rest.

The patch appears to drop precompilation and error reporting of the 
exclude-docs regexp.  I don't think that's a good idea, for two reasons.  
First, by compiling right at the beginning we get the regexp's syntax 
checked right there and can abort immediately with a descriptive message, 
instead of running into it later during the check (when the user might no 
longer be actively watching the link check progress) and barfing with a 
more obscure error message.  Second, precompiling it only once at the 
beginning is good for performance.
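
To illustrate what I mean by checking up front, here is a rough sketch of 
compiling several user-supplied patterns once and collecting readable 
errors.  The compile_patterns name and its return shape are made up for 
this example and are not taken from checklink or from the patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not checklink's code): compile every user-supplied
# pattern once, up front.  Returns (\@compiled, \@errors).
sub compile_patterns {
    my @patterns = @_;
    my (@compiled, @errors);
    for my $pattern (@patterns) {
        # qr// dies on a syntax error; eval turns that into a message in $@.
        my $re = eval { qr/$pattern/ };
        if (defined $re) {
            push @compiled, $re;
        }
        else {
            push @errors, "invalid regexp '$pattern': $@";
        }
    }
    return (\@compiled, \@errors);
}

my ($ok, $bad) = compile_patterns('^http://example\.org/private/', '(unclosed');
warn $_ for @$bad;    # a real script would print usage and exit here
```

The compiled qr// objects can then be reused for every document without 
being recompiled.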

Same considerations as above seem to apply to the exclude-redirect-prefix 

I think options that can be specified multiple times should be initialized to 
an empty array ([]) instead of undef, for cleanliness reasons and because 
that way there's no need to check their definedness later on.
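
Something along these lines, as a sketch only.  The %opts hash and its 
exclude_docs key are illustrative names, not necessarily what checklink 
uses internally, and GetOptionsFromArray is used here just to keep the 
example self-contained.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option storage: start from an empty array ref, never undef.
my %opts = (exclude_docs => []);

my @args = ('--exclude-docs', 'foo', '--exclude-docs', 'bar');
GetOptionsFromArray(\@args, 'exclude-docs=s@' => $opts{exclude_docs})
    or die "bad options\n";

# No defined() check is needed, even if the option was never given.
for my $pattern (@{ $opts{exclude_docs} }) {
    print "excluding documents matching: $pattern\n";
}
```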

I don't like the wildly varying separator characters in option values (->, :, 
#).  It would be better to be consistent, and we already use the space 
character in --masquerade, so I suggest using space for the new options as well.
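
A space is also easy to parse, since it cannot appear unescaped in a URI.  
A minimal sketch, assuming a two-part option value; split_pair is a 
made-up name for this example and not an existing checklink function.

```perl
use strict;
use warnings;

# Hypothetical parser for a two-part option value such as "FROM TO".
# Returns the two parts, or an empty list if the value is malformed.
sub split_pair {
    my ($value) = @_;
    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    my @parts = split ' ', $value;
    return @parts == 2 ? @parts : ();
}

my ($from, $to) = split_pair('http://example.org/old http://example.org/new');
```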

In addition to the --help output, bin/checklink.pod in CVS needs to be updated 

