Re: checklink: suppress expected errors to avoid false positive warnings from Ville Skyttä on 2008-10-17 (www-validator@w3.org from October 2008)

From: Ville Skyttä <ville.skytta@iki.fi>
Date: Fri, 17 Oct 2008 21:05:03 +0300
To: www-validator@w3.org
Cc: Michael Ernst <mernst@alum.mit.edu>
Message-Id: <200810172105.04006.ville.skytta@iki.fi>

On Friday 17 October 2008, Ville Skyttä wrote:
> On Friday 17 October 2008, Michael Ernst wrote:
> > Sometimes, a user expects that checklink will produce certain warnings.
> > Some reasons include robot exclusion rules, password-protected content,
> > and errors in automatically-generated content.
> >
> > A user would prefer checklink to show only the unexpected warnings,
> > rather than hiding them in an avalanche of uninteresting output.
> >
> > This patch adds flags that suppress certain warnings.  These flags
> > complement the existing --exclude and --exclude-docs flags.  (The patch
> > also permits --exclude-docs to be supplied multiple times instead of just
> > once.)
>
> Thanks for the patch!  Some comments follow.

A couple more things:

I think --exclude-* is not necessarily the best prefix for these options (nor 
Exclude_* in code).  We already have --exclude and --exclude-docs which do 
exclude certain links or documents from being checked altogether.  The new 
options being added in your patch do not exclude things in the same sense - I 
think --suppress-* (and Suppress_* in code) could be more appropriate.

Instead of adding a bunch of --suppress options I'd personally rather see
a "strictly one line per warning/error" output mode added.  Ideally it would
be formatted the way compilers format their error/warning messages, since
many editors understand that format and can, e.g., jump straight to the line
where the error occurred (admittedly this is mostly useful for local
file-based link checks).  For example, something like this (where XX is the
line number):

http://source-url-of-doc/:XX: W: warning message goes here
http://source-url-of-doc/:XX: E: error message goes here

From such output, people could easily filter out things they're not
interested in, for example using grep, or we could add a single
generic --suppress option that would just filter the output messages based on
given regexps.  It would be even easier if the warning/error messages used some
kind of error ids (like broken-link-404 http://.../, broken-link-500 
http://.../, directory-redirect http://.../ -> http://.../, etc) instead of 
natural language.  I think this output mode would be a lot more work than 
adding the few suggested options though.  The options can be added now and 
possibly deprecated/removed later if something better emerges - think of this 
just as food for thought.
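
To make the filtering idea a bit more concrete, here is a minimal Python
sketch.  It assumes the one-line format suggested above
(source URL, line number, W or E, then the message) and the example error ids
mentioned in this message; none of this exists in checklink today, and the
names LINE_RE, parse_line and suppress are purely illustrative.

```python
import re

# Assumed (hypothetical) line format, following the example in this message:
#   <source-url>:<line>: <W|E>: <message>
# The URL part is matched non-greedily so that URLs inside the message text
# cannot be mistaken for the source URL.
LINE_RE = re.compile(r"^(?P<url>.+?):(?P<line>\d+): (?P<sev>[WE]): (?P<msg>.*)$")


def parse_line(text):
    """Split one output line into (url, line, severity, message).

    Returns None if the line does not follow the assumed format.
    """
    m = LINE_RE.match(text)
    if m is None:
        return None
    return m.group("url"), int(m.group("line")), m.group("sev"), m.group("msg")


def suppress(lines, patterns):
    """Return the lines that match none of the given regexps.

    This mimics what a single generic --suppress option might do;
    such an option is only a suggestion here, not an existing flag.
    """
    compiled = [re.compile(p) for p in patterns]
    return [ln for ln in lines if not any(rx.search(ln) for rx in compiled)]


# Made-up sample output using the example error ids from this message.
SAMPLE = [
    "http://example.org/doc.html:12: W: directory-redirect "
    "http://example.org/a -> http://example.org/a/",
    "http://example.org/doc.html:40: E: broken-link-404 "
    "http://example.org/missing",
]
```

With this, suppress(SAMPLE, [r" W: directory-redirect "]) would leave only the
broken-link-404 line, which is roughly what piping the output through
"grep -v" would do as well.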

Received on Friday, 17 October 2008 18:05:38 UTC