W3C home > Mailing lists > Public > www-validator@w3.org > March 2002

New set of patches for 0.6.0

From: Ville Skyttä <ville.skytta@iki.fi>
Date: 01 Apr 2002 00:40:38 +0300
To: www-validator@w3.org
Cc: link@pobox.com
Message-Id: <1017610838.18308.100.camel@bobcat.ods.org>
Terje, thanks for looking into my patches again; here's some comments
and a new set of them.

> * base-css.patch
> - Applied (apart from the style for "label" that I forgot,
>   but will be in the next checkin).

Not necessary, I think, see the updated HTML patch below.

> * check-cfg.patch:
> - Applied and expanded on.

Ok.  There may be something weird in the CVS, check.cfg and tips.cfg are
shown in the Attic?  However, checkout of validator-0_6_0 works ok and
fetches these files too.
http://dev.w3.org/cvsweb/validator/htdocs/config/?hideattic=0&only_with_tag=validator-0_6_0

> * check-protocols.patch:
> - Applied.
> - Also led to some infrastructure work that was much needed.
>   I reworked your patch slightly to fit into this new scheme.

I realized that this patch was pretty ugly and insufficient, and
accidentally found out that the same functionality is actually in
libwww-perl 5.53_94 and newer.  The new version below still takes
advantage of the new "exception framework", but also adds a
configuration option for allowed protocols, defaulting to HTTP and
HTTPS.

Also, a similar implementation for checklink.pl, using
$ua->protocols_allowed() would clean it up somewhat.

Quick comments about the new patches:
- check-html.patch: HTML related fixes.
- check-paths.patch: Portability patch.
- check-protocols.patch: Cleaner and better supported/allowed protocol
handling.
- checklink-options.patch: Improved command line option handling.
- checklink-cleanup.patch: General cleanups and HTML fixes.
- also available from http://cachalot.ods.org/patches/w3c-validator/ and
http://koti.welho.com/vskytta/patches/w3c-validator/

Then, a few comments to the TODO list at
http://validator.w3.org:8001/todo.html
[0.6.0, 1.]
- I tried the absolute URI fixup a few months ago to make validator
easier to install locally, and found it generally possible, but the
currently used SSI's are a problem.  Also the http://foo/check/referer
needs to be taken care of (at least if there is a "Revalidate" link on
the page).

[Misc, 25,33.]
- SSL/TLS and the error count seem to be already covered by recent
updates.

[Misc, new entry]
- Check if this needs a fix:
http://lists.w3.org/Archives/Public/www-validator/2002Mar/0055.html

Cheers,
-- 
Ville Skyttä
ville.skytta@iki.fi

Index: httpd/cgi-bin/check
===================================================================
RCS file: /sources/public/validator/httpd/cgi-bin/check,v
retrieving revision 1.200.2.17
diff -a -u -r1.200.2.17 check
--- httpd/cgi-bin/check	2002/03/31 07:04:07	1.200.2.17
+++ httpd/cgi-bin/check	2002/03/31 16:49:39
@@ -100,7 +100,8 @@
 $File->{'Footer'} = &prepSSI({
 			      File => $CFG->{'Footer'},
 			      Date => q$Date: 2002/03/31 07:04:07 $,
-			     });
+			     })
+  . "  </body>\n</html>\n";
 
 #
 # Prepare standard HTML preamble for output.
@@ -277,16 +278,18 @@
 # Print different things if we got redirected or had a file upload.
 #if (URI::eq("$File->{Opt}->{URI}", $q->param('uri'))) { # @@@FIXME@@@: Temorarily Broken. :-)
 if (TRUE) { # @@ Need to stringify here?
+  my $size = (length($File->{Opt}->{URI}) || 38) + 2;
+  $size = 70 if $size > 70;
   &add_table($File, qq(<label title="Address of Page to Validate" for="uri">Address</label>),
-	     '<input type="text" id="uri" name="uri" size="'
-	     . (length($File->{Opt}->{URI}) + 2)
+	     '<input type="text" id="uri" name="uri" size="' . $size
 	     . '" value="' . $File->{Opt}->{URI} . '" />');
 } elsif ($q->param('uploaded_file')) {
   &add_table($File, "File", $File->{Opt}->{URI});
 } else {
+  my $size = (length($File->{Opt}->{URI}) || 38) + 2;
+  $size = 70 if $size > 70;
   &add_table($File, qq(<label title="Address of Page to Validate" for="uri"><a href="http://www.w3.org/Addressing/#terms">URI</a></label>),
-	     '<input type="text" id="uri" name="uri" size="'
-	     . (length($File->{Opt}->{URI}) + 2)
+	     '<input type="text" id="uri" name="uri" size="' . $size
 	     . '" value="' . $File->{Opt}->{URI} . '" />'
              . ' [<a href="' . $File->{Opt}->{URI} . '">Go to URI</a>]');
   &add_warning($File, '<em>Note:</em> The URI you gave me, &#171;<code>' .
@@ -300,7 +303,7 @@
 &add_table($File, "Size", $File->{Size})  if $File->{Size};
 unless ($File->{Opt}->{'Is Upload'}) {
   &add_table($File,
-	     qq(<label title="Character Encoding" for="charset">Character&nbsp;Encoding</label>),
+	     qq(<label title="Character Encoding" for="charset">Character Encoding</label>),
 	     $File->{Use_Charset} . ' ' .
 	     &CGI::popup_menu(-name => 'charset', -id => 'charset',
 			    -values => [
@@ -378,7 +381,7 @@
       To assure correct validation, processing, and display,
       it is important that the character encoding is properly
       labeled.
-      <a href='http://www.w3.org/International/O-charset.html'>Further
+      <a href="http://www.w3.org/International/O-charset.html">Further
       explanations</a>.
 EOHD
   $File->{Tentative} |= T_CHARSET_KLUDGE; # WOuld be T_WARN, but the complaints...
@@ -489,7 +492,7 @@
 }
 
 my @cmd = ($CFG->{'SGML Parser'}, '-c', $catalog, '-E0', @xmlflags);
-&add_table($File, "commandline", "<code>@cmd</code>") if $DEBUG;
+&add_table($File, "Command Line", "<code>@cmd</code>") if $DEBUG;
 
 #
 # Temporary filehandles.
@@ -553,7 +556,6 @@
 
 
 my $fpi;
-$File->{Version} = 'unknown';
 if ($File->{Type} eq 'xhtml' or $File->{Type} eq 'mathml' or $File->{Type} eq 'svg' or $File->{Type} eq 'smil') {
   $fpi = $File->{DOCTYPE};
 } elsif ($File->{Type} eq 'xml') {
@@ -569,7 +571,7 @@
 }
 $File->{Version} = $CFG->{'FPI to Text'}->{$fpi} || 'unknown';
 
-&add_table($File, "Document Type",
+&add_table($File, "<label for=\"doctype\">Document Type</label>",
 	   $File->{Version} . ' ' .
 	   &CGI::popup_menu(
 			    -name => 'doctype',
@@ -594,25 +596,25 @@
     &add_warning($File, "Unknown namespace (&#171;<code>$File->{Namespace}</code>&#187;) for text/html document!");
     if ($File->{Namespace} ne '') {
       &add_table($File, "Root Namespace",
-	"<a href='$File->{Namespace}'>$File->{Namespace}</a>");
+	"<a href=\"$File->{Namespace}\">$File->{Namespace}</a>");
     }
   } elsif ($File->{Type} eq 'svg' and $File->{Namespace} ne 'http://www.w3.org/2000/svg') {
     &add_warning($File, "Unknown namespace (&#171;<code>$File->{Namespace}</code>&#187;) for SVG document!");
     if ($File->{Namespace} ne '') {
       &add_table($File, "Root Namespace",
-	"<a href='$File->{Namespace}'>$File->{Namespace}</a>");
+	"<a href=\"$File->{Namespace}\">$File->{Namespace}</a>");
     }
   } else {
     if ($File->{Namespace} ne '') {
       &add_table($File, "Root Namespace",
-	"<a href='$File->{Namespace}'>$File->{Namespace}</a>");
+	"<a href=\"$File->{Namespace}\">$File->{Namespace}</a>");
     }
   }
 
   if (scalar keys %{$File->{Namespaces}} > 1) {
     my $namespaces = "<ul>";
     for (keys %{$File->{Namespaces}}) {
-      $namespaces .= "\t<li><a href='$_'>$_</a></li>\n"
+      $namespaces .= "\t<li><a href=\"$_\">$_</a></li>\n"
           unless $_ eq $File->{Namespace}; # Don't repeat Root Namespace.
     }
     &add_table($File, "Other Namespaces", $namespaces . "</ul>");
@@ -704,13 +706,13 @@
   my $tableEntry;
   unless ($File->{Opt}->{URI} =~ m(^upload://)) {
     add_table($File, '<input type="submit" value="Revalidate" />',
-	      '         <label title="Show Page Source"><input type="checkbox" value="" name="ss"' .
+	      '         <label class="nowrap" title="Show Page Source"><input type="checkbox" value="" name="ss"' .
 	      ($File->{Opt}->{'Show Source'}      ? 'checked="checked"' : '') . " />Show Source</label>\n" .
-	      '         <label title="Show an Outline of the document"><input type="checkbox" value="" name="outline"' .
+	      '         <label class="nowrap" title="Show an Outline of the document"><input type="checkbox" value="" name="outline"' .
 	      ($File->{Opt}->{'Outline'} ? 'checked="checked"' : '') . " />Outline</label>\n" .
-	      '         <label title="Show Parse Tree"><input type="checkbox" value="" name="sp"' .
+	      '         <label class="nowrap" title="Show Parse Tree"><input type="checkbox" value="" name="sp"' .
 	      ($File->{Opt}->{'Show Parsetree'}      ? 'checked="checked"' : '') . " />Parse Tree</label>\n" .
-	      '         <label title="Exclude Attributes from Parse Tree"><input type="checkbox" value="" name="noatt"' .
+	      '         <label class="nowrap" title="Exclude Attributes from Parse Tree"><input type="checkbox" value="" name="noatt"' .
 	      ($File->{Opt}->{'No Attributes'}   ? 'checked="checked"' : '') . " />...no attributes</label>\n"
 	     );
   }
@@ -718,11 +720,11 @@
     unless $File->{Opt}->{URI} =~ m(^upload://);
   print qq(<table class="header">\n);
   unless ($File->{'Is Valid'}) {
-    &add_table($File, 'Errors', scalar(@{$File->{Errors}}) . ' (aproximate)');
+    &add_table($File, 'Errors', scalar(@{$File->{Errors}}) . ' (approximate)');
   }
   for $tableEntry (@{$File->{Table}}) {
     print "    <tr>\n";
-    print ' ' x 6, "<th>", $$tableEntry{Head}, ": </th>\n";
+    print ' ' x 6, "<th class=\"nowrap\">", $$tableEntry{Head}, ": </th>\n";
     print ' ' x 6, "<td>", $$tableEntry{Tail}, "</td>\n";
     print "    </tr>\n";
   }
@@ -759,7 +761,7 @@
     </p>
 
     <pre>
-      &lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN"&gt;
+      &lt;!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"&gt;
       &lt;html&gt;
 	&lt;head&gt;
 	  &lt;title&gt;Title&lt;/title&gt;
@@ -780,7 +782,7 @@
   my $File = shift;
   my ($dieMessage) = shift;
   print <<"EOF";
-    <hr>
+    <hr />
     <strong class="error">Internal server error ($dieMessage).</strong>
     Please contact <a href="mailto:$CFG->{Maintainer}">maintainer</a>.
 EOF
@@ -836,7 +838,7 @@
 
 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
-<html>
+<html lang="en">
   <head><title>401 Authorization Required</title></head>
   <body>
     <h1>Authorization Required</h1>
@@ -1315,7 +1317,7 @@
     print qq(<span class="markup">^</span></pre>\n);
   }
   print "</ul>\n";
-  print "<hr>\n";
+  print "<hr />\n";
   if ($File->{Version} eq 'unknown') {
     print "<p>Sorry, I can't validate this document.</p>";
   } elsif ($File->{Type} eq 'xml') {
@@ -1396,7 +1398,7 @@
     if (defined $image_uri) {
       print <<"EOHD";
   <p>
-    <img src="$image_uri" alt="$alttext"$gifhw> Congratulations, this
+    <img src="$image_uri" alt="$alttext"$gifhw /> Congratulations, this
     document validates as $File->{Version}!
   </p>
 
@@ -1470,7 +1472,7 @@
 
   print <<'EOF';
   <div id="outline" class="mtb">
-    <hr>
+    <hr />
     <h2><a name="outline">Outline</a></h2>
     <p>
       Below is an outline for this document, automatically generated from the
@@ -1538,7 +1540,7 @@
 
   print <<'EOF';
   <div id="source" class="mtb">
-    <hr>
+    <hr />
     <h2><a name="source">Source Listing</a></h2>
 
     <p>Below is the source input I used for this validation:</p>
@@ -1560,7 +1562,7 @@
 
   print <<'EOF';
   <div id="parse" class="mtb">
-    <hr>
+    <hr />
     <h2><a name="parse">Parse Tree</a></h2>
 EOF
   if ($File->{Opt}->{'No Attributes'}) {
@@ -1687,7 +1689,7 @@
 sub show_esis ($) {
   print <<'EOF';
   <div id="raw_esis" class="mtb">
-    <hr>
+    <hr />
     <h2><a name="raw_esis">Raw ESIS Output</a></h2>
     <pre>
 EOF
@@ -1704,7 +1706,7 @@
 sub show_errors ($) {
   print <<'EOF';
   <div id="raw_errors" class="mtb">
-    <hr>
+    <hr />
     <h2><a name="raw_errors">Raw Error Output</a></h2>
     <pre>
 EOF
Index: htdocs/base.css
===================================================================
RCS file: /sources/public/validator/htdocs/base.css,v
retrieving revision 1.20.2.5
diff -a -u -r1.20.2.5 base.css
--- htdocs/base.css	2002/03/31 07:04:07	1.20.2.5
+++ htdocs/base.css	2002/03/31 16:49:39
@@ -45,6 +45,10 @@
 
 .hideme {display: none}
 
+.nowrap {
+  white-space: nowrap;
+}
+
 #Notice {
   width: 75ex;
   border: solid;
@@ -139,7 +143,7 @@
 }
 
 /* Various header(ish) things. Definitions cribbed from the CORE Styles. */
-h1	{	
+h1	{
 	font-family:	Tahoma, Verdana, Myriad Web, Syntax, sans-serif;
 	font-size-adjust:	.53;
 	font-size:	2em;
@@ -151,8 +155,8 @@
 	text-transform:	none;
 	color: #005A9C;
 	}
-		
-h2	{	
+
+h2	{
 	font-family:	Tahoma, Verdana, Myriad Web, Syntax, sans-serif;
 	font-size-adjust:	.53;
 	font-size:	1.75em;
@@ -162,9 +166,9 @@
 	word-spacing:	normal;
 	letter-spacing:	normal;
 	text-transform:	none;
-	}	
-		
-h3	{	
+	}
+
+h3	{
 	font-family:	Tahoma, Verdana, Myriad Web, Syntax, sans-serif;
 	font-size-adjust:	.53;
 	font-size:	1.58em;
@@ -174,9 +178,9 @@
 	word-spacing:	normal;
 	letter-spacing:	normal;
 	text-transform:	none;
-	}	
-		
-h4	{	
+	}
+
+h4	{
 	font-family:	Tahoma, Verdana, Myriad Web, Syntax, sans-serif;
 	font-size-adjust:	.53;
 	font-size:	1.33em;
@@ -186,9 +190,9 @@
 	word-spacing:	normal;
 	letter-spacing:	normal;
 	text-transform:	none;
-	}	
-		
-h5, dt	{	
+	}
+
+h5, dt	{
 	font-family:	Tahoma, Verdana, Myriad Web, Syntax, sans-serif;
 	font-size-adjust:	.53;
 	font-size:	1.17em;
@@ -199,9 +203,9 @@
 	letter-spacing:	normal;
 	text-transform:	none;
 	margin-top: 1em;
-	}	
-		
-h6	{	
+	}
+
+h6	{
 	font-family:	Tahoma, Verdana, Myriad Web, Syntax, sans-serif;
 	font-size-adjust:	.53;
 	font-size:	1em;
@@ -211,6 +215,6 @@
 	word-spacing:	normal;
 	letter-spacing:	normal;
 	text-transform:	none;
-	}	
+	}
 
 

Index: httpd/cgi-bin/check
===================================================================
RCS file: /sources/public/validator/httpd/cgi-bin/check,v
retrieving revision 1.200.2.17
diff -a -u -r1.200.2.17 check
--- httpd/cgi-bin/check	2002/03/31 07:04:07	1.200.2.17
+++ httpd/cgi-bin/check	2002/03/31 13:52:37
@@ -34,8 +34,8 @@
 use IO::File;
 use Text::Iconv; # on debian: apt-get install libtext-iconv-perl
 use HTML::Parser 3.25; # Need 3.25 for $p->ignore_elements.
+use File::Spec ();
 
-
 ###############################################################################
 #### Constant definitions. ####################################################
 ###############################################################################
@@ -454,32 +454,33 @@
 }
 
 my @xmlflags = '-wnon-sgml-char-ref';
-my $catalog  = $CFG->{'SGML Library'} . '/catalog';
+my $catalog  = File::Spec->catfile($CFG->{'SGML Library'}, 'catalog');
 
 if ($File->{Type} eq 'xhtml') {
-  $catalog               = $CFG->{'SGML Library'} . '/xhtml.soc';
+  $catalog         = File::Spec->catfile($CFG->{'SGML Library'}, 'xhtml.soc');
   $ENV{SP_CHARSET_FIXED} = 'YES';
   $ENV{SP_ENCODING}      = 'UTF-8';
   @xmlflags              = '-wxml';
 } elsif ($File->{Type} eq 'svg') {
-  $catalog               = $CFG->{'SGML Library'} . '/svg.soc';
+  $catalog         = File::Spec->catfile($CFG->{'SGML Library'}, 'svg.soc');
   $ENV{SP_CHARSET_FIXED} = 'YES';
   $ENV{SP_ENCODING}      = 'UTF-8';
   @xmlflags              = '-wxml';
 } elsif ($File->{Type} eq 'smil') {
-  $catalog               = $CFG->{'SGML Library'} . '/smil.soc';
+  $catalog         = File::Spec->catfile($CFG->{'SGML Library'}, 'smil.soc');
   $ENV{SP_CHARSET_FIXED} = 'YES';
   $ENV{SP_ENCODING}      = 'UTF-8';
   @xmlflags              = '-wxml';
 } elsif ($File->{Type} eq 'mathml') {
-  $catalog               = $CFG->{'SGML Library'} . '/mathml.soc';
+  $catalog         = File::Spec->catfile($CFG->{'SGML Library'}, 'mathml.soc');
   $ENV{SP_CHARSET_FIXED} = 'NO';
   $ENV{SP_ENCODING}      = 'XML';
   @xmlflags              = '-wxml';
 } elsif ($File->{Type} eq 'xml' or $File->{Namespace}) {
   # no doctype, with xmlns attr on 1st element
   $File->{Type} = 'xml'; # @@ probably a better way to do this
-  $catalog               = $CFG->{'SGML Library'} . '/sp-1.3/pubtext/xml.soc';
+  $catalog         = File::Spec->catfile($CFG->{'SGML Library'},
+                                         'sp-1.3', 'pubtext', 'xml.soc');
   $ENV{SP_CHARSET_FIXED} = 'YES';
   $ENV{SP_ENCODING}      = 'XML';
   @xmlflags              = '-wxml';

Index: httpd/cgi-bin/check
===================================================================
RCS file: /sources/public/validator/httpd/cgi-bin/check,v
retrieving revision 1.200.2.17
diff -a -u -r1.200.2.17 check
--- httpd/cgi-bin/check	2002/03/31 07:04:07	1.200.2.17
+++ httpd/cgi-bin/check	2002/03/31 15:45:33
@@ -24,7 +24,8 @@
 ###############################################################################
 
 use strict;
-use LWP::UserAgent;
+use LWP::UserAgent 1.90 (); # Need 1.90 for protocols_(allowed|forbidden)
+use HTTP::Request ();
 use URI;
 use URI::Escape;
 use CGI::Carp;
@@ -147,13 +148,7 @@
 } elsif ($q->param('fragment')) {
   $File = &handle_frag($q, $File);
 } elsif ($q->param('uri')) {
-  my $rejected = &uri_rejected($File);
-  if ($rejected) {
-    $File->{'Error Flagged'} = TRUE;
-    $File->{'Error Message'} = $rejected;
-  } else {
-    $File = &handle_uri($q, $File);
-  }
+  $File = &handle_uri($q, $File);
 }
 
 #
@@ -186,7 +181,7 @@
 }
 
 #
-# Abort if an error is flagged. (only from &uri_rejected() so far).
+# Abort if an error is flagged. (only from &handle_uri() so far).
 if ($File->{'Error Flagged'}) {
   print $File->{'Results'};
   print $File->{'Error Message'};
@@ -984,9 +979,14 @@
       next if /^\s*$/;
       next if /^\s*\#/;
       chomp;
-      my($k, $v) = split /\t+/, $_;
+      my($k, $v) = split(/\t+/, $_, 2);
       $v = '' unless defined $v;
-      $v = &read_cfg($v) if $v =~ s(^file://){};
+      if ($v =~ s(^file://){}) {
+        $v = &read_cfg($v);
+      } elsif ($v =~ /,/) {
+        my @vals = split(/,/, $v);
+        $v = \@vals;
+      }
       $cfg{$k} = $v;
     }
     undef $fh;
@@ -1002,11 +1002,25 @@
   my $q    = shift; # The CGI object.
   my $File = shift; # The master datastructure.
 
-  my $uri = $q->param('uri'); # The URI to fetch.
+  my $uri = URI->new($q->param('uri')); # The URI to fetch.
 
   my $ua = new LWP::UserAgent;
   $ua->agent("W3C_Validator/$VERSION " . $ua->agent);
   $ua->parse_head(0);  # Parse the http-equiv stuff ourselves. @@ Why?
+
+  # @@@FIXME@@@:
+  #   Disable checking if the URI is local (or private) for security reasons,
+  #   or at least make it configurable to do so.
+  #   eg. /^(localhost(\.localdomain)?|127\..+)$/ (+ private networks)
+  #
+  $ua->protocols_allowed($CFG->{'Allowed Protocols'} || ['http', 'https']);
+
+  if (! $ua->is_protocol_supported($uri)) {
+    $File->{'Error Flagged'} = TRUE;
+    $File->{'Error Message'} = &uri_rejected($uri);
+    return $File;
+  }
+
   my $req = new HTTP::Request(GET => $uri);
 
   # If we got a Authorization header, the client is back at it after being
@@ -1022,7 +1036,8 @@
       &authenticate($File, $res->request->url, $res->www_authenticate);
     } else {
       print $File->{Results};
-      &print_unknown_http_error_message($uri, $res->code, $res->message);
+      &print_unknown_http_error_message($uri->as_string, $res->code,
+                                        $res->message);
     }
     print $File->{'Footer'};
     exit;
@@ -1774,38 +1789,13 @@
 }
 
 
-#
-# Check if the URI looks like one we want to retrieve.
 #
-# Returns FALSE if URI is ok, or an error message suitable for output.
-#
-# @@@FIXME@@@:
-#   Disable checking if the URI is local (or private) for security reasons,
-#   or at least make it configurable to do so.
-#   eg. /^(localhost(\.localdomain)?|127\.)$/ (+ private networks)
-#
+# Output errors for a rejected URI.
 sub uri_rejected {
-  my $File    = shift;
-  my $uri     = URI->new($File->{Opt}->{URI});
-  my $scheme  = $uri->scheme();
-
-  my($handler, $ret);
-
-  #
-  # If we don't have an SSL implementation for use with LWP, CGI::Carp bombs
-  # here and the script dies. So, we need to "local"ize $SIG{__DIE__} here.
-  eval {local $SIG{__DIE__}; $handler = LWP::Protocol::implementor($scheme)};
-
-  # No implementor for the scheme, or a (disabled) "file://" URI.
-  if ($@ or $scheme eq 'file' or $handler eq 'LWP::Protocol::file') {
-    undef $handler;
-  }
+  my $uri = shift;
+  my $scheme = $uri ? $uri->scheme() || 'undefined' : 'undefined';
 
-  if (defined $handler) {
-    return FALSE;
-  } else {
-    $scheme = $scheme || 'undefined';
-    $ret =  <<"EOF";
+  return <<"EOF";
     <div class="error">
       <p>
         Sorry, this type of <a
@@ -1824,6 +1814,4 @@
       </p>
     </div>
 EOF
-  return $ret;
-  }
 }
Index: htdocs/config/check.cfg
===================================================================
RCS file: /sources/public/validator/htdocs/config/Attic/check.cfg,v
retrieving revision 1.1.2.4
diff -a -u -r1.1.2.4 check.cfg
--- htdocs/config/check.cfg	2002/03/30 22:45:45	1.1.2.4
+++ htdocs/config/check.cfg	2002/03/31 15:45:33
@@ -58,3 +58,7 @@
 Header		/usr/local/validator/htdocs/header.html
 Footer		/usr/local/validator/htdocs/footer.html
 
+
+#
+# Allowed protocols, comma-separated.
+Allowed Protocols	http,https

Index: httpd/cgi-bin/checklink.pl
===================================================================
RCS file: /sources/public/validator/httpd/cgi-bin/checklink.pl,v
retrieving revision 2.89
diff -a -u -r2.89 checklink.pl
--- httpd/cgi-bin/checklink.pl	2002/02/01 21:29:09	2.89
+++ httpd/cgi-bin/checklink.pl	2002/03/31 20:14:22
@@ -1,4 +1,4 @@
-#! /usr/bin/perl -w
+#!/usr/bin/perl -w
 #
 # W3C Link Checker
 # by Hugo Haas <hugo@w3.org>
@@ -104,7 +104,7 @@
         printf("\n%s\n", &global_stats());
     }
 } else {
-    use CGI;
+    use CGI ();
     use CGI::Carp qw(fatalsToBrowser);
     $query = new CGI;
     # Set a few parameters in CGI mode
@@ -272,7 +272,7 @@
 				for example, it would be:
 				http://www.w3.org/TR/html4/
 	-n/--noacclanguage	Do not send an Accept-Language header.
-	-L/--languages		Languages accepted (default: '$_languages'). 
+	-L/--languages		Languages accepted (default: '$_languages').
 	-q/--quiet		No output if no errors are found.
 	-v/--verbose		Verbose mode.
 	-i/--indicator		Show progress while parsing.
@@ -301,7 +301,7 @@
 
 sub ask_password() {
     print(STDERR 'Enter the password for user '.$_user.': ');
-    # Will only work on Unix...
+    # Will only work on Unix... Term::ReadKey from CPAN would be better.
     system('stty -echo');
     chomp($_password = <STDIN>);
     system('stty echo');
@@ -315,7 +315,7 @@
 ###########################################
 
 sub urize() {
-    use URI;
+    use URI ();
     $_ = URI::Escape::uri_unescape($_[0]);
     my $base;
     my $res = $_;
@@ -375,7 +375,7 @@
     if ($_html) {
         print("</h2>\n");
         if (! $_summary) {
-            use URI;
+            use URI::Escape ();
             printf("<p>Go to <a href='#%s'>the results</a>.</p>\n",
                    $result_anchor);
             printf("<p>Check also:
@@ -393,7 +393,7 @@
     $processed{$absolute_uri} = 1;
     # Parse the document
     my $p = &parse_document($uri, $absolute_uri,
-                            $response->content(), 1, 
+                            $response->content(), 1,
 			    $depth != 0);
     my $base = URI->new($p->{base});
 
@@ -689,7 +689,7 @@
 
 sub W3C::UserAgent::redirect_ok {
     my ($self, $request) = @_;
-    
+
     if (! ($_summary || (!$doc_count && $_html))) {
         &hprintf("\n%s %s ", $request->method(), $request->uri());
     }
@@ -775,7 +775,7 @@
             print "\n";
         }
         return &get_uri($method, $response->request->url,
-                        $start, $ua->{Redirects}, 
+                        $start, $ua->{Redirects},
                         $code, $realm, $message, 1);
     }
     # Record the redirects
@@ -1077,7 +1077,6 @@
 ################################
 
 sub check_validity() {
-    use HTTP::Status;
     my ($testing, $uri, $links, $redirects) = @_;
     # $testing is the URI of the document checked
     # $uri is the URI of the target that we are verifying
@@ -1163,10 +1162,10 @@
 }
 
 sub escape_match($, \%) {
-    use URI::Escape;
-    my ($a, $hash) = (uri_unescape($_[0]), $_[1]);
+    use URI::Escape ();
+    my ($a, $hash) = (URI::Escape::uri_unescape($_[0]), $_[1]);
     foreach $b (keys %$hash) {
-        if ($a eq uri_unescape($b)) {
+        if ($a eq URI::Escape::uri_unescape($b)) {
             return(1);
         }
     }
@@ -1181,15 +1180,15 @@
     my $r = $_[0];
     $r->headers->www_authenticate =~ /Basic realm=\"([^\"]+)\"/;
     my $realm = $1;
-    my $authHeader = $r->headers->www_authenticate;                                 
+    my $authHeader = $r->headers->www_authenticate;
     if ($_cl) {
         printf(STDERR "\nAuthentication is required for %s.\n", $r->request->url);
         printf(STDERR "The realm is %s.\n", $realm);
         print(STDERR "Use the -u and -p options to specify a username and password.\n");
     } else {
-        printf("Status: 401 Authorization Required\nWWW-Authenticate: %s\nConnection: close\nContent-Type: text/html\n\n", $r->headers->www_authenticate);
+        printf("Status: 401 Authorization Required\nWWW-Authenticate: %s\nConnection: close\nContent-Language: en\nContent-Type: text/html\n\n", $r->headers->www_authenticate);
         printf("<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">
-<html>
+<html lang=\"en\">
 <head>
 <title>401 Authorization Required</title>
 </head>
@@ -1207,7 +1206,7 @@
 ##################
 
 sub get_timestamp() {
-    use Time::HiRes;
+    use Time::HiRes ();
     return pack('LL', Time::HiRes::gettimeofday());
 }
 
@@ -1359,7 +1358,7 @@
                     $whattodo =
 'You must change this link: people using a browser without Javascript support
 will <em>not</em> be able to follow this link. See the
-<a href="http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505/#tech-scripts">Web 
+<a href="http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505/#tech-scripts">Web
 Content Accessibility Guidelines on the use of scripting on the Web</a> and
 the
 <a href="http://www.w3.org/TR/WCAG10-HTML-TECHS/#directly-accessible-scripts">techniques
@@ -1409,7 +1408,7 @@
             }
             printf("
 <dt%s>%s</dt>
-<dd>What to do: <strong%s>%s</strong>%s<br>
+<dd>What to do: <strong%s>%s</strong>%s<br></dd>
 <dd>HTTP Code returned: %d%s<br>
 HTTP Message: %s%s%s</dd>
 <dd>Lines: %s</dd>\n",
@@ -1439,7 +1438,7 @@
                    # HTTP original message
                    defined($results->{$u}{location}{orig_message})
                    ? &encode($results->{$u}{location}{orig_message}).
-                   ' <span title="redirected to">-&gt;</span> ' 
+                   ' <span title="redirected to">-&gt;</span> '
                    : '',
                    # HTTP final message
                    $http_message,
@@ -1504,7 +1503,7 @@
             }
         }
     }
-    # End of the table 
+    # End of the table
     if ($_html) {
        print("</dl>\n");
     }
@@ -1663,7 +1662,7 @@
     my $stop = &get_timestamp();
     return sprintf("Checked %d document(s) in %ss.",
                    ($doc_count<=$_max_documents? $doc_count : $_max_documents),
-                   &time_diff($timestamp, $stop)); 
+                   &time_diff($timestamp, $stop));
 }
 
 ##################
@@ -1683,7 +1682,7 @@
     print "
 
 <!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">
-<html>
+<html lang=\"en\">
 <head>
 <title>W3C".$title."</title>
 <style type=\"text/css\">
@@ -1694,8 +1693,8 @@
   background: white;
 }
 
-pre {
-  font-family: monospace
+pre, code, tt {
+  font-family: monospace;
 }
 
 img {
@@ -1730,7 +1729,7 @@
 </style>
 </head>
 <body>
-<a href=\"http://www.w3.org/\"><img alt=\"W3C\" src=\"http://www.w3.org/Icons/w3c_home\" height=\"48\" width=\"72\"></a>
+<p><a href=\"http://www.w3.org/\" title=\"W3C\"><img alt=\"W3C\" src=\"http://www.w3.org/Icons/w3c_home\" height=\"48\" width=\"72\"></a></p>
 <h1>W3C<sup>&reg;</sup>".$title."</h1>
 \n";
 }
@@ -1783,9 +1782,8 @@
 <a href=\"http://www.w3.org/2000/07/checklink\">documentation</a>.
 Download the
 <a href=\"http://dev.w3.org/cvsweb/~checkout~/validator/httpd/cgi-bin/checklink.pl?rev=".$REVISION."&amp;content-type=text/plain\">source
-code</a> from the
-<a href=\"http://dev.w3.org/cvsweb/validator/httpd/cgi-bin/checklink.pl\">CVS
-log</a>.
+code</a> from
+<a href=\"http://dev.w3.org/cvsweb/validator/httpd/cgi-bin/checklink.pl\">CVS</a>.
 </address>
 </body>
 </html>
@@ -1806,22 +1804,22 @@
     my ($q) = @_;
     &html_header('', 1);
     print "<form action=\"".$q->self_url()."\" method=\"get\">
-<p>Enter the address (<a href='http://www.w3.org/Addressing/#terms'>URL</a>)
-of a document that you would like to check:</p>
-<p><input type=\"text\" size=\"50\" name=\"uri\"></p>
+<p><label for=\"uri\">Enter the address (<a href='http://www.w3.org/Addressing/#terms'>URL</a>)
+of a document that you would like to check:</label></p>
+<p><input type=\"text\" size=\"50\" id=\"uri\" name=\"uri\"></p>
 <p>Options:</p>
 <p>
-  <input type=\"checkbox\" name=\"summary\"> Summary only
+  <label><input type=\"checkbox\" name=\"summary\"> Summary only</label>
   <br>
-  <input type=\"checkbox\" name=\"hide_redirects\"> Hide redirects
+  <label><input type=\"checkbox\" name=\"hide_redirects\"> Hide redirects</label>
   <br>
-  <input type=\"checkbox\" name=\"no_accept_language\"> Don't send <tt>Accept-Language</tt> headers.
+  <label><input type=\"checkbox\" name=\"no_accept_language\"> Don't send <tt>Accept-Language</tt> headers</label>
   <br>
-  <input type=\"checkbox\" name=\"hide_dir_redirects\"> Hide directory redirects
+  <label><input type=\"checkbox\" name=\"hide_dir_redirects\"> Hide directory redirects</label>
   <br>
-  <input type=\"checkbox\" name=\"recursive\"> Check linked documents recursively <small>(maximum: $_max_documents documents; sleeping $_sleep_time seconds between each document)</small>
+  <label><input type=\"checkbox\" name=\"recursive\"> Check linked documents recursively <small>(maximum: $_max_documents documents; sleeping $_sleep_time seconds between each document)</small></label>
   <br>
-  Depth of the recursion: <input type=\"text\" size=\"3\" name=\"depth\"><small>(-1 is the default and means unlimited)</small>
+  <label>Depth of the recursion: <input type=\"text\" size=\"3\" name=\"depth\"><small>(-1 is the default and means unlimited)</small></label>
 </p>
 <p><input type=\"submit\" name=\"submit\" value=\"Check\"></p>
 </form>

Index: httpd/cgi-bin/checklink.pl
===================================================================
RCS file: /sources/public/validator/httpd/cgi-bin/checklink.pl,v
retrieving revision 2.89
diff -a -u -r2.89 checklink.pl
--- httpd/cgi-bin/checklink.pl	2002/02/01 21:29:09	2.89
+++ httpd/cgi-bin/checklink.pl	2002/03/29 17:32:50
@@ -84,12 +84,12 @@
 if ($#ARGV >= 0) {
     $_cl = 1;
 # Parse command line
-    my @uris = &parse_arguments();
+    &parse_arguments();
     if ($_user && (! $_password)) {
         &ask_password();
     }
     my $uri;
-    foreach $uri (@uris) {
+    foreach $uri (@ARGV) {
 	if (!$_summary) {
             printf("%s %s\n", $PROGRAM ,$VERSION) if (! $_html);
         } else {
@@ -163,95 +163,44 @@
 ################################
 
 sub parse_arguments() {
-    my @uris;
-    my $uris = 0;
-    while (@ARGV) {
-        $_ = shift(@ARGV);
-        if ($uris) {
-            push(@uris, $_);
-        } elsif (m/^--$/) {
-            $uris = 1;
-        } elsif (m/^-[^-DupytdlL]/) {
-            if (m/q/) {
-                $_quiet = 1;
-                $_summary = 1;
-            }
-            if (m/s/) {
-                $_summary = 1;
-            }
-            if (m/b/) {
-                $_redirects = 0;
-            }
-            if (m/e/) {
-                $_dir_redirects = 0;
-            }
-            if (m/v/) {
-                $_verbose = 1;
-            }
-            if (m/i/) {
-                $_progress = 1;
-            }
-            if (m/h/) {
-                $_html = 1;
-            }
-            if (m/n/) {
-                $_accept_language = 0;
-            }
-            if (m/r/) {
-                if ($_depth == 0) {
-                    $_depth = -1;
-                }
-            }
-        } elsif (m/^--help$/) {
-            &usage();
-        } elsif (m/^--quiet$/) {
-            $_quiet = 1;
-        } elsif (m/^--summary$/) {
-            $_summary = 1;
-        } elsif (m/^--broken$/) {
-            $_redirects = 0;
-        } elsif (m/^--dir-redirects$/) {
-            $_dir_redirects = 0;
-        } elsif (m/^--verbose$/) {
-            $_verbose = 1;
-        } elsif (m/^--indicator$/) {
-            $_progress = 1;
-        } elsif (m/^--html$/) {
-            $_html = 1;
-        } elsif (m/^--noacclanguage$/) {
-            $_accept_language = 0;
-        } elsif (m/^--recursive$/) {
-            if ($_depth == 0) {
-                $_depth = -1;
-            }
-        } elsif (m/^-l|--location$/) {
-            $_base_location = shift(@ARGV);
-        } elsif (m/^-u|--user$/) {
-            $_user = shift(@ARGV);
-        } elsif (m/^-p|--password$/) {
-            $_password = shift(@ARGV);
-        } elsif (m/^-t|--timeout$/) {
-            $_timeout = shift(@ARGV);
-        } elsif (m/^-L|--languages$/) {
-            $_languages = shift(@ARGV);
-        } elsif (m/^-D|--depth$/) {
-            my $value = shift(@ARGV);
-            $_depth = $value unless($value == 0);
-        } elsif (m/^-d|--domain$/) {
-            $_trusted = shift(@ARGV);
-        } elsif (m/^-y|--proxy$/) {
-            $_http_proxy = shift(@ARGV);
-        } elsif (m/^--masquerade$/) {
-            $_masquerade = 1;
-            $_local_dir = shift(@ARGV);
-            $_remote_masqueraded_uri = shift(@ARGV);
-        } elsif (m/^--hide-same-realm$/) {
-            $_hide_same_realm = 1;
-        } else {
-            push(@uris, $_);
-        }
+
+    use Getopt::Long 2.17 qw(GetOptions);
+    Getopt::Long::Configure('no_ignore_case');
+    my @masq = ();
+
+    GetOptions('help'            => \&usage,
+               'q|quiet'         => sub { $_quiet = 1; $_summary = 1; },
+               's|summary'       => \$_summary,
+               'b|broken'        => sub { $_redirects = 0; },
+               'e|dir-redirects' => sub { $_dir_redirects = 0; },
+               'v|verbose'       => \$_verbose,
+               'i|indicator'     => \$_progress,
+               'h|html'          => \$_html,
+               'n|noacclanguage' => sub { $_accept_language = 0; },
+               'r|recursive'     => sub { $_depth = -1 if $_depth == 0; },
+               'l|location=s'    => \$_base_location,
+               'u|user=s'        => \$_user,
+               'p|password=s'    => \$_password,
+               't|timeout=i'     => \$_timeout,
+               'L|languages=s'   => \$_languages,
+               'D|depth=i'       => sub { $_depth = $_[1] unless $_[1] == 0; },
+               'd|domain=s'      => \$_trusted,
+               'y|proxy=s'       => \$_http_proxy,
+               'masquerade'      => \@masq,
+               'hide-same-realm' => \$_hide_same_realm,
+               'V|version'       => \&version,
+              );
+
+    if (@masq) {
+        $_masquerade = 1;
+        $_local_dir = shift(@masq);
+        $_remote_masqueraded_uri = shift(@masq);
     }
-    return(@uris);
+}
+
+sub version() {
+    print STDERR "$PROGRAM $VERSION\n";
+    exit(0);
 }
 
 sub usage() {
@@ -278,7 +227,7 @@
 	-i/--indicator		Show progress while parsing.
 	-u/--user username	Specify a username for authentication.
 	-p/--password password	Specify a password.
-	--hide-same-real	Hide 401's that are in the same realm as the
+	--hide-same-realm	Hide 401's that are in the same realm as the
 				document checked.
 	-t/--timeout value	Timeout for the HTTP requests.
 	-d/--domain domain	Regular expression describing the domain to
@@ -290,6 +239,7 @@
 	-y/--proxy proxy	Specify an HTTP proxy server.
 	-h/--html		HTML output.
 	--help			Show this message.
+	-V/--version		Output version information.
 
 Documentation at: http://www.w3.org/2000/07/checklink
 Please send bug reports and comments to the www-validator mailing list:
Received on Sunday, 31 March 2002 16:41:03 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Wednesday, 25 April 2012 12:14:01 GMT