
Validation of long-lines with non ASCII characters

From: Frederic Schutz <schutz@mathgen.ch>
Date: Mon, 4 Nov 2002 20:23:32 -0500 (EST)
To: www-validator@w3.org
Message-Id: <E629C858-F06B-11D6-AC84-000393BAB03A@mathgen.ch>




Hello,

I have some problems while trying to validate some pages containing
non-ASCII characters (such as é), especially on long lines. The example
I'm using is http://www.linux-gull.ch/evote/index.html

First of all, when a line is longer than 70 characters, it is truncated,
but the column number that is printed is the column in the truncated line
(which is always 50), not the column in the original line, and this is
not very useful. The patch below should correct that. It also removes an
unnecessary "if" in the truncate_line function -- $diff is defined as
$col-50, so $col will always be bigger than $diff.
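
To make the distinction concrete, here is a minimal sketch in Python (not
the Perl of "check"). The function name, the 0-based columns and the exact
window sizes are assumptions for illustration only; the real
truncate_line has more cases. It keeps two numbers apart: the column used
to place the "^" under the truncated snippet, and the original column
that should be printed in the report.

```python
def truncate_for_report(line: str, col: int, width: int = 70, keep_left: int = 50):
    """Hypothetical helper: return (snippet, caret_col, reported_col).

    col is a 0-based column in the original line.
    caret_col is where "^" belongs under the snippet.
    reported_col is the column to print; it is always the original one.
    """
    if len(line) <= width:
        return line, col, col

    # Drop everything more than keep_left characters before the error.
    diff = max(col - keep_left, 0)
    snippet = line[diff:diff + width]
    caret_col = col - diff

    if diff > 0:
        # Mark the cut on the left and shift the caret past the marker.
        snippet = "... " + snippet
        caret_col += 4
    if diff + width < len(line):
        # Mark the cut on the right.
        snippet += " ..."

    return snippet, caret_col, col


line = "a" * 90 + "X" + "b" * 29
snippet, caret_col, reported_col = truncate_for_report(line, 90)
print(f"Line 1, column {reported_col}:")
print(snippet)
print(" " * caret_col + "^")
```

The caret lands under the "X" in the shortened snippet, while the printed
column still refers to the position in the untruncated line.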

The problem that remains is the conversion done by the transcode
function. Before calling this function, one of the lines with an error
at the URL above looks like this, with each accented character stored as
a single byte [yes, I know, the French sentence below is full of
grammatical errors -- I didn't write it]:

> Techniquement le genre problème apporter par le e-vote à été déjà [...]

After transcoding, the string stored in memory is

> Techniquement le genre problème apporter par le e-vote à été déjà [...]

This is correctly displayed on the validation report, but the column
numbers are not correct anymore (because some characters that used to
take 1 space in memory now use 2), and the "^" symbol used to show in
which column the error appeared is meaningless!
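
The mismatch can be illustrated with a short Python sketch (again, not
code from "check"; both helper names are hypothetical, and UTF-8 is
assumed as the target encoding of the transcoding step). A column counted
in characters and an offset counted in UTF-8 bytes diverge as soon as an
accented character precedes the error, so one has to be converted into
the other before the "^" is placed.

```python
def utf8_byte_offset(text: str, char_col: int) -> int:
    """Byte offset in the UTF-8 encoding of the character at char_col."""
    return len(text[:char_col].encode("utf-8"))


def char_column(raw: bytes, byte_col: int) -> int:
    """Number of whole characters that precede byte_col in UTF-8 bytes.

    A multi-byte character that is cut in half by byte_col is not counted.
    """
    return len(raw[:byte_col].decode("utf-8", errors="ignore"))


text = "problème apporter"
col = text.index("apporter")           # column counted in characters
print(col, utf8_byte_offset(text, col))  # prints "9 10"

# Placing the caret by character column keeps it under the right letter.
print(text)
print(" " * col + "^")
```

With one accented character before the error the two numbers already
differ by one; on a long line with many accents the caret drifts further.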

Any ideas on how to correct this?

Frédéric

--- check.old	2002-11-05 11:39:44.000000000 +1100
+++ check	2002-11-05 11:46:30.000000000 +1100
@@ -1279,11 +1281,7 @@
        if (length $line == 70 + 4) {
  	$line .= " ...";
        }
-      if ($col > $diff) {
-	$col -= $diff;
-      } else {
-	$col -= 70;
-      }
+      $col -= $diff;
      } else { # Truncate both sides; leave more on left, and 30 chars on right.
        if ($col < 35) {
  	$line = "... " . substr($line, 0, 60);
@@ -1484,7 +1484,7 @@
      $line = &ent($line); # Entity encode.
      $line =~ s/\t/ /g;   # Collapse TABs.

-    print qq(  <li><em>Line <a href="#line-$err->{line}">$err->{line}</a>, column $col</em>: );
+    print qq(  <li><em>Line <a href="#line-$err->{line}">$err->{line}</a>, column $err->{char}</em>: );
      print qq{<span class="msg">$err->{msg}</span>};
      if (defined $CFG->{'Error to URI'}->{$err->{idx}}) {
        print qq{ (<a href="$CFG->{'Msg FAQ URI'}#$CFG->{'Error to URI'}->{$err->{idx}}">explain...</a>).};
Received on Monday, 4 November 2002 22:08:49 GMT
