Validation of long lines with non-ASCII characters


I have some problems while trying to validate pages containing non-ASCII
characters (like éàèöäü), especially on long lines. The example I am
using is

First of all, when a line is longer than 70 characters, it is truncated,
but the column number that is printed is the column in the truncated line
(which is always 50), not in the original line, and this is not very
useful. The patch below should correct that. It also removes an
"if" in the truncate_line function -- $diff is defined as $col-50,
so $col will always be bigger than $diff.
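
To make the idea concrete, here is a small standalone sketch, separate from
the validator's real code. The helper name truncate_for_display is
hypothetical, columns are assumed to be 0-based, and the 70/50 widths are
simply the ones mentioned above. It reports the original column and uses
the shifted column only to place the "^" marker under the excerpt.

```perl
use strict;
use warnings;

# Hypothetical helper (not the validator's truncate_line): returns the
# excerpt to display and the column of the error inside that excerpt.
sub truncate_for_display {
    my ($line, $col) = @_;
    my $max = 70;                            # assumed display width
    return ($line, $col) if length($line) <= $max;

    # Number of characters dropped on the left so the error sits near column 50.
    my $diff = $col > 50 ? $col - 50 : 0;

    my $excerpt = substr($line, $diff, $max);
    $excerpt = "... " . $excerpt if $diff > 0;
    $excerpt .= " ..." if $diff + $max < length($line);

    # Shift the column by what was cut, plus the 4-character "... " prefix.
    my $marker_col = $col - $diff + ($diff > 0 ? 4 : 0);
    return ($excerpt, $marker_col);
}

# A 120-character test line and an error column beyond the display width.
my $line = join '', map { chr(ord('a') + $_ % 26) } 0 .. 119;
my $col  = 90;

my ($excerpt, $marker_col) = truncate_for_display($line, $col);

print "Line 1, column $col:\n";      # report the column in the ORIGINAL line
print "$excerpt\n";
print ' ' x $marker_col, "^\n";      # place the marker using the excerpt column
```

Keeping the two numbers separate is the whole point: the reader sees a
column they can find in their source file, while the marker still lines up
with the shortened excerpt.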

The problem that remains is the conversion done by the transcode function.
Before calling this function, one of the lines with an error in the URL
above looks like [yes, I know, the French sentence below is full of
grammatical errors -- I didn't write it]

> Techniquement le genre problème apporter par le e-vote à été déjà [...]

After transcoding, the string stored in memory is

> Techniquement le genre problème apporter par le e-vote à été déjà 
> [...]

This is correctly displayed on the validation report, but the column
numbers are not correct anymore (because some characters that used to
take 1 byte in memory now take 2), and the "^" symbol used to show in which
column the error appeared is meaningless!
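
The mismatch can be reproduced outside the validator with a few lines of
Perl. This is only an illustration, assuming the page is ISO-8859-1 and is
transcoded to UTF-8; it uses the core Encode module, and the helper
byte_offset_for_col is a hypothetical name, not something from the check
script.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The sample sentence as characters. Escapes are used so that the encoding
# of this script file does not matter.
my $chars = "Techniquement le genre probl\x{e8}me apporter par le e-vote "
          . "\x{e0} \x{e9}t\x{e9} d\x{e9}j\x{e0}";

my $latin1 = encode('ISO-8859-1', $chars);   # one byte per character
my $utf8   = encode('UTF-8', $chars);        # accented characters take two bytes

printf "characters: %d, Latin-1 bytes: %d, UTF-8 bytes: %d\n",
    length($chars), length($latin1), length($utf8);

# Hypothetical helper: byte offset in the UTF-8 form that corresponds to a
# given character column (0-based).
sub byte_offset_for_col {
    my ($text, $col) = @_;
    return length(encode('UTF-8', substr($text, 0, $col)));
}

my $col = index($chars, "d\x{e9}j");         # character column of the last word
printf "character column %d is byte offset %d after transcoding\n",
    $col, byte_offset_for_col($chars, $col);
```

Every accented character before the error pushes the byte offset one
further to the right, so a column counted in bytes on one side and in
characters on the other drifts more the further along the line it is.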

Any idea how to correct this?


--- check.old	2002-11-05 11:39:44.000000000 +1100
+++ check	2002-11-05 11:46:30.000000000 +1100
@@ -1279,11 +1281,7 @@
        if (length $line == 70 + 4) {
  	$line .= " ...";
-      if ($col > $diff) {
-	$col -= $diff;
-      } else {
-	$col -= 70;
-      }
+      $col -= $diff;
      } else { # Truncate both sides; leave more on left, and 30 chars on right.
        if ($col < 35) {
  	$line = "... " . substr($line, 0, 60);
@@ -1484,7 +1484,7 @@
      $line = &ent($line); # Entity encode.
      $line =~ s/\t/ /g;   # Collapse TABs.

-    print qq(  <li><em>Line <a href="#line-$err->{line}">$err->{line}</a>, column $col</em>: );
+    print qq(  <li><em>Line <a href="#line-$err->{line}">$err->{line}</a>, column $err->{char}</em>: );
      print qq{<span class="msg">$err->{msg}</span>};
      if (defined $CFG->{'Error to URI'}->{$err->{idx}}) {
        print qq{ (<a href="$CFG->{'Msg FAQ URI'}#$CFG->{'Error to 

Received on Monday, 4 November 2002 22:08:49 UTC