Re: [markup validator] source quoting i18n bug?

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 24 Apr 2004 19:16:35 +0200
To: olivier Thereaux <ot@w3.org>
Cc: validators community <www-validator@w3.org>, Martin Duerst <duerst@w3.org>
Message-ID: <409a9dcc.141964003@smtp.bjoern.hoehrmann.de>

* olivier Thereaux wrote:
>Typical test case: validating the validation output for a shift_jis  
>encoded page (in my case, the google.co.jp homepage)
>
>Symptom: in its error output, the validator quotes part of the source  
>for the validated page.

>I am far from being an expert on that part of the code, but it seems  
>like a typical i18n problem.

Yes, this is a known limitation, documented in the source comments for truncate_line:

[...]
  # This *really* wants Perl 5.8.0 and its improved Unicode support.
  # Byte semantics are in effect on all length(), substr(), etc. calls,
  # so offsets will be wrong if there are multi-byte sequences prior to
  # the column where the error is detected.
[...]
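To make that comment concrete, here is a minimal Python sketch. It is an analogy, not the validator's Perl code, and `excerpt_bytes` and `excerpt_chars` are hypothetical helpers. It assumes the line has already been converted to UTF-8, where each of these kanji takes three bytes, and that the error is reported at character column 6. Treating that column as a byte offset selects the wrong text and cuts through multi-byte sequences:

```python
def excerpt_bytes(line_bytes: bytes, col: int, width: int = 2) -> bytes:
    # Byte semantics: col is used as a byte offset, which is wrong as soon
    # as multi-byte sequences precede the error column.
    start = max(col - width, 0)
    return line_bytes[start:col + width]


def excerpt_chars(line_text: str, col: int, width: int = 2) -> str:
    # Character semantics: col counts characters, like the error column does.
    start = max(col - width, 0)
    return line_text[start:col + width]


# Hypothetical source line: four kanji (3 bytes each in UTF-8), then markup.
line = "日本語版<b>x</p>"
col = 6  # assumed character column of the error

print(excerpt_chars(line, col))                  # <b>x
print(excerpt_bytes(line.encode("utf-8"), col))  # b'\x9c\xac\xe8\xaa'
```

The character-based version returns the markup around the error. The byte-based one returns fragments of two kanji that are not even valid UTF-8 on their own.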

There are various ways to tell Perl that the result of iconv is UTF-8;
see the "Porting code from perl-5.6.X" section of `perldoc perlunicode`.
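In Perl 5.8 this amounts to decoding the converted bytes into a character string (for example with `Encode::decode`) before truncate_line computes any offsets. The following is a minimal Python sketch of the same idea, offered as an analogy only. `excerpt` is a hypothetical helper, and the Shift_JIS round trip stands in for the iconv step:

```python
def excerpt(converted: bytes, col: int, width: int = 2) -> str:
    # 'converted' stands in for the UTF-8 bytes produced by the charset
    # conversion step. Decode first so that len() and slicing count
    # characters rather than bytes.
    text = converted.decode("utf-8")
    start = max(col - width, 0)
    return text[start:col + width]


# Shift_JIS input, converted to UTF-8 bytes as the iconv step would do.
sjis = "日本語版<b>x</p>".encode("shift_jis")
utf8_bytes = sjis.decode("shift_jis").encode("utf-8")

print(excerpt(utf8_bytes, 6))  # <b>x
```

Once the text is decoded, the column reported for the error and the offsets used for slicing are measured in the same unit, regardless of how many multi-byte characters precede it.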

Received on Saturday, 24 April 2004 13:17:00 GMT