Re: Irritating feature in Validator from Martin Duerst on 2001-10-28 (www-validator@w3.org from October 2001)

From: Martin Duerst <duerst@w3.org>
Date: Sun, 28 Oct 2001 23:39:37 +0900
To: Michael Everson <everson@evertype.com>
Cc: www-validator@w3.org
Message-Id: <4.2.0.58.J.20011028232608.03a76eb0@localhost>

At 12:37 01/10/28 +0000, Michael Everson wrote:

>As it happens, the UTF8 error was in a line further on, where it talks 
>about quotation marks and lists the left and right double angle quotes « 
>and ». I fixed that UTF-8 but the point is that if you have a UTF-8 error 
>the validator just says what line it is in and doesn't provide you with 
>marked up text, which it does for invalid characters in, say, Latin 1.

I went back to the code (mostly mine) and checked: exactly the same
thing is done for conversion errors from Latin-1 to UTF-8 as for
UTF-8 byte-sequence errors.

It may be that you mean errors such as &#130;. These are not Latin-1
errors, and are not related to the character encoding used for the page.
These are markup errors, and are detected in a completely different
part of the code.
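To illustrate why such references are independent of the page's encoding, here is a minimal Python sketch that scans already-decoded text for numeric character references to code points 128 to 159. It assumes, as in the HTML 4 SGML declaration, that these code points are unused; the function name `c1_references` is hypothetical and this is not the validator's actual code.

```python
import re

# Decimal (&#130;) or hexadecimal (&#x82;) numeric character references;
# the trailing semicolon is optional in SGML.
NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));?")


def c1_references(markup: str) -> list[tuple[int, str]]:
    """Return (line number, reference) pairs for references to 128-159.

    Hypothetical helper. Assumption: these code points are declared unused
    (as in the HTML 4 SGML declaration), so referring to them is a markup
    error no matter which character encoding the page itself uses.
    """
    found = []
    for number, line in enumerate(markup.splitlines(), start=1):
        for match in NCR.finditer(line):
            hex_digits, dec_digits = match.groups()
            code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
            if 128 <= code_point <= 159:
                found.append((number, match.group(0)))
    return found


print(c1_references("<p>&#130;quoted&#8218;</p>"))  # [(1, '&#130;')]
```

Because the input is a decoded string, the check runs after (and separately from) any encoding conversion, which mirrors the split between the two parts of the code described above.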

Other than that, the only thing I can think of currently is that
you are comparing with an older version of the validator. Older
versions indeed didn't check character encoding and were relying
on ad-hoc errors produced by SP. In some cases, that led to
a huge list of errors (e.g. for a Shift_JIS page), while other
errors were not caught. So we decided to just give a list of
line numbers, because when something goes wrong with character
encoding, it goes wrong quite a bit.
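The line-number approach can be sketched in a few lines of Python, assuming the raw page bytes are available in memory. The name `utf8_error_lines` is hypothetical and this is an illustration, not the validator's implementation. Splitting on the newline byte is safe for UTF-8 because 0x0A never occurs inside a multi-byte sequence.

```python
def utf8_error_lines(data: bytes) -> list[int]:
    """Return 1-based numbers of lines whose bytes are not valid UTF-8.

    Hypothetical helper: each bad line is reported once, however many
    malformed byte sequences it contains.
    """
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Line 1 holds correctly encoded « and »; line 2 holds their raw Latin-1 bytes.
page = "<p>\u00ab and \u00bb</p>\n".encode("utf-8") + b"<p>\xab Latin-1 \xbb</p>\n"
print(utf8_error_lines(page))  # [2]
```

Reporting one entry per line rather than one per bad byte keeps the output short when a whole page is in the wrong encoding.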

Anyway, you can always check the 'show source code' box to
get a source code listing with line numbers.

If that doesn't help, you should try to set up two dummy pages
that show the two behaviors that you think should be the same
but that actually differ.

Regards,   Martin.

Received on Sunday, 28 October 2001 09:39:49 UTC