Re: Error message for invalid UTF-8 overlong forms should be improved from olivier Thereaux on 2008-05-29 (www-validator@w3.org from May 2008)

From: olivier Thereaux <ot@w3.org>
Date: Thu, 29 May 2008 12:57:13 -0400
To: Jukka K. Korpela <jkorpela@cs.tut.fi>
Cc: Thomas Rutter <tom@thomasrutter.com>, Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>, W3C Validator Community <www-validator@w3.org>
Message-Id: <6E2BDFB9-787B-4D9A-94B6-759F7B4AFCEC@w3.org>

On 29-May-08, at 2:14 AM, Jukka K. Korpela wrote:
> That's inconsistent indeed, and the more I think of it, the more
> misleading this "utf8 "\x..." does not map to Unicode" thing looks like.
> It is difficult to express concisely that data that has been declared or
> assumed to be utf-8 encoded violates the rules of utf-8 and cannot thus
> be interpreted as characters. But the current formulation is misleading
> and even plain wrong, at least in the first case.

Both this error message and the decoding of the bytes as UTF-8 come
from the Perl Encode module.

http://search.cpan.org/dist/Encode/

The validator can of course work around issues and rewrite messages
from the modules it uses, but if there are indeed issues or suggestions,
it may be worth reporting them upstream.

http://rt.cpan.org/Public/Dist/Display.html?Name=Encode
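
To make the "work around and rewrite" option concrete, here is a rough
sketch of the idea. It is written in Python rather than Perl, it is not
the validator's code, and the function name and message wording are made
up for illustration. It catches the strict decoder's error and reports
the offending bytes in terms of UTF-8 validity instead of passing the
library's message through.

```python
def describe_utf8_problem(data: bytes) -> str:
    """Return "ok", or a reworded (hypothetical) message for invalid UTF-8."""
    try:
        # Strict decoding rejects ill-formed input, including overlong forms.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        # err.start/err.end delimit the bytes the decoder refused.
        bad = " ".join(f"0x{b:02X}" for b in data[err.start:err.end])
        return (
            f"Bytes {bad} at offset {err.start} are not valid UTF-8, "
            "so they cannot be interpreted as characters."
        )
    return "ok"


# 0xC0 0x80 is an overlong two-byte form of U+0000 and must be rejected.
print(describe_utf8_problem(b"\xc0\x80"))
print(describe_utf8_problem(b"plain ascii"))
```

Whether such a rewrite belongs in the validator or in the module itself
is the upstream question above.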

Thanks,
olivier
-- 
olivier Thereaux - W3C - http://www.w3.org/People/olivier
W3C Open Source Software : http://www.w3.org/Status

Received on Thursday, 29 May 2008 16:57:48 UTC