Re: Invalid Bytes for Charset from Michael Adams on 2008-11-15 (www-validator@w3.org from November 2008)

From: Michael Adams <linux_mike@paradise.net.nz>
Date: Sat, 15 Nov 2008 22:42:34 +1300
To: www-validator@w3.org
Message-id: <20081115224234.13b5ecaa.linux_mike@paradise.net.nz>

On Fri, 14 Nov 2008 17:28:09 +0200
Came this utterance formulated by Jukka K. Korpela to my mailbox:

> 
> Michael Adams wrote:
> 
> [ discussing the error message ...]
> >>> The error was: utf8 "\x80" does not map to Unicode
> 
> > \x80 is illegal as a first byte in unicode.
> 
> First of all, this relates to UTF-8 encoding only.
> 
> Second, you're right in the sense that byte 80 is not allowed as the
> first byte of the encoding of a character in UTF-8. I was confused when
> I wrote that it must be _followed_ by a byte pattern of a specific
> kind; instead, it must appear _within_ a byte combination of a certain
> kind.
> 
> Anyway, the error message is wrong. The byte 80 occurring in a UTF-8
> data stream surely "maps to Unicode" as part of byte patterns. A
> correct error message would be "The error was: Byte 80 (hexadecimal)
> found in purported UTF-8 data in a context where it is not allowed."
> This is fairly generic of course, but I suppose the error message
> pattern is generic as well, so we cannot assume that it's about
> occurrences as first bytes only.
> 

I like it, other than the word 'purported', which is not an easy word for
those with little English. How about "The error is: Byte 80
(hexadecimal) is not allowed here in UTF-8 data."?
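For illustration, here is a minimal Python sketch of the point discussed above. It assumes a checker that simply runs a strict UTF-8 decode and reports the first byte of whatever sequence the decoder rejects. The function name utf8_error_message is hypothetical and this is not the validator's actual code. It shows byte 80 being accepted as a continuation byte and rejected as the first byte of a character:

```python
from typing import Optional


def utf8_error_message(data: bytes) -> Optional[str]:
    """Hypothetical helper: return a validator-style message for the first
    ill-formed UTF-8 sequence in data, or None if data is valid UTF-8."""
    try:
        data.decode("utf-8")  # strict decoding raises on any ill-formed sequence
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte of the rejected sequence
        byte = data[err.start]
        return (f"The error is: Byte {byte:02X} (hexadecimal) "
                f"is not allowed here in UTF-8 data.")
    return None


# Byte 80 is fine inside a multi-byte sequence:
# U+20AC (euro sign) encodes as E2 82 AC, U+0080 encodes as C2 80.
print(utf8_error_message("\u20ac".encode("utf-8")))
print(utf8_error_message(b"\xc2\x80"))

# ...but not as the first byte of a character.
print(utf8_error_message(b"abc\x80def"))
```

The first two calls print None; the third reports byte 80. Because the message names the byte rather than its role, it stays accurate whether the byte is a stray continuation byte or some other misplaced byte.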


-- 
Michael

All shall be well, and all shall be well, and all manner of things shall
be well

 - Julian of Norwich 1342 - 1416

Received on Saturday, 15 November 2008 09:39:57 UTC