
Re: Invalid Bytes for Charset

From: Michael Adams <linux_mike@paradise.net.nz>
Date: Sat, 15 Nov 2008 22:42:34 +1300
To: www-validator@w3.org
Message-id: <20081115224234.13b5ecaa.linux_mike@paradise.net.nz>

On Fri, 14 Nov 2008 17:28:09 +0200
Came this utterance formulated by Jukka K. Korpela to my mailbox:

> Michael Adams wrote:
> [ discussing the error message ...]
> >>> The error was: utf8 "\x80" does not map to Unicode
> > \x80 is illegal as a first byte in unicode.
> First of all, this relates to UTF-8 encoding only.
> Second, you're right in the sense that byte 80 is not allowed as the
> first byte of the encoding of a character in UTF-8. I was confused when
> I wrote that it must be _followed_ by a byte pattern of a specific
> kind; instead, it must appear _within_ a byte combination of a certain
> kind.
> Anyway, the error message is wrong. The byte 80 occurring in a UTF-8
> data stream surely "maps to Unicode" as part of byte patterns. A
> correct error message would be "The error was: Byte 80 (hexadecimal)
> found in purported UTF-8 data in a context where it is not allowed."
> This is fairly generic of course, but I suppose the error message
> pattern is generic as well, so we cannot assume that it's about
> occurrences as first bytes only.

I like it, apart from the word 'purported', which is not an easy word
for readers with limited English. How about: "The error is: Byte 80
(hexadecimal) is not allowed here in UTF-8 data."
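
To make the distinction concrete, here is a minimal Python sketch (not
the validator's own code, which this thread does not show). It relies
on Python's built-in UTF-8 decoder to find the first offending byte and
reports it with the wording suggested above. The function name
describe_utf8_error is hypothetical and exists only for illustration.

```python
from typing import Optional


def describe_utf8_error(data: bytes) -> Optional[str]:
    """Return a message in the wording proposed above, or None if
    `data` is valid UTF-8. Hypothetical helper, for illustration only."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte the decoder rejected.
        bad = data[err.start]
        return (f"The error is: Byte {bad:02X} (hexadecimal) "
                "is not allowed here in UTF-8 data.")
    return None


if __name__ == "__main__":
    # 0x80 on its own is a continuation byte with no lead byte: rejected.
    print(describe_utf8_error(b"\x80"))
    # 0x80 after the lead byte 0xC2 encodes U+0080: accepted.
    print(describe_utf8_error(b"\xc2\x80"))
    # 0x80 inside a three-byte sequence (U+2019): accepted.
    print(describe_utf8_error(b"\xe2\x80\x99"))
```

The second and third calls return None, which shows that byte 80 does
"map to Unicode" when it appears inside a valid byte sequence; only its
position makes it an error.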


All shall be well, and all shall be well, and all manner of things shall
be well

 - Julian of Norwich 1342 - 1416
Received on Saturday, 15 November 2008 09:39:57 UTC
