- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Fri, 14 Nov 2008 17:28:09 +0200
- To: <www-validator@w3.org>
Michael Adams wrote:

[ discussing the error message ...]
>>> The error was: utf8 "\x80" does not map to Unicode
> \x80 is illegal as a first byte in unicode.

First of all, this relates to the UTF-8 encoding only. Second, you're right in the sense that byte 80 is not allowed as the first byte of the encoding of a character in UTF-8. I was confused when I wrote that it must be _followed_ by a byte pattern of a specific kind; instead, it must appear _within_ a byte combination of a certain kind.

Anyway, the error message is wrong. Byte 80 occurring in a UTF-8 data stream surely "maps to Unicode" as part of byte patterns. A correct error message would be

"The error was: Byte 80 (hexadecimal) found in purported UTF-8 data in a context where it is not allowed."

This is fairly generic of course, but I suppose the error message pattern is generic as well, so we cannot assume that it's about occurrences as first bytes only.

--
Yucca, http://www.cs.tut.fi/~jkorpela/
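As an illustration of the point about byte 80, here is a minimal Python sketch. It is not taken from the validator's code; the helper names `classify_utf8_byte` and `is_valid_utf8` are made up for this example, and it relies only on Python's built-in UTF-8 decoder. It shows that byte 80 is rejected at the start of a character's encoding but accepted as a continuation byte inside a multi-byte sequence.

```python
def classify_utf8_byte(b: int) -> str:
    """Classify a byte value by the role it can play in UTF-8 (hypothetical helper)."""
    if b <= 0x7F:
        return "single"        # ASCII, a complete one-byte encoding
    if b <= 0xBF:
        return "continuation"  # 80..BF: only allowed after a lead byte
    if 0xC2 <= b <= 0xDF:
        return "lead-2"        # starts a two-byte sequence
    if 0xE0 <= b <= 0xEF:
        return "lead-3"        # starts a three-byte sequence
    if 0xF0 <= b <= 0xF4:
        return "lead-4"        # starts a four-byte sequence
    return "never-valid"       # C0, C1, F5..FF never occur in UTF-8


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the built-in decoder accepts the data as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(classify_utf8_byte(0x80))      # continuation
print(is_valid_utf8(b"\x80"))        # False: 80 as a first byte
print(is_valid_utf8(b"\xc2\x80"))    # True: C2 80 encodes U+0080
print(is_valid_utf8(b"\xd0\x80"))    # True: D0 80 encodes U+0400
```

So whether byte 80 is an error depends entirely on the bytes around it, which is why a message phrased in terms of context seems more accurate than "does not map to Unicode".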
Received on Friday, 14 November 2008 15:29:06 UTC