Michael Adams wrote:

[ discussing the error message ...]
>>> The error was: utf8 "\x80" does not map to Unicode

> \x80 is illegal as a first byte in unicode.

First of all, this relates to UTF-8 encoding only. Second, you're right in the sense that byte 80 is not allowed as the first byte of the encoding of a character in UTF-8. I was confused when I wrote that it must be _followed_ by a byte pattern of a specific kind; instead, it must appear _within_ a byte combination of a certain kind.

Anyway, the error message is wrong. The byte 80 occurring in a UTF-8 data stream surely "maps to Unicode" as part of such byte patterns. A correct error message would be:

"The error was: Byte 80 (hexadecimal) found in purported UTF-8 data in a context where it is not allowed."

This is fairly generic, of course, but I suppose the error message pattern is generic as well, so we cannot assume that it's about occurrences as first bytes only.

--
Yucca, http://www.cs.tut.fi/~jkorpela/

Received on Friday, 14 November 2008 15:29:06 GMT
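To make the "context" point concrete, here is a minimal Python sketch. The function names byte_role and describe_utf8_error are hypothetical, not taken from any library or from the software that produced the original message. A byte of the form 10xxxxxx (0x80 to 0xBF) is a continuation byte: it is fine right after a lead byte, and an error anywhere else. The sketch checks only this lead/continuation structure. It does not reject overlong three- and four-byte forms or encoded surrogates, so a strict decoder will refuse some inputs it accepts.

```python
from typing import Optional


def byte_role(b: int) -> str:
    """Classify one byte by its high-order bits, as used in UTF-8."""
    if b < 0x80:
        return "ascii"         # 0xxxxxxx: a complete one-byte character
    if b < 0xC0:
        return "continuation"  # 10xxxxxx: e.g. 0x80, only valid inside a sequence
    if 0xC2 <= b <= 0xF4:
        return "lead"          # starts a 2-, 3- or 4-byte sequence
    return "invalid"           # 0xC0, 0xC1, 0xF5..0xFF never occur in UTF-8


def describe_utf8_error(data: bytes) -> Optional[str]:
    """Return a context-aware message for the first misplaced byte, or None.

    Hypothetical helper: checks lead/continuation structure only
    (no overlong or surrogate checks).
    """
    pending = 0  # continuation bytes still required by the current lead byte
    for i, b in enumerate(data):
        role = byte_role(b)
        if pending:
            if role != "continuation":
                return (f"Byte {b:02X} (hexadecimal) at offset {i} found in purported "
                        "UTF-8 data where a continuation byte was required.")
            pending -= 1
        elif role in ("continuation", "invalid"):
            return (f"Byte {b:02X} (hexadecimal) at offset {i} found in purported "
                    "UTF-8 data in a context where it is not allowed.")
        elif role == "lead":
            # 110xxxxx -> 1 more byte, 1110xxxx -> 2, 11110xxx -> 3
            pending = 1 if b < 0xE0 else (2 if b < 0xF0 else 3)
    if pending:
        return "Purported UTF-8 data ends inside a multi-byte sequence."
    return None


if __name__ == "__main__":
    print(describe_utf8_error(b"\x80"))               # stray 0x80: reported
    print(describe_utf8_error("\u0100".encode("utf-8")))  # C4 80: 0x80 is fine here, prints None
```

The second call shows the distinction being made in the message above: U+0100 is encoded as C4 80, so byte 80 does "map to Unicode" there, and only the first call produces an error.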