Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 22 Oct 2004 13:42:42 +0200
To: dankogai@dan.co.jp
Cc: www-archive@w3.org
Message-ID: <419ff195.627828880@smtp.bjoern.hoehrmann.de>

* Dan wrote:
>>   % perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
>> does not work as expected (it should print "Bj\x{FFFD}rn") which is
>> apparently due to Encode::utf8::decode_xs(), the code
>In this particular case, your expectation is wrong.  Try
>perl -MEncode -le 'print decode(q(iso-latin1), qq(Bj\xF6rn))'
>and it works as expected.
>You expect perl treats "Bj\xF6rn" as UTF-8 but perl does not.

No, you misread the bug report. I expect that

  perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
  perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rnx))"

behave the same, in that the malformed sequence \xF6 is replaced by
U+FFFD, as documented in `perldoc Encode` for check = Encode::FB_DEFAULT.
Encode::utf8::decode_xs() fails to do that, for the reason outlined in my
bug report, so the current result is


it should be


I fail to see what this has to do with how Perl treats the string: from
a Perl perspective there is no real difference here. Perl works as
expected; decode() does not.
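The expectation above can be checked with a short Perl script. This is a sketch that assumes a current Encode release, in which FB_DEFAULT substitutes U+FFFD for each malformed sequence; the test table and the `escape` helper are illustrative and not part of Encode. \xF6 is a lead byte announcing a four-byte sequence, so in "Bj\xF6rn" fewer bytes follow it than it announces, while in "Bj\xF6rnx" exactly enough do. A decoder that checks the following bytes should reject \xF6 in both cases and emit a single U+FFFD.

```perl
use strict;
use warnings;
use Encode qw(decode FB_DEFAULT);

# Illustrative helper (not part of Encode): render a string with
# non-printable and non-ASCII characters shown as \x{...} escapes.
sub escape {
    my ($s) = @_;
    return join '',
        map { /[\x20-\x7E]/ ? $_ : sprintf('\\x{%X}', ord $_) } split //, $s;
}

# Each case: input octets and the string expected under FB_DEFAULT.
my @cases = (
    [ "Bj\xF6rn",  "Bj\x{FFFD}rn"  ],  # \xF6 announces 4 bytes, only 3 remain
    [ "Bj\xF6rnx", "Bj\x{FFFD}rnx" ],  # \xF6 announces 4 bytes, 4 remain
);

for my $case (@cases) {
    my ($octets, $want) = @$case;
    # FB_DEFAULT (0): replace malformed data with U+FFFD and keep going.
    my $got = decode('utf-8', $octets, FB_DEFAULT);
    printf "%-10s -> %-16s %s\n",
        escape($octets), escape($got), ($got eq $want ? 'ok' : 'MISMATCH');
}
```

Under the documented FB_DEFAULT behaviour both lines should end in `ok`; a decoder that treats the short tail of the first input as a partial character without inspecting it would report MISMATCH for that case only.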

(I've posted this to RT, but again it does not show up there; see
Received on Friday, 22 October 2004 11:43:22 UTC
