
Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 22 Oct 2004 13:42:42 +0200
To: dankogai@dan.co.jp
Cc: www-archive@w3.org
Message-ID: <419ff195.627828880@smtp.bjoern.hoehrmann.de>

* Dan wrote:
>>   % perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
>> 
>> does not work as expected (it should print "Bj\x{FFFD}rn") which is
>> apparently due to Encode::utf8::decode_xs(), the code
>
>In this particular case, your expectation is wrong.  Try
>
>perl -MEncode -le 'print decode(q(iso-latin1), qq(Bj\xF6rn))'
>
>and it works as expected.
>
>You expect perl treats "Bj\xF6rn" as UTF-8 but perl does not.

No, you misread the bug report. I expect that

  perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
  perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rnx))"

behave the same, in that the malformed sequence \xF6 is replaced by
U+FFFD, as `perldoc Encode` documents for CHECK = Encode::FB_DEFAULT.
Encode::utf8::decode_xs() fails to do that, for the reason outlined in
my bug report, so the current output is

  Bj
  Bj\x{FFFD}rnx

whereas it should be

  Bj\x{FFFD}rn
  Bj\x{FFFD}rnx
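For illustration only, here is a minimal Python sketch of the distinction the subject line refers to. It is not Encode's XS code, and the names `seq_length`, `decode_utf8` and the `final` flag are hypothetical. A lead byte near the end of the input is a genuine partial character only if every byte after it is a continuation byte (0x80..0xBF). \xF6 followed by "rn" is therefore malformed and should become U+FFFD, not be held back as incomplete. The sketch assumes lead-byte lengths come from the high bits, so \xF6 announces a four-byte sequence.

```python
REPLACEMENT = "\uFFFD"
# Smallest code point each sequence length may encode (rejects overlongs).
MIN_CP = {2: 0x80, 3: 0x800, 4: 0x10000}


def seq_length(lead: int) -> int:
    """Length announced by a lead byte's high bits; 0 if it cannot start one."""
    if lead < 0x80:
        return 1
    if lead & 0xE0 == 0xC0:
        return 2
    if lead & 0xF0 == 0xE0:
        return 3
    if lead & 0xF8 == 0xF0:  # 0xF6 & 0xF8 == 0xF0, so \xF6 announces 4 bytes
        return 4
    return 0


def is_cont(b: int) -> bool:
    return b & 0xC0 == 0x80


def decode_utf8(buf: bytes, final: bool = True):
    """Decode buf, replacing malformed sequences with U+FFFD.

    Returns (text, consumed). With final=False, a truly truncated
    sequence at the very end of buf is left unconsumed for the next chunk.
    """
    out = []
    i, n = 0, len(buf)
    while i < n:
        lead = buf[i]
        need = seq_length(lead)
        if need == 0:
            out.append(REPLACEMENT)
            i += 1
            continue
        if need == 1:
            out.append(chr(lead))
            i += 1
            continue
        # Count the continuation bytes that actually follow the lead byte.
        got = 1
        while got < need and i + got < n and is_cont(buf[i + got]):
            got += 1
        if got < need:
            if i + got == n and not final:
                # Only continuation bytes up to end of input: a real
                # partial character, so stop and wait for more data.
                break
            # A non-continuation byte interrupted the sequence (as in
            # b"\xF6rn"), or no more input will come: malformed.
            out.append(REPLACEMENT)
            i += got
            continue
        cp = lead & (0x7F >> need)
        for b in buf[i + 1:i + need]:
            cp = (cp << 6) | (b & 0x3F)
        if cp < MIN_CP[need] or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)
        else:
            out.append(chr(cp))
        i += need
    return "".join(out), i
```

With this check, both inputs from the report decode to the expected strings, and even with `final=False` the sequence in b"Bj\xF6rn" is reported as malformed rather than partial. For comparison, Python's built-in `b"Bj\xF6rn".decode("utf-8", "replace")` also yields 'Bj\ufffdrn'.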

I fail to see what this has to do with how Perl treats the string: from
Perl's perspective there is no real difference between the two inputs.
Perl works as expected; decode() does not.

(I've posted this to RT, but again it does not show up there; see
http://lists.w3.org/Archives/Public/www-archive/2004Oct/0044.html).
Received on Friday, 22 October 2004 11:43:22 GMT
