- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Fri, 22 Oct 2004 13:42:42 +0200
- To: dankogai@dan.co.jp
- Cc: www-archive@w3.org
* Dan wrote:
>> % perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
>>
>> does not work as expected (it should print "Bj\x{FFFD}rn") which is
>> apparently due to Encode::utf8::decode_xs(), the code
>
>In this particular case, your expectation is wrong. Try
>
>perl -MEncode -le 'print decode(q(iso-latin1), qq(Bj\xF6rn))'
>
>and it works as expected.
>
>You expect perl treats "Bj\xF6rn" as UTF-8 but perl does not.

No, you misread the bug report. I expect that

  perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
  perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rnx))"

behave the same, in that the malformed sequence \xF6 gets replaced by
U+FFFD as documented in `perldoc Encode` for check = Encode::FB_DEFAULT.
Encode::utf8::decode_xs() fails to do that for the reason outlined in
my bug report, so the current result is

  Bj
  Bj\x{FFFD}rnx

whereas it should be

  Bj\x{FFFD}rn
  Bj\x{FFFD}rnx

I fail to see what this has to do with how Perl treats the string:
from a Perl perspective there is no real difference here. Perl works
as expected; decode() does not.

(I've posted this to RT but it again does not show up there, see
http://lists.w3.org/Archives/Public/www-archive/2004Oct/0044.html).
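The behaviour asked for above can be sketched outside Perl. The Python function below is a hypothetical illustration, not Encode's code, and the name `decode_utf8_replace` is made up for this example. Its key point is that when a lead byte announces more bytes than remain in the input, the decoder still inspects the bytes that *are* present before deciding the input is merely truncated. A byte like `r` (0x72) can never be a continuation byte, so `\xF6rn` and `\xF6rnx` both contain a malformed sequence and should both yield U+FFFD followed by the rest of the string. The sketch assumes RFC 3629 rules (bytes 0xC0, 0xC1 and 0xF5 to 0xFF never start a sequence) and replaces each maximal ill-formed subpart with a single U+FFFD, which, as far as I know, is also what Python 3's own `bytes.decode("utf-8", "replace")` does.

```python
REPLACEMENT = "\ufffd"


def _sequence_length(lead: int) -> int:
    """Bytes a UTF-8 sequence starting with `lead` needs, or 0 if `lead` cannot start one."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte, overlong lead C0/C1, or F5..FF


def _second_byte_range(lead: int) -> tuple:
    """Allowed range for the byte after `lead`; excludes overlongs, surrogates, > U+10FFFF."""
    return {0xE0: (0xA0, 0xBF), 0xED: (0x80, 0x9F),
            0xF0: (0x90, 0xBF), 0xF4: (0x80, 0x8F)}.get(lead, (0x80, 0xBF))


def decode_utf8_replace(data: bytes) -> str:
    """Decode UTF-8, replacing each maximal ill-formed subpart with U+FFFD."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        need = _sequence_length(lead)
        if need == 0:               # byte can never start a sequence
            out.append(REPLACEMENT)
            i += 1
            continue
        if need == 1:               # plain ASCII
            out.append(chr(lead))
            i += 1
            continue
        # Check the continuation bytes that are actually present instead of
        # giving up merely because fewer than `need` bytes remain.
        j = i + 1
        while j < i + need and j < n:
            lo, hi = _second_byte_range(lead) if j == i + 1 else (0x80, 0xBF)
            if not lo <= data[j] <= hi:
                break
            j += 1
        if j == i + need:           # complete, well-formed sequence
            cp = lead & (0x7F >> need)
            for b in data[i + 1:j]:
                cp = (cp << 6) | (b & 0x3F)
            out.append(chr(cp))
        else:                       # malformed or truncated: one U+FFFD
            out.append(REPLACEMENT)
        i = j                       # resume at the first unconsumed byte
    return "".join(out)


if __name__ == "__main__":
    for sample in (b"Bj\xF6rn", b"Bj\xF6rnx"):
        print(ascii(decode_utf8_replace(sample)))
```

Both samples decode to the same shape, `Bj`, U+FFFD, then the trailing letters. Only an input that really ends inside a sequence whose bytes so far are valid, such as `b"Bj\xF0\x9F"`, is treated as truncated, and even then this sketch emits U+FFFD rather than silently dropping the tail.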
Received on Friday, 22 October 2004 11:43:22 UTC