- From: Martin Duerst <duerst@w3.org>
- Date: Fri, 03 Sep 2004 10:41:20 +0900
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- Cc: public-qa-dev@w3.org
At 11:04 04/09/02 +0900, Martin Duerst wrote:

>> >>It would also be good if you could implement any transcoding stuff, etc.
>> >>in a way compatible with perlunicode, setting the UTF-8 flag etc.
>
>I have started working on that. I had to test what Perl actually meant
>with 'illegal UTF-8'. My findings were that it didn't complain about
>3-byte surrogates, and allowed characters >U+10FFFF. But otherwise,
>I have found that using Encode, I can reduce the code quite a bit.
>
>Unfortunately, I got stuck with some very weird phenomenon:
>Many Japanese pages (shift_jis and other) work very well with my
>new code, but the Google JP page just won't transcode, see
>http://qa-dev.w3.org/wmvs/duerst/check?uri=http%3A%2F%2Fwww.google.co.jp&charset=%28detect+automatically%29&doctype=%28detect+automatically%29&ss=1.
>
>I have verified that the transcoding code is actually used, that
>the resulting lines have the UTF8 flag set, and also that the
>pattern of readable characters and garbage that you can see
>is still shift_jis. Any advice on what to test next is highly
>appreciated!

That problem is now mostly solved. It was because my code didn't
convert the last line of a file, and google.co.jp had everything
interesting in a single long last line.

Regards,    Martin.
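For illustration only, here is a minimal sketch of the kind of last-line problem described above. It is written in Python rather than Perl and is not the validator's code. It assumes a loop that transcodes one newline-terminated line at a time; the function name transcode_lines and the default charset argument are hypothetical. A loop like this silently drops a final line that has no trailing newline unless the leftover bytes are flushed explicitly.

```python
def transcode_lines(data: bytes, charset: str = "shift_jis") -> list:
    """Decode `data` line by line, including an unterminated final line.

    Hypothetical helper: splitting on b"\\n" is safe for shift_jis because
    0x0A never occurs as the second byte of a two-byte character.
    """
    lines = []
    start = 0
    while True:
        end = data.find(b"\n", start)
        if end == -1:
            break
        # Decode one complete line, keeping its newline.
        lines.append(data[start:end + 1].decode(charset))
        start = end + 1
    # Flush whatever follows the last newline. Omitting this step leaves
    # the final line untranscoded, which matters when the whole page is
    # effectively one long last line.
    if start < len(data):
        lines.append(data[start:].decode(charset))
    return lines


page = "検索\nGoogle 日本".encode("shift_jis")
print(transcode_lines(page))
```

With the flush step in place, the second element of the result is the decoded text after the final newline; without it, that text would never reach the decoder.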
Received on Friday, 3 September 2004 01:41:36 UTC