On Wed, 18 Apr 2001, Terje Bless wrote:

> On 18.04.01 at 06:57, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
>
> >Just using the (far superior, if you ask me ;-) Unicode::*-modules would
> >be the best option here.

I don't know anything about Iconv, but why do you say that the Unicode::*
modules are far superior?

> It's an option, and what Liam did IIRC, but it requires a bit of work to
> get right. Iconv is UNIX specific, but it's a standard. The Unicode modules
> are Perl specific. Investigating Unicode::* (and how Liam did it ;D) is on
> my TODO.

I used Unicode::String and Unicode::Map8 to map lots of different
encodings into ones supported by SP and into UTF-8 for output when
validating multiple documents. Unicode::Map8 makes adding support for new
encodings fairly easy (although it requires root access unless
Unicode::Map8 was installed in a user's directory), so I didn't have
trouble adding support for some Thai and Vietnamese encodings that weren't
included in Unicode::Map8, and I was able to add support for the euro to
the windows-* encodings.

But Unicode::Map8 is limited to single-byte encodings. For multi-byte
encodings, I used Ken Lunde's CJKVConv.pl and jconv.c, which were awkward
to integrate into a Perl script.

I presume that Iconv handles both single-byte and multi-byte encodings, in
which case it's probably a better solution than Unicode::Map8. But I'm not
sure if Iconv is as easily extensible.

There's also a Unicode::Map that was developed separately from
Unicode::Map8. I chose Unicode::Map8 since it seemed to cover more
encodings at the time, although I don't think that Unicode::Map has
limited itself to single-byte encodings.

--
Liam Quinn

Received on Wednesday, 18 April 2001 12:47:17 GMT
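
The table-driven approach described above for single-byte encodings (one code point per byte, with extra entries such as the euro added on top of an existing table) can be sketched roughly as follows. This is only an illustration in Python, not the Unicode::Map8 API and not the code used in the validator. The names `build_table` and `to_utf8` are hypothetical, and the choice of ISO-8859-1 with the euro placed at byte 0xA4 is an assumption made purely for the example.

```python
# Illustrative sketch only: a table-driven converter for single-byte
# encodings. build_table and to_utf8 are hypothetical helper names.

def build_table(base_codec, overrides=None):
    """Build a byte -> character table from an existing single-byte codec,
    then apply any extra or replacement mappings on top of it."""
    table = {}
    for byte in range(256):
        try:
            table[byte] = bytes([byte]).decode(base_codec)
        except UnicodeDecodeError:
            # This byte is undefined in the base encoding; leave it unmapped.
            pass
    table.update(overrides or {})
    return table


def to_utf8(data, table, fallback="\ufffd"):
    """Map each input byte through the table and encode the result as UTF-8.
    Unmapped bytes become U+FFFD (the replacement character)."""
    text = "".join(table.get(byte, fallback) for byte in data)
    return text.encode("utf-8")


# Assumption for the example: start from ISO-8859-1 and map byte 0xA4
# to the euro sign (U+20AC), in the spirit of extending an existing table.
latin1_with_euro = build_table("latin-1", {0xA4: "\u20ac"})

converted = to_utf8(b"caf\xe9 \xa4", latin1_with_euro)
```

Because every byte maps to exactly one character, extending such a table is a matter of adding entries. The same lookup does not work for multi-byte encodings, where a character may span several bytes.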