- From: Liam Quinn <liam@htmlhelp.com>
- Date: Wed, 18 Apr 2001 12:47:41 -0400 (EDT)
- To: Terje Bless <link@tss.no>
- cc: Bjoern Hoehrmann <derhoermi@gmx.net>, <www-validator@w3.org>
On Wed, 18 Apr 2001, Terje Bless wrote:

> On 18.04.01 at 06:57, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
>
> >Just using the (far superior, if you ask me ;-) Unicode::*-modules would
> >be the best option here.

I don't know anything about Iconv, but why do you say that the Unicode::*
modules are far superior?

> It's an option, and what Liam did IIRC, but it requires a bit of work to
> get right. Iconv is UNIX specific, but it's a standard. The Unicode modules
> are Perl specific. Investigating Unicode::* (and how Liam did it ;D) is on
> my TODO.

I used Unicode::String and Unicode::Map8 to map lots of different encodings
into ones supported by SP and into UTF-8 for output when validating multiple
documents. Unicode::Map8 makes adding support for new encodings fairly easy
(although it requires root access unless Unicode::Map8 was installed in a
user's directory), so I didn't have trouble adding support for some Thai and
Vietnamese encodings that weren't included in Unicode::Map8, and I was able
to add support for the euro to the windows-* encodings.

But Unicode::Map8 is limited to single-byte encodings. For multi-byte
encodings, I used Ken Lunde's CJKVConv.pl and jconv.c, which were awkward to
integrate into a Perl script.

I presume that Iconv handles both single-byte and multi-byte encodings, in
which case it's probably a better solution than Unicode::Map8. But I'm not
sure if Iconv is as easily extensible.

There's also a Unicode::Map that was developed separately from
Unicode::Map8. I chose Unicode::Map8 since it seemed to cover more encodings
at the time, although I don't think that Unicode::Map has limited itself to
single-byte encodings.

-- 
Liam Quinn
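
[Illustration] The table-driven single-byte conversion described above can be sketched roughly as follows. This is a minimal Python sketch of the general idea only, not Unicode::Map8's API and not the code used in the validator; the names make_table, to_utf8 and EURO_TABLE are hypothetical, and starting from an ISO-8859-1 base table is an assumption made for brevity.

```python
# Hypothetical sketch of table-driven single-byte conversion.
# It illustrates the idea only; it is not Unicode::Map8's interface.

def make_table(overrides=None):
    """Return a 256-entry byte -> code point table, starting from ISO-8859-1."""
    table = list(range(256))  # ISO-8859-1 maps byte n to code point n
    for byte, codepoint in (overrides or {}).items():
        table[byte] = codepoint  # extend or patch the encoding
    return table

def to_utf8(data, table):
    """Map each input byte through the table, then encode the text as UTF-8."""
    return "".join(chr(table[b]) for b in data).encode("utf-8")

# Adding the euro sign (U+20AC) at byte 0x80, where windows-1252 places it.
EURO_TABLE = make_table({0x80: 0x20AC})
```

Because the table is indexed by one byte at a time, it has exactly 256 slots, which is why this approach cannot describe multi-byte encodings without a different data structure.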
Received on Wednesday, 18 April 2001 12:47:17 UTC