Iconv vs. Unicode::* (was Re: several fixes) from Liam Quinn on 2001-04-18 (www-validator@w3.org from April 2001)

From: Liam Quinn <liam@htmlhelp.com>
Date: Wed, 18 Apr 2001 12:47:41 -0400 (EDT)
To: Terje Bless <link@tss.no>
cc: Bjoern Hoehrmann <derhoermi@gmx.net>, <www-validator@w3.org>
Message-ID: <Pine.LNX.4.30.0104181226110.932-100000@localhost.localdomain>

On Wed, 18 Apr 2001, Terje Bless wrote:

> On 18.04.01 at 06:57, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
>
> >Just using the (far superior, if you ask me ;-) Unicode::*-modules would
> >be the best option here.

I don't know anything about Iconv, but why do you say that the Unicode::*
modules are far superior?

> It's an option, and what Liam did IIRC, but it requires a bit of work to
> get right. Iconv is UNIX specific, but it's a standard. The Unicode modules
> are Perl specific. Investigating Unicode::* (and how Liam did it ;D) is on
> my TODO.

I used Unicode::String and Unicode::Map8 to map lots of different
encodings into ones supported by SP and into UTF-8 for output when
validating multiple documents.  Unicode::Map8 makes adding support for new
encodings fairly easy (although it requires root access unless
Unicode::Map8 was installed in a user's directory), so I didn't have
trouble adding support for some Thai and Vietnamese encodings that weren't
included in Unicode::Map8, and I was able to add support for the euro to
the windows-* encodings.
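
To illustrate the table-driven idea behind a single-byte mapping, here is
a minimal sketch. It is written in Python rather than Perl and does not
use Unicode::Map8's actual API; make_table and to_utf8 are hypothetical
names chosen for this example. It assumes that an 8-bit encoding can be
modelled as a table from byte values to Unicode code points, and that
adding a character such as the euro amounts to adding one table entry.

```python
# Illustrative sketch only: model a single-byte encoding as a table that
# maps byte values (0-255) to Unicode code points. The function names are
# hypothetical and are not taken from any Perl module.

def make_table(base_codec):
    """Build a byte -> code point table from one of Python's single-byte codecs."""
    table = {}
    for byte in range(256):
        try:
            table[byte] = ord(bytes([byte]).decode(base_codec))
        except UnicodeDecodeError:
            pass  # this byte is unassigned in the base encoding
    return table


def to_utf8(data, table, replacement=0xFFFD):
    """Map each input byte through the table and return the result as UTF-8."""
    text = "".join(chr(table.get(b, replacement)) for b in data)
    return text.encode("utf-8")


# Start from ISO-8859-1 and add the euro sign at 0x80, the position it
# occupies in windows-1252.
table = make_table("latin-1")
table[0x80] = 0x20AC

result = to_utf8(b"\x80 100", table)
```

Because every character is exactly one byte, extending the encoding is a
matter of one assignment. The same table cannot describe an encoding in
which a character spans several bytes.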

But Unicode::Map8 is limited to single-byte encodings.  For multi-byte
encodings, I used Ken Lunde's CJKVConv.pl and jconv.c, which were awkward
to integrate into a Perl script.  I presume that Iconv handles both
single-byte and multi-byte encodings, in which case it's probably a better
solution than Unicode::Map8.  But I'm not sure if Iconv is as easily
extensible.

There's also a Unicode::Map that was developed separately from
Unicode::Map8.  I chose Unicode::Map8 since it seemed to cover more
encodings at the time, although I don't think Unicode::Map is limited to
single-byte encodings.

-- 
Liam Quinn

Received on Wednesday, 18 April 2001 12:47:17 UTC