Re: Iconv vs. Unicode::* (was Re: several fixes)

From: Terje Bless <link@tss.no>
Date: Thu, 19 Apr 2001 01:15:45 +0200
To: Liam Quinn <liam@htmlhelp.com>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, www-validator@w3.org
Message-ID: <20010419033133-b01010701-a3b58ab0@>

On 18.04.01 at 12:47, Liam Quinn <liam@htmlhelp.com> wrote:

>On Wed, 18 Apr 2001, Terje Bless wrote:
>> On 18.04.01 at 06:57, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
>> >Just using the (far superior, if you ask me ;-) Unicode::*-modules
>> >would be the best option here.
>I don't know anything about Iconv, but why do you say that the Unicode::*
>modules are far superior?

Probably because they are part of Perl and so mostly guaranteed to be
available anywhere that Perl is. Text::Iconv is just a wrapper around the
glibc iconv(3) functions and so you add another dependency.

Ironically enough, while iconv(3) is part of the UNIX98 specification, its
inclusion in glibc makes it available everywhere you have glibc. In theory,
this includes GNU/Linux distributions, *BSD, all the commercial Unices,
Windows(!), and, get this, Mac OS X! :-)

By some strange set of coincidences, Text::Iconv has become the most
portable solution. :-)

Of course, that's just in theory. In practice glibc isn't actually used on
many systems (commercial Unix, *BSD) and porting to Windoze and Mac OS X is
a pain, but the potential is there.

>I presume that Iconv handles both single-byte and multi-byte encodings

IIRC, yes.
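
For a concrete picture of the single-byte versus multi-byte distinction,
here is a minimal sketch. It uses Python's built-in codecs as a stand-in,
not Text::Iconv or Unicode::Map8, and the helper name to_utf8 and the
sample byte strings are made up for illustration.

```python
# Illustration only: Python's built-in codecs stand in for Iconv here.
# The helper name and sample data are assumptions, not from any module
# discussed in this thread.

def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Decode raw bytes from source_charset, then re-encode as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")

# Single-byte encoding: every character occupies exactly one byte.
latin1 = b"caf\xe9"           # four characters, four bytes in ISO-8859-1

# Multi-byte encoding: a single character may occupy two bytes.
sjis = b"\x93\xfa\x96\x7b"    # two characters, four bytes in Shift_JIS

for raw, charset in ((latin1, "iso-8859-1"), (sjis, "shift_jis")):
    text = raw.decode(charset)
    print(charset, len(raw), "bytes ->", len(text), "characters")
    print("  as UTF-8:", to_utf8(raw, charset))
```

A converter that only handles single-byte charsets can get by with a
256-entry lookup table; the Shift_JIS case shows why multi-byte charsets
need a decoder that reads more than one byte per character.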

>But I'm not sure if Iconv is as easily extensible.

I think in theory you just add a transliteration table, but in practice you
need to hack glibc for all but the most minor stuff. I haven't really
looked into this yet.

>There's also a Unicode::Map that was developed separately from
>Unicode::Map8.  I chose Unicode::Map8 since it seemed to cover more
>encodings at the time, although I don't think that Unicode::Map has
>limited itself to single-byte encodings.

I saw Unicode::Map pass by on Use Perl;, but I haven't had time to look at
it yet. I have more pressing issues to deal with before revisiting the
charset stuff.
Received on Wednesday, 18 April 2001 21:31:34 UTC
