
Re: internal encoding

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 08 Mar 2007 02:58:59 +0100
To: gzahl@arcor.de
Cc: html-tidy@w3.org
Message-ID: <nbruu2h60ionv3d4arehtg8j979judog06@hive.bjoern.hoehrmann.de>

* gzahl@arcor.de wrote:
>I'm using the ICU library for i18n in my project and would like to use
>libtidy. What is the internal encoding of libtidy? I would like to avoid
>converting more often than I have to. I didn't find anything about
>this in the mailing list or the documentation. It just seems to use a
>codepage representation of the data for comparisons.

Tidy uses UTF-8 internally, except when you specify the big5 or shiftjis
options; in those cases the internal encoding is something weird. We
would like to use UTF-8 always, but that would require transcoding those
encodings to UTF-8, for which we have no code.
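
Since ICU strings (UChar) are UTF-16, this means one conversion to UTF-8
on the way into Tidy and one on the way back. Below is a minimal sketch
of the first step in plain C, only to show what the conversion involves.
The function name utf16_to_utf8 and its error convention are made up for
this illustration; in a real project you would more likely call ICU's
own u_strToUTF8 and select Tidy's "utf8" encoding (for example through
tidySetCharEncoding) rather than hand-roll this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of ICU or libtidy): convert src_len
 * UTF-16 code units in native byte order to NUL-terminated UTF-8.
 * Returns the number of bytes written, not counting the NUL, or
 * (size_t)-1 on an unpaired surrogate or if dst is too small. */
size_t utf16_to_utf8(const uint16_t *src, size_t src_len,
                     char *dst, size_t dst_cap)
{
    size_t out = 0;

    if (dst_cap == 0)
        return (size_t)-1;              /* no room even for the NUL */

    for (size_t i = 0; i < src_len; i++) {
        uint32_t cp = src[i];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (i + 1 >= src_len)
                return (size_t)-1;
            uint32_t lo = src[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            i++;                        /* consumed the low surrogate */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;          /* stray low surrogate */
        }

        /* Number of UTF-8 bytes this code point needs. */
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need >= dst_cap)      /* keep one byte for the NUL */
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (char)cp;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (char)(0xE0 | (cp >> 12));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (char)(0xF0 | (cp >> 18));
            dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    dst[out] = '\0';
    return out;
}
```

The resulting buffer is what you would hand to Tidy as UTF-8 input.
Tidy's UTF-8 output needs the reverse conversion before it goes back
into ICU.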
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Thursday, 8 March 2007 01:58:59 GMT
