Re: Does TIDY support Chinese character set?

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 05 Nov 2002 03:38:21 +0100
To: "YANG,JI-FENG (HP-China,ex2)" <ji-feng.yang@hp.com>
Cc: html-tidy@w3.org
Message-ID: <3de12de3.90637900@smtp.bjoern.hoehrmann.de>

* YANG,JI-FENG (HP-China,ex2) wrote:
>Does TIDY support Chinese character set such as GB2312 and BIG5, which can
>be two bytes, three bytes and four bytes?

No, it does not. Current sources can be built so that Big5 characters
are passed unchanged from input to output, but that is not the
default. Maybe Tidy will someday support these encodings, but since it
would have to recode them internally to Unicode, I fear it is unlikely
this will happen soon. If you want to use Tidy with unsupported
encodings, just recode the document to UTF-8 before passing it to Tidy.
On Linux you could use `recode` or `iconv` for this task, e.g.

  % recode gb2312..utf8 < document.html | \
    tidy -utf8 | recode utf8..gb2312 > document.new.html
  % mv document.new.html document.html
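
The same round trip can be sketched with `iconv` wrapped in a small shell
function. This is only an illustration under some assumptions: the function
name `tidy_gb2312` and the `TIDY_CMD` override are made up for this example
(they are not part of Tidy), and `iconv` is assumed to know the GB2312 and
UTF-8 charset names. The result goes to a temporary file first, so the input
is not truncated while it is still being read, even if input and output name
the same file.

```shell
#!/bin/sh
# Sketch: run Tidy on a GB2312 document by round-tripping through UTF-8.
# tidy_gb2312 and TIDY_CMD are hypothetical names used for illustration;
# TIDY_CMD defaults to `tidy -utf8` and can be swapped for another filter.
tidy_gb2312() {
    in=$1
    out=$2
    tmp=$(mktemp) || return 1
    # GB2312 -> UTF-8 -> filter -> GB2312, written to the temporary file.
    # The pipeline status is that of the last iconv; Tidy's own exit code
    # (non-zero on warnings) is deliberately not checked here.
    if iconv -f GB2312 -t UTF-8 < "$in" \
        | ${TIDY_CMD:-tidy -utf8} \
        | iconv -f UTF-8 -t GB2312 > "$tmp"
    then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It could then be called as `tidy_gb2312 document.html document.html`. For
Big5 input, the two charset names would have to be changed accordingly.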

regards.
Received on Monday, 4 November 2002 21:38:08 UTC
