Re: Does TIDY support Chinese character set?

* YANG,JI-FENG (HP-China,ex2) wrote:
>Does TIDY support Chinese character set such as GB2312 and BIG5, which can
>be two bytes, three bytes and four bytes?

No, it does not. The current sources can be built in such a way that Big5
characters are passed unchanged from input to output, but that is not the
default. Maybe Tidy will support these encodings someday, but to do so it
must recode them internally to Unicode, and I fear it is unlikely this
will happen soon. If you want to use Tidy with unsupported encodings,
just recode the document to UTF-8 before passing it to Tidy and recode
the result back afterwards. On Linux you could use `recode` or `iconv`
for this task, e.g.

  % cat document.html | recode gb2312..utf8 | \
    tidy -utf8 | recode utf8..gb2312 > document.new && \
    mv document.new document.html
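
The same round trip should also work with `iconv`. What follows is only a
rough sketch, not something shipped with Tidy: the function name
`recode_filter` is made up for illustration, and it assumes that `mktemp`
is available and that your `iconv` knows the encoding name you pass in
(GB2312 here). It writes to a temporary file first, because redirecting
the output straight onto the file that is still being read would
truncate it.

```shell
# Hypothetical helper (not part of Tidy): run a UTF-8 filter over a file
# that is stored in a legacy encoding, then convert the result back.
# Usage: recode_filter ENCODING FILE COMMAND [ARGS...]
recode_filter() {
    enc=$1
    file=$2
    shift 2
    tmp=$(mktemp) || return 1
    # Legacy encoding -> UTF-8 -> filter -> legacy encoding, into a temp file.
    if iconv -f "$enc" -t UTF-8 "$file" | "$@" | iconv -f UTF-8 -t "$enc" > "$tmp" \
        && [ -s "$tmp" ]
    then
        # Replace the original only if the pipeline produced some output.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It would be called along the lines of

  % recode_filter GB2312 document.html tidy -utf8

The check for a non-empty result is there because the exit status of a
pipeline is that of its last command, so a filter that fails and prints
nothing would otherwise go unnoticed and wipe out the document.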

regards.

Received on Monday, 4 November 2002 21:38:08 UTC