- From: Bjoern Hoehrmann <derhoermi@gmx.net>
- Date: Tue, 05 Nov 2002 03:38:21 +0100
- To: "YANG,JI-FENG (HP-China,ex2)" <ji-feng.yang@hp.com>
- Cc: html-tidy@w3.org
* YANG,JI-FENG (HP-China,ex2) wrote:
>Does TIDY support Chinese character set such as GB2312 and BIG5, which can
>be two bytes, three bytes and four bytes?

No, it does not. Current sources can be built so that Big5 characters are
passed unchanged from input to output, but that is not the default. Maybe
Tidy will someday support these encodings, but if it must recode them
internally to Unicode, I fear it is unlikely this will happen soon.

If you want to use Tidy with unsupported encodings, just recode them to
UTF-8 before passing them to Tidy. On Linux you could use `recode` or
`iconv` for this task, e.g.

  % cat document.html | recode gb2312..utf8 | \
    tidy -utf8 | recode utf8..gb2312 > document.tidy.html

Write the result to a new file, since redirecting back to document.html
can truncate it before it has been read.

regards.
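The same round trip can be sketched with `iconv` instead of `recode`. This is only a minimal sketch: `recode_filter` is a hypothetical helper name, not part of Tidy, and it assumes an `iconv` that knows the requested encoding name and a `mktemp` command (common, but not guaranteed everywhere). The output goes through a temporary file, so input and output may be the same file.

```shell
# Hypothetical helper (not part of Tidy): run a UTF-8-only filter over a
# file stored in another encoding, converting to UTF-8 and back with iconv.
# Usage: recode_filter ENCODING INPUT OUTPUT [FILTER COMMAND...]
recode_filter() {
    enc=$1
    src=$2
    dst=$3
    shift 3
    # With no filter command given, default to Tidy in UTF-8 mode.
    [ $# -gt 0 ] || set -- tidy -utf8
    # Assumption: mktemp is available on this system.
    tmp=$(mktemp) || return 1
    # Only the exit status of the final iconv is checked here; Tidy itself
    # exits non-zero on mere warnings, so its status is not a good signal.
    if iconv -f "$enc" -t UTF-8 "$src" | "$@" | iconv -f UTF-8 -t "$enc" > "$tmp"
    then
        # Copy instead of mv so the permissions of an existing OUTPUT are kept.
        cat "$tmp" > "$dst"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example, `recode_filter GB2312 document.html document.html` would run `tidy -utf8` over the file and write the result back in GB2312.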
Received on Monday, 4 November 2002 21:38:08 UTC