Re: Can 2-byte language such as Chinese be handled correctly?

JTidy is a straight translation of the C version of Tidy, which includes
its own routines for handling character encodings. Currently, Big5 isn't
supported by JTidy. A workaround that converts Big5 to UTF-8 was developed
in XMLC (xmlc.enhydra.org), which uses Tidy as its default parser.
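
To illustrate the general idea of transcoding Big5 input to UTF-8 before
it reaches the parser, here is a minimal sketch. It is not XMLC's actual
code: the class and method names (Big5ToUtf8, transcode) are made up for
this example, and it assumes the JVM in use provides the "Big5" charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Big5ToUtf8 {
    // Hypothetical helper: decode Big5 bytes into a Java String,
    // then re-encode that String as UTF-8.
    // Assumes the running JVM supports the "Big5" charset.
    static byte[] transcode(byte[] big5Bytes) {
        String text = new String(big5Bytes, Charset.forName("Big5"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xA4 0xA4 is the Big5 encoding of U+4E2D (a single Chinese character).
        byte[] big5 = { (byte) 0xA4, (byte) 0xA4 };
        byte[] utf8 = transcode(big5);
        System.out.println("Big5 bytes: " + big5.length
                + ", UTF-8 bytes: " + utf8.length);
    }
}
```

The resulting UTF-8 bytes can then be given to a parser that is set to read
UTF-8. If the HTML declares its charset in a meta tag, that declaration
would also need to match the new encoding.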

We are using XMLC to parse Chinese HTML documents into DOM.

David Li
DigitalSesame

TINGNIT wrote:
> 
> I recently tried Tidy integrated in HTML-Kit,
> but some Chinese characters were interpreted as
> two 1-byte symbols. Was this a problem of Tidy
> or of HTML-Kit?
> 
> Thanks.

Received on Monday, 29 January 2001 00:06:33 UTC