
Re: Can 2-byte language such as Chinese be handled correctly?

From: David Li <david@digitalsesame.com>
Date: Mon, 29 Jan 2001 13:06:46 +0800
Message-ID: <3A74FA66.DABBA4B8@digitalsesame.com>
To: TINGNIT <b88201046@ms88.ntu.edu.tw>
CC: html-tidy@w3.org
JTidy is a straight translation of the C version of Tidy, which includes
its own routines for handling character encodings. Currently, Big5 isn't
supported by JTidy. A solution that converts Big5 to UTF-8 was developed
in XMLC (xmlc.enhydra.org), which uses Tidy as its default parser.

We are using XMLC to parse Chinese HTML documents into a DOM.
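
For illustration only, here is a minimal sketch of the general idea of
converting Big5 input to UTF-8 before it reaches the parser. It is not
XMLC's actual code: the class and method names are hypothetical, and it
assumes a Java runtime that ships the "Big5" charset (most full JDKs do,
but it is not one of the charsets every JVM is required to support).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JTidy or XMLC.
class Big5ToUtf8 {

    // Decode Big5-encoded bytes into a String, then re-encode as UTF-8.
    // Charset.forName throws UnsupportedCharsetException if the JVM
    // does not provide Big5.
    static byte[] big5ToUtf8(byte[] big5Bytes) {
        Charset big5 = Charset.forName("Big5");
        String text = new String(big5Bytes, big5);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xA4 0xA4 is the Big5 encoding of the character U+4E2D.
        byte[] big5 = { (byte) 0xA4, (byte) 0xA4 };
        byte[] utf8 = big5ToUtf8(big5);

        // Print the resulting UTF-8 bytes in hex.
        StringBuilder hex = new StringBuilder();
        for (byte b : utf8) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());
    }
}
```

The converted bytes can then be handed to the parser with its input
encoding set to UTF-8, so each Chinese character is seen as one
character rather than as two separate 1-byte symbols.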

David Li
DigitalSesame

TINGNIT wrote:
> 
> I recently tried Tidy integrated in HTML-Kit,
> but some Chinese characters were interpreted as
> two 1-byte symbols. Was this a problem of Tidy
> or of HTML-Kit?
> 
> Thanks.
Received on Monday, 29 January 2001 00:06:33 GMT
