
Re: How to use JTidy parsing non-ISO8859-1 charset HTML document ?

From: Russell Gold <russgold@acm.org>
Date: Mon, 14 May 2001 22:29:57 -0400
Message-Id: <a05010402b726438341b3@[64.194.231.17]>
To: ???? <bubblesort@pchome.com.tw>, html-tidy@w3.org
At 11:05 PM -0400 5/13/01, ???? wrote:
>Hello:
>
>How to use JTidy parsing non-ISO8859-1 charset HTML document just like
>MS950 (Chinese Traditional) ?

You may not be able to do it directly; *however*, you can do it indirectly. Decode the raw document into text using the appropriate charset encoding (MS950 in your case), then re-encode that text as UTF-8 and pass the result to JTidy, telling it that the input is UTF-8.
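
A minimal sketch of that conversion, using only the standard Java library. The class and method names (CharsetBridge, toUtf8, utf8Stream) are made up for illustration and are not taken from JTidy or HttpUnit. The JTidy calls appear only in a comment, and they assume a JTidy version that offers setCharEncoding and Configuration.UTF8; check them against the version you have.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: the names here are illustrative, not part of JTidy.
final class CharsetBridge {

    private CharsetBridge() {
    }

    // Decode the raw bytes with the document's real charset, then
    // re-encode the resulting text as UTF-8.
    static byte[] toUtf8(byte[] raw, String sourceCharset) {
        // Charset.forName throws UnsupportedCharsetException if this JRE
        // does not ship the requested charset.
        String text = new String(raw, Charset.forName(sourceCharset));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Wrap the UTF-8 bytes in a stream that can be handed to a parser.
    static InputStream utf8Stream(byte[] raw, String sourceCharset) {
        return new ByteArrayInputStream(toUtf8(raw, sourceCharset));
    }

    // Assumed JTidy usage (not compiled here, verify against your version):
    //
    //   Tidy tidy = new Tidy();
    //   tidy.setCharEncoding(Configuration.UTF8);
    //   tidy.parseDOM(CharsetBridge.utf8Stream(rawBytes, "MS950"), null);
}
```

Whether "MS950" is accepted as a charset name depends on the charsets installed with your Java runtime; "Big5" may be a usable substitute if it is not.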

See the HttpUnit source code at <http://www.httpunit.org> (especially ReceivedPage.java and HttpWebResponse.java) for an example of this.
-- 
------------------------------------------------------------------------
Russell Gold                     | "... society is tradition and order
russgold@acm.org                 | and reverence, not a series of cheap
                                 | bargains between selfish interests."
http://www.httpunit.org          |   - Poul Anderson, "Iron"
Received on Monday, 14 May 2001 22:35:32 GMT
