- From: Russell Gold <russgold@acm.org>
- Date: Mon, 14 May 2001 22:29:57 -0400
- To: ???? <bubblesort@pchome.com.tw>, html-tidy@w3.org
At 11:05 PM -0400 5/13/01, ???? wrote:
>Hello:
>
>How to use JTidy parsing non-ISO8859-1 charset HTML document just like
>MS950 (Chinese Traditional) ?
You may not be able to do it directly; *however*, you can do it indirectly. Decode the raw document into text using the appropriate charset (MS950 in your case), then re-encode that text as UTF-8 and pass the result to JTidy, telling it that the input is UTF-8.
See the HttpUnit source code at <http://www.httpunit.org>, especially ReceivedPage.java and HttpWebResponse.java, for an example of this.
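A minimal sketch of the transcoding step in plain Java, assuming the page has already been read into a byte array and you know its charset name. The class and method names (`Ms950ToUtf8`, `transcodeToUtf8`, `utf8Stream`) are hypothetical and are not taken from HttpUnit or JTidy. "MS950" belongs to the JDK's extended charsets rather than the guaranteed set, so the demo falls back to UTF-8 if the running JDK lacks it. The JTidy call itself is left out: in JTidy releases of this era the input encoding is, as far as I know, set with something like `tidy.setCharEncoding(Configuration.UTF8)`, but check the setter your version actually provides.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Ms950ToUtf8 {

    /** Decodes raw bytes with their real charset, then re-encodes the text as UTF-8. */
    static byte[] transcodeToUtf8(byte[] raw, String sourceCharset) {
        // Step 1: bytes -> String, interpreted with the document's actual charset (e.g. MS950).
        String text = new String(raw, Charset.forName(sourceCharset));
        // Step 2: String -> UTF-8 bytes, an encoding the parser can be told to expect.
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Wraps the UTF-8 bytes in a stream that can be handed to an HTML parser. */
    static InputStream utf8Stream(byte[] raw, String sourceCharset) {
        return new ByteArrayInputStream(transcodeToUtf8(raw, sourceCharset));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample page; in practice the bytes come from the server or a file.
        String page = "<html><body>\u4e2d\u6587</body></html>";
        String charset = Charset.isSupported("MS950") ? "MS950" : "UTF-8";
        byte[] raw = page.getBytes(charset);

        byte[] utf8 = transcodeToUtf8(raw, charset);
        System.out.println(raw.length + " " + charset + " bytes -> " + utf8.length + " UTF-8 bytes");

        // Next step (not shown): pass utf8Stream(raw, charset) to JTidy,
        // after configuring JTidy to treat its input as UTF-8.
    }
}
```

Going through a `String` is what makes this work: once the bytes are decoded with the correct charset, the text no longer depends on any byte encoding, so re-encoding it as UTF-8 preserves every character. Note that `new String(byte[], Charset)` silently replaces undecodable sequences with U+FFFD, so a wrong charset guess shows up as replacement characters rather than an exception.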
--
------------------------------------------------------------------------
Russell Gold | "... society is tradition and order
russgold@acm.org | and reverence, not a series of cheap
| bargains between selfish interests."
http://www.httpunit.org | - Poul Anderson, "Iron"
Received on Monday, 14 May 2001 22:35:32 UTC