- From: Russell Gold <russgold@acm.org>
- Date: Mon, 14 May 2001 22:29:57 -0400
- To: ???? <bubblesort@pchome.com.tw>, html-tidy@w3.org
At 11:05 PM -0400 5/13/01, ???? wrote:
> Hello:
>
> How can I use JTidy to parse an HTML document in a non-ISO-8859-1
> charset, such as MS950 (Traditional Chinese)?

You may not be able to do it directly; *however*, you can do it
indirectly. Decode the raw document into text using the appropriate
charset encoding, then re-encode that text as UTF-8 and pass the result
to JTidy, telling it that the input is UTF-8.

See the <http://www.httpunit.org> source code (especially
ReceivedPage.java and HttpWebResponse.java) for an example of this.

--
------------------------------------------------------------------------
Russell Gold                     | "... society is tradition and order
russgold@acm.org                 | and reverence, not a series of cheap
                                 | bargains between selfish interests."
http://www.httpunit.org          |   - Poul Anderson, "Iron"
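
A minimal sketch of the decode-then-re-encode step described above, using only the standard Java library. The class and method names (CharsetBridge, toUtf8, asUtf8Stream) are hypothetical and are not taken from the httpunit source. The JTidy calls appear only in a comment and assume the org.w3c.tidy.Tidy API of that era; check them against the JTidy version you actually have.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBridge {

    /**
     * Decodes the raw bytes using the document's real charset
     * (for example "MS950"), then re-encodes the text as UTF-8.
     */
    public static byte[] toUtf8(byte[] raw, String sourceCharset) {
        // Throws UnsupportedCharsetException if this JVM lacks the charset.
        String text = new String(raw, Charset.forName(sourceCharset));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Wraps the UTF-8 bytes in a stream suitable for handing to a parser. */
    public static InputStream asUtf8Stream(byte[] raw, String sourceCharset) {
        return new ByteArrayInputStream(toUtf8(raw, sourceCharset));
    }

    // Assumed JTidy usage (not compiled here, verify against your JTidy release):
    //
    //   Tidy tidy = new Tidy();
    //   tidy.setCharEncoding(Configuration.UTF8);
    //   Document dom = tidy.parseDOM(CharsetBridge.asUtf8Stream(raw, "MS950"), null);
}
```

Whether "MS950" is available depends on the JVM's installed charsets; Charset.isSupported("MS950") can be checked first.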
Received on Monday, 14 May 2001 22:35:32 UTC