Re: How to use JTidy parsing non-ISO8859-1 charset HTML document ?

At 11:05 PM -0400 5/13/01, ???? wrote:
>Hello:
>
>How can I use JTidy to parse an HTML document in a non-ISO-8859-1 charset,
>such as MS950 (Chinese Traditional)?

You may not be able to do it directly; *however*, you can do it indirectly. Decode the raw document into text using the appropriate charset (MS950 in your case), then re-encode that text as UTF-8 and pass the result to JTidy, telling it that the input is UTF-8.
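A minimal sketch of the decode-then-re-encode step, using only the standard Java library. The class and method names (Transcode, toUtf8) are made up for illustration and are not part of JTidy or HttpUnit. The JTidy calls are shown only as comments; I believe older JTidy releases expose Tidy.setCharEncoding(Configuration.UTF8), but check the Javadoc for the version you have.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JTidy or HttpUnit.
final class Transcode {

    /**
     * Decodes raw bytes with the document's real charset (for example "MS950"),
     * then re-encodes the resulting text as UTF-8.
     */
    static byte[] toUtf8(byte[] raw, String sourceCharset) {
        // Charset.forName throws UnsupportedCharsetException if this JVM
        // does not ship the requested charset.
        String text = new String(raw, Charset.forName(sourceCharset));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Assumed JTidy usage (verify against your JTidy version):
    //
    //   Tidy tidy = new Tidy();
    //   tidy.setCharEncoding(Configuration.UTF8);
    //   Document doc = tidy.parseDOM(
    //       new ByteArrayInputStream(Transcode.toUtf8(raw, "MS950")), null);
}
```

Whether "MS950" is available depends on the JVM installation, so it is worth testing Charset.isSupported("MS950") before relying on it.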

See the HttpUnit source code at <http://www.httpunit.org> (especially ReceivedPage.java and HttpWebResponse.java) for an example of this.
-- 
------------------------------------------------------------------------
Russell Gold                     | "... society is tradition and order
russgold@acm.org                 | and reverence, not a series of cheap
                                 | bargains between selfish interests."
http://www.httpunit.org          |   - Poul Anderson, "Iron"

Received on Monday, 14 May 2001 22:35:32 UTC