- From: Jason Chang <jasonchang244@kimo.com.tw>
- Date: Wed, 27 Jun 2001 14:46:48 +0800
- To: <html-tidy@w3.org>
- Message-ID: <00bc01c0fed4$f1b0c6b0$9204a8c0@netempower.com>
Dear all, Thanks a lot for the help from all of you. The solution which Adrian provided works. If you have a similar problem, try to set character encoding as RAW as follows: Tidy tidy = new Tidy(); tidy.setCharEncoding(Configuration.RAW);//This avoids JTidy mapping values > 127 to entities This solves the problem of mapping bytes whose value is larger than 127 to entities. Thanks a lot for Adrian's help. Jason ----- Original Message ----- Have you tried using the -raw option for character sets? (I've never used JTidy, but I assume it has this). It may help, but it's probably safer to transcode your document into UTF8 / UTF16 before tidying it, and then transcode it back into your character set. Adrian -----Original Message----- From: html-tidy-request@w3.org [mailto:html-tidy-request@w3.org]On Behalf Of Jason Chang Sent: 26 June 2001 02:33 To: html-tidy@w3.org Subject: Chinese characters are converted into entities.. I am using JTidy to convert HTML with content of traditional Chinese characters (double bytes) to well-formed XML. It seems that every single Chinese character is converted to two XML entities like follows: Input (Traditional Chinese): ¤¤¤å´ú¸Õ Output (XML entities): ¤¤¤å´ú¸Õ Is this an encoding problem? or is there any property of JTidy I can config to prevent JTidy converting double bytes characters to XML entities? Many thanks, Jason Chang
Received on Wednesday, 27 June 2001 02:52:19 UTC