Fw: Chinese characters are converted into entities..

Dear all,

Thanks a lot for the help from all of you.

The solution which Adrian provided works.
If you have a similar problem, try to set character encoding as RAW as follows:

Tidy tidy = new Tidy();
tidy.setCharEncoding(Configuration.RAW);//This avoids JTidy mapping values > 127 to entities

This solves the problem of mapping bytes whose value is larger than 127 to entities.

Thanks a lot for Adrian's help.

Jason

----- Original Message ----- 

Have you tried using the -raw option for character sets?  (I've never used JTidy, but I assume it has this).  It may help, but it's probably safer to transcode your document into UTF8 / UTF16 before tidying it, and then transcode it back into your character set.
 
Adrian
  -----Original Message-----
  From: html-tidy-request@w3.org [mailto:html-tidy-request@w3.org]On Behalf Of Jason Chang
  Sent: 26 June 2001 02:33
  To: html-tidy@w3.org
  Subject: Chinese characters are converted into entities..


  I am using JTidy to convert HTML with content of traditional Chinese characters (double bytes) to well-formed XML.
  It seems that every single Chinese  character is converted to two XML entities like follows:
   
  Input (Traditional Chinese):
  ¤¤¤å´ú¸Õ
   
  Output (XML entities):
  ¤¤¤å´ú¸Õ 

  Is this an encoding problem? or is there any property of JTidy I can config to prevent JTidy converting double bytes characters to XML entities?

  Many thanks,
  Jason Chang

Received on Wednesday, 27 June 2001 02:52:19 UTC