W3C home > Mailing lists > Public > html-tidy@w3.org > April to June 2001

Fw: Chinese characters are converted into entities..

From: Jason Chang <jasonchang244@kimo.com.tw>
Date: Wed, 27 Jun 2001 14:46:48 +0800
Message-ID: <00bc01c0fed4$f1b0c6b0$9204a8c0@netempower.com>
To: <html-tidy@w3.org>
Dear all,

Thanks a lot for the help from all of you.

The solution which Adrian provided works.
If you have a similar problem, try to set character encoding as RAW as follows:

Tidy tidy = new Tidy();
tidy.setCharEncoding(Configuration.RAW);//This avoids JTidy mapping values > 127 to entities

This solves the problem of mapping bytes whose value is larger than 127 to entities.

Thanks a lot for Adrian's help.

Jason

----- Original Message ----- 

Have you tried using the -raw option for character sets?  (I've never used JTidy, but I assume it has this).  It may help, but it's probably safer to transcode your document into UTF8 / UTF16 before tidying it, and then transcode it back into your character set.
 
Adrian
  -----Original Message-----
  From: html-tidy-request@w3.org [mailto:html-tidy-request@w3.org]On Behalf Of Jason Chang
  Sent: 26 June 2001 02:33
  To: html-tidy@w3.org
  Subject: Chinese characters are converted into entities..


  I am using JTidy to convert HTML with content of traditional Chinese characters (double bytes) to well-formed XML.
  It seems that every single Chinese  character is converted to two XML entities like follows:
   
  Input (Traditional Chinese):
  
   
  Output (XML entities):
  &curren;&curren;&curren;&aring;&acute;&uacute;&cedil;&Otilde; 

  Is this an encoding problem? or is there any property of JTidy I can config to prevent JTidy converting double bytes characters to XML entities?

  Many thanks,
  Jason Chang
Received on Wednesday, 27 June 2001 02:52:19 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:45 GMT