
Re: gb2312 encoding

From: Eric Frost <eric.frost@mp2kmag.com>
Date: Mon, 4 Aug 2008 10:26:52 -0500
Message-ID: <56FABABC4947455D8686F1E4F98BB5F3@Ericdell>
To: "alex242" <pshenichnov@gmail.com>
Cc: <html-tidy@w3.org>

Alex,

Try this: http://www.iconv.com/iconv.htm

I never knew there were so many character sets!

I almost had ASCII memorized at one point programming Commodores...

Eric

______________________________________
Eric Frost, PhD
http://www.mappoint2009.com
http://www.pushpintool.com


--------------------------------------------------
From: "Arnaud Desitter" <arnaud02@users.sourceforge.net>
Sent: Monday, August 04, 2008 10:08 AM
To: "alex242" <pshenichnov@gmail.com>
Cc: <html-tidy@w3.org>
Subject: Re: gb2312 encoding

>
> Hi,
>
> You need to convert your files from whatever encoding they are in
> (gb2312 in your case) to UTF-8. iconv is one option.
>
> Regards,
>
> 2008/8/4 alex242 <pshenichnov@gmail.com>:
>>
>> Hello,
>>
>> I have some problems with HTML documents in gb2312 (Simplified Chinese)
>> encoding. After using Tidy I get some unreadable characters. Example
>> document: http://www.chemspider.com/ArticlesHandler.ashx?type=art&id=69
>>
>> my Tidy config file:
>>
>> output-file: res.html
>> error-file: error.txt
>> char-encoding: utf8
>> output-bom: yes
>> output-encoding: utf8
>>
>> I've already spent several days trying to solve this problem without any
>> success, so if somebody can give some advice on how to work with different
>> encodings in Tidy, it will be much appreciated.
>>
>> best regards,
>> Alex
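
For anyone following the advice above without iconv at hand, here is a minimal sketch of the same conversion step in Python, run before handing the file to Tidy. The function names and the file names in the usage note are hypothetical, and it assumes the input really is valid GB2312; bytes that do not decode raise an error rather than being silently replaced.

```python
from pathlib import Path


def gb2312_to_utf8(data: bytes) -> bytes:
    """Decode GB2312 bytes and re-encode them as UTF-8.

    Raises UnicodeDecodeError if the input is not valid GB2312.
    """
    text = data.decode("gb2312")
    return text.encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read src as GB2312 bytes and write the UTF-8 equivalent to dst."""
    Path(dst).write_bytes(gb2312_to_utf8(Path(src).read_bytes()))
```

Usage would be something like convert_file("page-gb2312.html", "page-utf8.html"), after which Tidy can be run on the UTF-8 copy with the utf8 settings from the config quoted above.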
 
Received on Monday, 4 August 2008 15:27:31 GMT
