RE: How to use JTidy parsing non-ISO8859-1 charset HTML document ? from Valeri.Atamaniouk@nokia.com on 2001-05-18 (html-tidy@w3.org from April to June 2001)

From: <Valeri.Atamaniouk@nokia.com>
Date: Fri, 18 May 2001 20:41:42 +0300
To: russgold@acm.org, html-tidy@w3.org
Message-ID: <DFC7E257BE53D4118A5400508B691A0001A8FF90@eseis14nok>

> -----Original Message-----
> From: ext Russell Gold [mailto:russgold@acm.org]
> Sent: 15 May 2001 05:30
> To: ????; html-tidy@w3.org
> Subject: Re: How to use JTidy parsing non-ISO8859-1 charset HTML
> document ?
> 
> 
> At 11:05 PM -0400 5/13/01, ???? wrote:
> >Hello:
> >
> >How can I use JTidy to parse an HTML document in a non-ISO8859-1
> >charset, such as MS950 (Chinese Traditional)?
> 
> You may not be able to do it directly, *however* you can do 
> it indirectly. Convert the raw document into text using the 
> appropriate charset encoding - and then convert it to UTF and 
> pass the result to JTidy, telling it that you are using UTF.
> 
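The indirect route quoted above can be sketched roughly as follows. This is
only a minimal illustration using the standard Java charset classes: the
class and method names are made up for the example, the charset name "Big5"
is assumed to be available in the running JRE, and the final hand-off to
JTidy is left as a comment because it depends on the JTidy version in use.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JTidy.
class TranscodeToUtf8 {
    /** Decode raw bytes with the document's real charset, then re-encode as UTF-8. */
    static byte[] toUtf8(byte[] raw, String sourceCharset) {
        String text = new String(raw, Charset.forName(sourceCharset));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for a document read from disk: two Chinese characters in Big5.
        byte[] big5 = "\u4e2d\u6587".getBytes(Charset.forName("Big5"));
        byte[] utf8 = toUtf8(big5, "Big5");

        // This stream is what would be handed to JTidy, with JTidy
        // configured to treat its input as UTF-8.
        InputStream forTidy = new ByteArrayInputStream(utf8);
        System.out.println(big5.length + " Big5 bytes -> " + utf8.length + " UTF-8 bytes");
    }
}
```

Any bytes the decoder cannot map are silently replaced during decoding, so
the UTF-8 output is only as faithful as the charset mapping allows.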
I've sent him a version which supports CN-Big5. In practice, the suggested
approach does not work well with this encoding, because some
BIG5->Unicode->BIG5 translations give a result that differs from the
original document.
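
One way to see whether a given document is affected is to round-trip its
bytes and compare them with the original. The sketch below does that with
the standard Java charset classes. The class and method names are
hypothetical, "Big5" is assumed to be available in the running JRE, and the
failing input here is simply a truncated character chosen for illustration.
It is not one of the specific Big5 sequences the message refers to, which
depend on the mapping table in use.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper, not part of JTidy.
class RoundTripCheck {
    /** True if decoding raw to Unicode and encoding it back reproduces the same bytes. */
    static boolean roundTripsExactly(byte[] raw, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        byte[] back = new String(raw, cs).getBytes(cs);
        return Arrays.equals(raw, back);
    }

    public static void main(String[] args) {
        // Well-formed Big5 text, produced by encoding two Chinese characters.
        byte[] valid = "\u4e2d\u6587".getBytes(Charset.forName("Big5"));
        // A lone lead byte: not a complete Big5 character.
        byte[] truncated = {(byte) 0xA4};

        System.out.println(roundTripsExactly(valid, "Big5"));
        System.out.println(roundTripsExactly(truncated, "Big5"));
    }
}
```

If the check fails for a real document, transcoding through Unicode would
change its bytes, which is the problem described above.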

BR
VA

Received on Friday, 18 May 2001 13:42:01 UTC