
Re: Problems with Char-encoding - HTML TIDY

From: John Campbell <jdc.rpv@cox.net>
Date: Fri, 07 Jul 2006 17:21:03 -0700
Message-ID: <44AEFA6F.8030809@cox.net>
To: Luana Knoff <formiga_lua@yahoo.com.br>
CC: html tidy <html-tidy@w3.org>

Luana Knoff wrote:
> Hi all,
> 
> I have some questions about using HTML Tidy. I need to convert the character encoding to UTF-8,
> but when I do, some strange characters appear instead of "ç" and other accented characters.
> For example, the word "Serviços" appears as "Servi?os". But if I first convert to ASCII and
> then to UTF-8, the strange characters don't appear.
> 
> The command lines are:
> First I run:
> 
> tidy --char-encoding ascii --tidy-mark no --wrap 99 --output-xml yes --output-xhtml yes \
>   --output-html yes --doctype omit --numeric-entities yes --quote-marks yes --quote-nbsp yes \
>   --quote-ampersand yes --logical-emphasis yes --enclose-text yes --alt-text empty \
>   --write-back yes --quiet yes -m teste_imagem.htm
>
> And then I have to run it again:
> 
> tidy --char-encoding utf8 --tidy-mark no --wrap 99 --output-xml yes --output-xhtml yes \
>   --output-html yes --doctype omit --numeric-entities yes --quote-marks yes --quote-nbsp yes \
>   --quote-ampersand yes --logical-emphasis yes --enclose-text yes --alt-text empty \
>   --write-back yes --quiet yes -m teste_imagem.htm
> 
> Does anyone know how I can convert the page to UTF-8 without having to do the
> transformation twice?
> 
> Any hints are welcome.

Try using "--input-encoding ????" (with ???? set to the encoding the page is
actually in) and "--output-encoding utf-8" rather than --char-encoding. I'm
guessing tidy is using the wrong encoding for input.

Received on Saturday, 8 July 2006 00:21:15 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:56 GMT