Re: Problems with Char-encoding - HTML TIDY

Luana Knoff wrote:
> Hi all,
> 
> I have some doubts about using HTML Tidy. I need to convert the character
> encoding to UTF-8, but when I do, some strange characters appear in place of
> "ç" and other accented characters. For example, the word "Serviços" appears
> as "Servi?os". But if I convert first to ASCII and then to UTF-8, the strange
> characters don't appear.
> 
> The commands are as follows (each one is a single line, wrapped here).
> First I run:
>
>   tidy --char-encoding ascii --tidy-mark no --wrap 99 --output-xml yes --output-xhtml yes
>   --output-html yes --doctype omit --numeric-entities yes --quote-marks yes --quote-nbsp yes
>   --quote-ampersand yes --logical-emphasis yes --enclose-text yes --alt-text empty
>   --write-back yes --quiet yes -m teste_imagem.htm
>
> And then I have to run it again:
>
>   tidy --char-encoding utf8 --tidy-mark no --wrap 99 --output-xml yes --output-xhtml yes
>   --output-html yes --doctype omit --numeric-entities yes --quote-marks yes --quote-nbsp yes
>   --quote-ampersand yes --logical-emphasis yes --enclose-text yes --alt-text empty
>   --write-back yes --quiet yes -m teste_imagem.htm
> 
> Does anyone know how to convert the page to UTF-8 without having to run
> the transformation twice?
> 
> Any hints are welcome.

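Try using "--input-encoding ????" and "--output-encoding utf8" rather than
--char-encoding, which sets the input and output encoding to the same value.
Replace ???? with the encoding the file is actually saved in. I'm guessing
Tidy is reading the input with the wrong encoding.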

Received on Saturday, 8 July 2006 00:21:15 UTC