- From: John Campbell <jdc.rpv@cox.net>
- Date: Fri, 07 Jul 2006 17:21:03 -0700
- To: Luana Knoff <formiga_lua@yahoo.com.br>
- CC: html tidy <html-tidy@w3.org>
Luana Knoff wrote:
> Hi all,
>
> I have some doubts about using HTML Tidy. I need to convert the char-encoding to utf-8, but
> when I do it some strange characters appear instead of "ç" and accented letters, as in
> this case: the word "Serviços" appears as "Servi?os". But if I do the transformation first
> to ascii and after that to utf-8, the strange characters don't appear.
>
> The command lines are:
> first I do:
>
> tidy --char-encoding ascii --tidy-mark no --wrap 99 --output-xml yes --output-xhtml yes
> --output-html yes --doctype omit --numeric-entities yes --quote-marks yes --quote-nbsp yes
> --quote-ampersand yes --logical-emphasis yes --enclose-text yes --alt-text empty
> --write-back yes --quiet yes -m teste_imagem.htm
>
> And then I have to do it again:
>
> tidy --char-encoding utf8 --tidy-mark no --wrap 99 --output-xml yes --output-xhtml yes
> --output-html yes --doctype omit --numeric-entities yes --quote-marks yes --quote-nbsp yes
> --quote-ampersand yes --logical-emphasis yes --enclose-text yes --alt-text empty
> --write-back yes --quiet yes -m teste_imagem.htm
>
> Does anyone know what I have to do to convert the page to utf-8 without having to do the
> transformation twice?
>
> Any hints are welcome.

Try using "--input-encoding ????" and "--output-encoding utf8" rather than
--char-encoding, with ???? replaced by the encoding the file is actually saved in.
--char-encoding sets the input and output encoding together, so I'm guessing tidy is
using the wrong encoding for input.
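
The guess above can be illustrated outside of tidy. The following is only a sketch in Python, not tidy's own code, and it assumes the page was saved as ISO-8859-1 (Latin-1), which the original message never states. It shows how a byte that is valid in Latin-1 becomes a replacement character when the input is read as UTF-8, and how naming the input encoding separately from the output encoding avoids that.

```python
# Assumption (not stated in the thread): the file on disk is ISO-8859-1.
raw = "Serviços".encode("latin-1")          # bytes as stored: b'Servi\xe7os'

# Reading those bytes as UTF-8 is the suspected failure: 0xE7 followed by
# an ASCII letter is not a valid UTF-8 sequence, so it is replaced by U+FFFD.
wrong = raw.decode("utf-8", errors="replace")

# Declaring the real input encoding first, then choosing UTF-8 for output,
# preserves the character.
right = raw.decode("latin-1").encode("utf-8")

print(repr(wrong))
print(right)
```

If the real encoding is something else (for example Windows-1252), the same idea applies with that codec name in place of "latin-1".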
Received on Saturday, 8 July 2006 00:21:15 UTC