- From: John Campbell <jdc.rpv@cox.net>
- Date: Sat, 11 Jun 2005 23:34:02 -0700
- To: _and <andreafiore@r-w-x.org>
- CC: html-tidy@w3.org
_and wrote:
>
> Hello,
> I'm cleaning some HTML pages that I have written by hand...
> These pages are written in Italian, so there are a lot of accented
> letters like (è, ò, ù, etc.). I would like to convert these letters to
> HTML entities...
>
> My problem is that I am not able to do this with tidy... does it
> support this function?

Change your output encoding to ASCII. ASCII doesn't have accented characters, so tidy will translate them into entities. Either use "tidy -ascii" or modify your tidy config file to include the line "output-encoding: ascii".

I still haven't figured out tidy's character encoding translation mechanism, though. There are several encoding and language options for the config file that seem to overlap and interfere with each other.

I've got a bunch of HTML pages in "win-1252," "iso-8859-1," "mac," and "utf-8" formats. I've defined "output-encoding: latin1" (iso-8859-1) in my tidy.conf file and use the "-utf8", "-win1252" and similar command line flags to define the input encoding.

That's not how it's supposed to be done. I SHOULD be putting "input-encoding: XXXXX" into the config, but that would mean 4 separate config files. The problem is that not all the configuration elements are available from the command line... or maybe they are, and I just haven't found them.
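
For what it's worth, here is a rough sketch of one way to avoid keeping four config files. It assumes your tidy build accepts any config option on the command line in the form "--option value" (for example "--input-encoding win1252"), and that an option given after "-config" overrides the value read from the file. The helper name tidy_command, the format-to-encoding mapping, and the "tidy.conf" path are made up for illustration and are not part of tidy.

```python
# Hypothetical mapping from the page formats mentioned above to the
# encoding names tidy is assumed to accept for input-encoding.
TIDY_ENCODINGS = {
    "win-1252": "win1252",
    "iso-8859-1": "latin1",
    "mac": "mac",
    "utf-8": "utf8",
}


def tidy_command(path, source_format, output_encoding="ascii", config="tidy.conf"):
    """Build a tidy argument list for one file.

    The shared config file is loaded first; the per-file encodings are
    then passed as "--option value" pairs (an assumption about the tidy
    build in use) so they take precedence over the file's settings.
    """
    input_encoding = TIDY_ENCODINGS[source_format]
    return [
        "tidy",
        "-config", config,
        "--input-encoding", input_encoding,
        "--output-encoding", output_encoding,
        path,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where tidy is not installed.
    print(" ".join(tidy_command("index.html", "win-1252")))
```

To actually run it, the list can be handed to subprocess.run(). With the output encoding left at "ascii", the accented letters should come out as entities, as described above.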
Received on Monday, 13 June 2005 04:38:34 UTC