Re: Converting Latin-1 special characters to entities

From: John Campbell <jdc.rpv@cox.net>
Date: Sat, 11 Jun 2005 23:34:02 -0700
Message-ID: <42ABD75A.1080300@cox.net>
To: _and <andreafiore@r-w-x.org>
CC: html-tidy@w3.org

_and wrote:

>
> Hello,
> I'm cleaning some html pages that I have written by hand...
> These pages are written in Italian, so there are a lot of accented 
> letters. I would like to convert these letters to 
> HTML entities...
>
> my problem is that i am not able to do this with tidy.... does it 
> support this function?

Change your output encoding to ASCII.  ASCII doesn't have accented 
characters, so tidy will translate them into entities.

Either use "tidy -ascii" or modify your tidy config file to include the 
line "output-encoding: ascii"
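
As a rough illustration of what happens to accented letters when the output encoding is ASCII, here is a small Python sketch. It is not tidy's implementation: the function name to_entities is made up for this example, and it relies on the entity table in Python's standard html.entities module.

```python
from html.entities import codepoint2name

def to_entities(text):
    """Replace every non-ASCII character with an HTML entity."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain ASCII passes through unchanged.
            parts.append(ch)
        elif code in codepoint2name:
            # Use a named entity when one exists, e.g. U+00E8 -> &egrave;
            parts.append("&" + codepoint2name[code] + ";")
        else:
            # No named entity available: fall back to a numeric reference.
            parts.append("&#%d;" % code)
    return "".join(parts)

print(to_entities("perché è già così"))
```

The result contains only ASCII bytes, so it survives whatever encoding the page is later served in.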

I still haven't figured out tidy's character encoding translation 
mechanism though.  There are several encoding and language options for 
the config file that seem to overlap and interfere with each other.

I've got a bunch of html pages in "win-1252," "iso-8859-1," "mac," and 
"utf-8" formats.  I've defined "output-encoding: latin1" (iso-8859-1) in 
my tidy.conf file and use command line flags such as "-utf8" and 
"-win1252" to define the input encoding.

That's not how it's supposed to be done.  I SHOULD be putting 
"input-encoding: XXXXX" into the config, but that would mean 4 separate 
config files.  The problem is that not all the configuration elements 
are available from the command line...or maybe they are, and I just 
haven't found them.
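
One possible way to avoid four config files, assuming tidy's "--option value" command line form (listed in its help output for configuration options) also works for input-encoding: keep the shared settings in one config file and pass the input encoding per file. The Python sketch below only builds the argument list. The helper name tidy_args and the file names are hypothetical, and this has not been checked against every tidy build.

```python
def tidy_args(html_path, input_encoding, config_path="tidy.conf"):
    """Build a tidy command line: shared config plus a per-file input encoding."""
    return [
        "tidy",
        "-config", config_path,              # shared options, e.g. output-encoding: latin1
        "--input-encoding", input_encoding,  # assumed per-file override of the config
        html_path,
    ]

print(" ".join(tidy_args("page.html", "win1252")))
```

The returned list can be handed to subprocess.run as-is, once per file, with the encoding value changed to match each page.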
Received on Monday, 13 June 2005 04:38:34 UTC
