
Re: UTF-8 as document default ?!

From: Leif Halvard Silli <hyperlekken@lenk.no>
Date: Fri, 23 Nov 2007 12:56:19 +0100
Message-ID: <4746BFE3.4080609@lenk.no>
To: Irene.Vatton@inrialpes.fr
CC: "www-amaya@w3.org" <www-amaya@w3.org>

Irene Vatton 23.11.2007 11:25:     

> We force now ISO-8859-1 as the initial default charset. People who wants UTF-8 
> has just to change it either in Preferences or at the document creation.
>
> UTF-8 should be largely adopted, but many web servers uses the suffix to fix 
> the charset and serves ".html" documents with ISO-8859-1 charset. As the 
>   
I have experienced this myself, once. Fortunately, I am able to edit 
.htaccess files. But it can probably be a real problem for those who 
cannot.

> charset given by the server has a higher priority than the document charset, 
> it's important to generate documents with a compatible charset.
>   

A UTF-8 encoded page can be served as ISO-8859-1, provided you encode 
all non-ASCII letters as character entities: the file then contains 
only ASCII bytes, which mean the same thing in both encodings. Of 
course, there is no point in doing that for its own sake.

But if, as you describe it, the purpose of using ISO-8859-1 is to be 
compatible with servers that serve .html as ISO-8859-1 by default, then 
you could just as well announce 'UTF-8' inside the document, but still 
encode non-ASCII letters as character entities.
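
To illustrate the idea, here is a minimal Python sketch. It is not how 
Amaya works; the function name `serialize`, its `escape_non_ascii` flag 
and the sample page are all made up for the example. It assumes the 
"character entities" are written as numeric character references.

```python
def serialize(markup: str, escape_non_ascii: bool = True) -> bytes:
    """Return the bytes to save for a document that announces UTF-8.

    escape_non_ascii=True:  every non-ASCII character is written as a
                            numeric character reference, so the output
                            is pure ASCII.
    escape_non_ascii=False: characters are written as raw UTF-8 bytes.
    """
    if escape_non_ascii:
        # Built-in error handler: "é" becomes "&#233;" and so on.
        return markup.encode("ascii", errors="xmlcharrefreplace")
    return markup.encode("utf-8")


# Hypothetical sample page that announces UTF-8 inside the document.
page = (
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    "<p>Blåbærsyltetøy</p>"
)

safe = serialize(page)
# Pure ASCII: decodes to the same text whichever charset the server sends.
same_either_way = safe.decode("iso-8859-1") == safe.decode("utf-8")

raw = serialize(page, escape_non_ascii=False)
# Raw UTF-8 read as ISO-8859-1 no longer matches the original text.
garbled = raw.decode("iso-8859-1") != page
```

Here `same_either_way` and `garbled` both come out true: the escaped 
file survives a server that insists on ISO-8859-1, the raw UTF-8 file 
does not.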

The users could then - instead of the triple choice between ISO-8859-1, 
UTF-8 and US-ASCII - be given a simple on/off choice: whether or not to 
encode non-ASCII letters as character entities. This should be a 
simplification for users.

The advantage of this, for me as an editor of Amaya-produced files, is 
that it becomes as simple as possible to switch between Amaya and my 
text editor, which (typically) defaults to UTF-8. (When I receive a 
US-ASCII encoded page and turn all entities into UTF-8 encoded letters, 
I must remember to also change the META element and/or the <?xml > 
declaration ...)
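
The conversion step I mean can be sketched roughly like this in Python. 
The helper name `unescape_non_ascii` is made up for the example, and it 
only handles numeric references such as &#229; or &#xE5;, not named 
ones such as &aring;. References to ASCII characters are left alone, so 
that an escaped "<" or "&" does not turn into markup.

```python
import re

# Decimal (&#229;) and hexadecimal (&#xE5;) numeric character references.
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def unescape_non_ascii(markup: str) -> str:
    """Replace numeric references to non-ASCII characters with the
    characters themselves; leave everything else untouched."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        is_ascii = code_point < 128
        is_surrogate = 0xD800 <= code_point <= 0xDFFF
        out_of_range = code_point > 0x10FFFF
        if is_ascii or is_surrogate or out_of_range:
            # Keep the reference exactly as written.
            return match.group(0)
        return chr(code_point)

    return _NUMERIC_REF.sub(replace, markup)


converted = unescape_non_ascii("<p>Bl&#229;b&#xE6;r &#60; &amp;</p>")
utf8_bytes = converted.encode("utf-8")  # what the text editor would save
```

If the page already announces UTF-8, the result can be saved as it is. 
If it announces US-ASCII, the charset declaration still has to be 
changed by hand, which is the step that is easy to forget.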

I don't know - would there be any disadvantages to doing it that way?

NVU is very good in that respect. It lets users choose whether or not 
to encode non-ASCII characters as character entities - regardless of 
the document encoding.
-- 
leif halvard silli
Received on Friday, 23 November 2007 11:56:32 UTC
