Re: no encoding specified when creating XHTML documents from Bertrand.Ibrahim@cui.unige.ch on 2000-08-23 (www-amaya@w3.org from July to September 2000)

From: <Bertrand.Ibrahim@cui.unige.ch>
Date: Wed, 23 Aug 2000 19:06:20 +0200
To: www-amaya@w3.org
Message-id: <0FZR00JLL8UKH6@cuimail.unige.ch>

Vincent.Quint@inrialpes.fr said:
> If your documents are written in French, I suggest you set a lang
> attribute on the <html> or <body> element...
> Then, when you save a document, the encoding is added to the
> XML declaration and a <meta http-equiv...> element is generated with
> the right charset.

Why do it that way? It doesn't seem right to tie the encoding to the
language specification. My understanding is that the encoding defines
which numerical byte values are stored in a file to represent characters,
while the language specification refers to how sequences of characters
combine to form words. I might very well want to write a French text in
plain ASCII, using XHTML's ISO Latin 1 entities for accented letters. In
such a case, I just need to add

  <!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1 for XHTML//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
    %HTMLlat1;

to my XML code (inside the internal subset of the DOCTYPE declaration)
and declare a UTF-8 encoding. I might also want to write an English text
that discusses accented letters, using the ISO-8859-1 encoding (character
set).
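To make the distinction concrete, here is a small Python sketch. It is only an illustration and has nothing to do with Amaya's code. The same character é, whatever language it appears in, is stored as different bytes depending on the encoding chosen. In plain ASCII it can only appear as a character reference such as &#233;, which is equivalent to the &eacute; entity.

```python
# The character e-acute (U+00E9); nothing here depends on the text's language.
text = "é"

utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = text.encode("latin-1")  # one byte: 0xE9

# ASCII cannot represent U+00E9 directly, so escape it as a numeric
# character reference instead (equivalent to the &eacute; entity).
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")

print(utf8_bytes, latin1_bytes, ascii_bytes)
```

Decoding each byte sequence with its own encoding gives back the same character. The choice of encoding is therefore a storage decision and says nothing about whether the text is French or English.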

I can understand if, for the moment, Amaya is limited to the ISO-8859-1
encoding. But then, whether the encoding="ISO-8859-1" pseudo-attribute is
inserted should depend on whether the current document actually contains
an accented letter. This is exactly how my email program works: if I send
a message that contains accented letters (such as this e acute: é), it
automatically adds "charset=iso-8859-1" to the "Content-Type: text/plain;"
header.
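The suggested heuristic can be sketched in Python. This is not how Amaya works, and the helper name `xml_declaration` is hypothetical. The helper inspects the characters actually present in the document and ignores any lang attribute. It names an encoding only when the text goes beyond ASCII, much as the mail program adds a charset parameter. One assumption goes beyond the message: characters outside ISO-8859-1 fall back to UTF-8.

```python
def xml_declaration(text: str) -> str:
    """Hypothetical helper: build an XML declaration for `text`,
    naming an encoding only when the content actually needs one."""
    try:
        # Pure ASCII is also valid UTF-8, the XML default,
        # so no encoding pseudo-attribute is required.
        text.encode("ascii")
        return '<?xml version="1.0"?>'
    except UnicodeEncodeError:
        pass
    try:
        # Non-ASCII but representable in ISO-8859-1 (e.g. French accents).
        text.encode("iso-8859-1")
        return '<?xml version="1.0" encoding="ISO-8859-1"?>'
    except UnicodeEncodeError:
        # Assumption: anything beyond Latin 1 falls back to UTF-8.
        return '<?xml version="1.0" encoding="UTF-8"?>'


print(xml_declaration("Hello &eacute;"))  # ASCII only: no encoding named
print(xml_declaration("Café"))            # accented letter: ISO-8859-1
print(xml_declaration("10 €"))            # euro sign is not in Latin 1: UTF-8
```

A French document written with entities, as described earlier, stays pure ASCII and gets no encoding declaration. An English document containing a literal é gets one.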

Peace,

Bertrand Ibrahim.
--------------------------------------------
Bertrand.Ibrahim@cui.unige.ch
http://cui.unige.ch/eao/www/Bertrand.html

Received on Wednesday, 23 August 2000 13:06:23 UTC