Re: document charset and Cache-Control: no-cache

On Wednesday 15 February 2006 14:20, Michael Teichgräber wrote:
> Hi,
>
> when loading some HTML files, just converted to UTF-8, into Amaya I
> noticed that non-ASCII characters were displayed incorrectly (i.e. as
> if interpreted as iso-8859-1), and the Document Information window said
> "Charset: Unknown".
>
> The pages were sent to Amaya by a CGI program with a Content-Type of
> "text/html; charset=utf-8", so I wondered what could be wrong.
>
> It seems to be caused by the existence of another header
> "Cache-Control: no-cache", that is sent by the CGI program too. If
> that header is present, the charset information seems to be ignored,
> even if Amaya's cache is disabled. If, however, I leave this header
> out, Amaya recognizes the character coding as utf-8, and the display
> is fine.
>
> I suppose the charset info shouldn't be discarded in that case.
> Perhaps it is not available anymore after the `no-cache' information
> has been processed? Both Amaya 8.5 and 9.4 show this behaviour.
>
> Regards,
>
> Michael

Amaya implements the standard policy for determining the charset:
1. The charset given in the HTTP header.
2. If none is defined there, the charset given in the XML declaration.
3. If none is defined there, the charset given in a meta element.
4. If none is defined there, the default charset (iso-8859-1 for HTML
documents, utf-8 for XHTML documents). If the document is declared as XHTML
but is served as text/html, the result may depend on the Web client.
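
To make the order of precedence concrete, here is a minimal sketch in
Python. It is not Amaya's actual code: the function names and parameters
(charset_from_content_type, resolve_charset, xml_encoding, meta_charset,
is_xhtml) are hypothetical, and it assumes the XML declaration and meta
values have already been extracted from the document by the caller.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the value lower-cased, or None if absent.
    return msg.get_content_charset()


def resolve_charset(
    content_type: Optional[str],
    xml_encoding: Optional[str],
    meta_charset: Optional[str],
    is_xhtml: bool,
) -> str:
    """Pick a charset using the four-step precedence described above.

    Hypothetical helper: the first source that defines a charset wins,
    and no attempt is made to guess from the document bytes.
    """
    candidates = (
        charset_from_content_type(content_type),  # 1. HTTP header
        xml_encoding,                             # 2. XML declaration
        meta_charset,                             # 3. meta element
    )
    for candidate in candidates:
        if candidate:
            return candidate.lower()
    # 4. Nothing declared anywhere: fall back to the default.
    return "utf-8" if is_xhtml else "iso-8859-1"
```

With this sketch, a response carrying "text/html; charset=utf-8" resolves
to utf-8 whatever the document itself declares, and an HTML document with
no declaration at all falls back to iso-8859-1.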

We know that some browsers try to guess the real document encoding.
Amaya doesn't do that, as its goal is to help people to generate correct 
pages.

-- 
     Irène.
-----
Irène Vatton                     INRIA Rhône-Alpes
INRIA                               ZIRST
e-mail: Irene.Vatton@inria.fr       655 avenue de l'Europe
Tel.: +33 4 76 61 53 61             Montbonnot
Fax:  +33 4 76 61 52 07             38334 Saint Ismier Cedex - France

Received on Friday, 17 February 2006 16:27:26 UTC