Re: Encoding in the HTML/HTTP header from Martin Duerst on 2005-09-06 (www-international@w3.org from July to September 2005)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Wed, 07 Sep 2005 06:05:26 +0900
To: Michael Monaghan <Michael.Monaghan@Sun.COM>, www-international@w3.org
Message-Id: <6.0.0.20.2.20050907060200.04d60650@localhost>

Two possibilities I can imagine:

- The document is XML-based, the browser recognizes this, and
   then uses the UTF-8 default for XML documents.
- The browser analyses the byte sequences in the document and
   heuristically detects that the document looks like UTF-8.
   The chances for detecting UTF-8 correctly go up very quickly
   even with only very few non-ASCII characters.
What's your browser? What do other browsers do?
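
To make the second possibility concrete, here is a minimal sketch in
Python of the kind of check I have in mind. The function name
looks_like_utf8 is my own invention for illustration, and real browsers
use more elaborate detectors; this only shows why the heuristic works.
Text in a legacy 8-bit encoding almost never happens to form valid
UTF-8 multi-byte sequences, so a document that contains non-ASCII bytes
and still decodes cleanly is very likely UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data has non-ASCII bytes and is valid UTF-8.

    Illustrative sketch only, not the algorithm of any particular browser.
    """
    if data.isascii():
        # Pure ASCII gives no evidence either way: it is valid in
        # UTF-8, ISO-8859-1, and most other common encodings.
        return False
    try:
        data.decode("utf-8")  # strict decoding rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_utf8("日本語".encode("utf-8")))     # UTF-8 bytes
    print(looks_like_utf8("café".encode("iso-8859-1")))  # Latin-1 bytes
    print(looks_like_utf8(b"plain ASCII"))               # no evidence
```

Note that a pure-ASCII document is reported as "no evidence" rather
than as UTF-8, because it would be displayed identically under either
assumption.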

In any case, it's always much better to explicitly declare
the encoding, in the HTTP header or the document itself
or both.
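
For reference, an explicit declaration in the HTTP header would read
"Content-Type: text/html; charset=UTF-8", and in the document itself
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">.
As a rough sketch (Python standard library; the helper name
declared_charset is hypothetical), this is one way to check whether a
Content-Type value such as the one your server sends carries a charset
parameter at all:

```python
from email.message import Message


def declared_charset(content_type: str):
    """Return the charset parameter of a Content-Type value, or None.

    Illustrative helper; it reuses the standard library's MIME header
    parsing rather than implementing any browser's exact logic.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None
    # when the header has no charset parameter.
    return msg.get_content_charset()


if __name__ == "__main__":
    print(declared_charset("text/html"))                  # no declaration
    print(declared_charset("text/html; charset=UTF-8"))   # explicit
```

With the header from your telnet session ("text/html" only) this
returns None, which is exactly the situation where the browser has to
fall back on defaults or guessing.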

Regards,   Martin.

At 05:25 05/09/07, Michael Monaghan wrote:

>Hi,
>
>I'm a little confused. I'm testing a form on a page on an internal server 
>to see that it handles non-western text properly.
>
>It seems to handle everything fine; the View/Encoding option in my
>browser tells me that it's in UTF-8, as it should be.
>
>However neither the HTML nor HTTP headers declare any encoding.
>
>I snooped the HTTP header by telnetting the server on port 80:
>
>telnet server.domain 80
>Trying 10.51.15.50...
>Connected to server.domain.sun.com.
>Escape character is '^]'.
>GET /file.html HTTP/1.0
>
>HTTP/1.1 200 OK
>Server: Netscape-Enterprise/4.1
>Date: Tue, 06 Sep 2005 20:03:52 GMT
>Content-type: text/html
>Content-length: 1222
>Connection: close
>
>.....etc....etc. I just don't understand how the browser knows to treat
>the page as UTF-8, when I can't find where it's declared. My browser default
>encoding is set to iso-1. Thoughts please. thanks, -mm

Received on Tuesday, 6 September 2005 22:14:19 UTC