Re: Encoding in the HTML/HTTP header from Jon Hanna on 2005-09-07 (www-international@w3.org from July to September 2005)

From: Jon Hanna <jon@hackcraft.net>
Date: Wed, 07 Sep 2005 11:05:48 +0100
To: www-international@w3.org
Message-ID: <431EBB7C.6000508@hackcraft.net>

Martin Duerst wrote:
> 
> Two possibilities I can imagine:
> 
> - The document is XML-based, the browser recognizes this, and
>   then uses the UTF-8 default for XML documents.
> - The browser analyses the byte sequences in the document and
>   heuristically detects that the document looks like UTF-8.
>   The chances for detecting UTF-8 correctly go up very quickly
>   even with only very few non-ASCII characters.

And it goes up massively if the stream begins with a BOM (byte order
mark), though using a BOM with UTF-8 has other issues.
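A minimal Python sketch of the two signals discussed above. It assumes the caller already has the raw bytes of the document. The function name `sniff_utf8` and its return labels are hypothetical, chosen only for illustration, and this is not any particular browser's detection algorithm. It checks for the UTF-8 BOM (EF BB BF) first, then tries a strict UTF-8 decode. Text in a legacy 8-bit encoding that contains non-ASCII bytes rarely also forms valid UTF-8 multi-byte sequences, so a successful decode with even a few non-ASCII characters is strong evidence.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the UTF-8 encoding of U+FEFF


def sniff_utf8(data: bytes) -> str:
    """Heuristically guess whether a byte stream is UTF-8 (illustrative only).

    Returns:
      "bom"      - the stream starts with the UTF-8 byte order mark
      "likely"   - strict UTF-8 decoding succeeds and at least one
                   non-ASCII character is present
      "ascii"    - pure ASCII: valid UTF-8, but no evidence either way
      "not-utf8" - strict UTF-8 decoding fails
    """
    # A leading BOM is treated as a near-certain signal on its own.
    if data.startswith(UTF8_BOM):
        return "bom"
    try:
        # Strict decoding raises on malformed or truncated sequences.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"
    # With no non-ASCII characters there is nothing to distinguish
    # UTF-8 from any ASCII-compatible legacy encoding.
    if text.isascii():
        return "ascii"
    return "likely"
```

For example, "café" encoded as UTF-8 is reported as "likely", while the same word in ISO-8859-1 (`b"caf\xe9"`) fails strict decoding: 0xE9 announces a three-byte sequence but no continuation bytes follow.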

Received on Wednesday, 7 September 2005 10:02:52 UTC