Re: charset=us-ascii mandatory?

On Mon, 7 May 2007, olivier Thereaux wrote:

>> Isn't "us-ascii" the default value for "charset"?
>
> utf-8 is.

Only for XML-based documents. For classic HTML, the matter is controversial
because the specifications conflict: HTTP/1.1 (RFC 2616) names ISO-8859-1
as the default for text/* types, while the HTML 4.01 specification says
that no default shall be assumed (which is a somewhat odd position, but not
very odd if you think about it).

I think that for nominally SGML-based validation, a warning should be
issued if the encoding is not specified either in HTTP headers or in a meta
tag, and validation should be carried out assuming the windows-1252
encoding, since this covers the most common cases. You might in that case
issue a warning about any octet in the 80..9F (hexadecimal) range, where
windows-1252 and ISO-8859-1 differ, or perhaps even about any octet outside
the ASCII range. The practical reason is that the rendering of the page
_will_ vary by browser settings, since browsers will often use the encoding
that was _last_ selected, and this might be just about anything.
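To make the proposal concrete, here is a minimal sketch of such a check in
Python. It is not how any existing validator works; the function name
check_unlabelled_html and the convention that the caller passes in the
charset already found in HTTP headers or a meta tag (None if there was
none) are assumptions made for illustration.

```python
def check_unlabelled_html(data: bytes, declared_charset=None, strict=False):
    """Hypothetical validator step: decode an HTML document and collect
    warnings when no character encoding was declared.

    Returns a (text, warnings) pair."""
    warnings = []
    if declared_charset is not None:
        # An encoding was declared in HTTP headers or a meta tag: use it.
        return data.decode(declared_charset, errors="replace"), warnings

    warnings.append("No character encoding specified in HTTP headers or "
                    "a meta tag; assuming windows-1252.")
    for offset, octet in enumerate(data):
        if 0x80 <= octet <= 0x9F:
            # Printable in windows-1252, C1 control codes in ISO-8859-1.
            warnings.append(f"Octet 0x{octet:02X} at offset {offset} is "
                            "interpreted differently by different encodings.")
        elif strict and octet > 0x7F:
            # Optional stricter mode: flag every non-ASCII octet.
            warnings.append(f"Non-ASCII octet 0x{octet:02X} at offset {offset}.")

    # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
    # undefined, so replace those rather than failing.
    return data.decode("cp1252", errors="replace"), warnings
```

For example, b"caf\xe9 \x93quoted\x94" decodes to 'café "quoted"' with
typographic quotes and yields three warnings (the missing declaration plus
one each for 0x93 and 0x94); with strict=True the é byte adds a fourth.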

For XML-based validation, the default is the XML default: utf-16 if the
document starts with a UTF-16 byte order mark, otherwise utf-8.
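A rough Python sketch of that fallback follows. It deliberately ignores the
encoding declaration in the XML declaration and any HTTP charset parameter,
both of which take precedence over the default, and it does not handle
UTF-32 or other encodings that require a declaration. The function names
are hypothetical.

```python
import codecs


def xml_default_encoding(data: bytes) -> str:
    """Encoding to fall back to when neither the transport nor an
    encoding declaration names one (simplified)."""
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        # Python's "utf-16" codec reads the BOM to pick the byte order.
        return "utf-16"
    return "utf-8"


def decode_xml_default(data: bytes) -> str:
    encoding = xml_default_encoding(data)
    if encoding == "utf-8":
        # Tolerate an optional UTF-8 byte order mark and strip it.
        encoding = "utf-8-sig"
    return data.decode(encoding)
```

For instance, xml_default_encoding(b"<a/>") returns "utf-8", while the
same markup preceded by the bytes FE FF is treated as UTF-16.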

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Monday, 7 May 2007 18:21:32 UTC