Re: charset=us-ascii mandatory?

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Mon, 7 May 2007 21:21:23 +0300 (EEST)
To: www-validator@w3.org
Message-ID: <Pine.GSO.4.64.0705072109070.3471@mustatilhi.cs.tut.fi>

On Mon, 7 May 2007, Olivier Thereaux wrote:

>> Isn't "us-ascii" the default value for "charset"?
>
> utf-8 is.

Only for XML-based documents. For classic HTML, there is controversy 
(conflict of specifications). The HTML 4.01 specification says that no 
default shall be assumed (which is a somewhat odd position, but not very 
odd if you think about it).

I think that for nominally SGML-based validation, a warning should be
issued if the encoding is not specified either in HTTP headers or in a
meta tag, and validation should be carried out assuming the windows-1252
encoding, since this covers the most common cases. You might in that case
issue a warning about any octet in the 80..9F range, or perhaps even about 
any octet not in the ASCII range. The practical reason is that the 
rendering of the page _will_ vary by browser settings, since browsers will 
often use the encoding that was _last_ selected, and this might be just 
about anything.
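
As a rough illustration of the fallback suggested above, here is a minimal
Python sketch. The function names (scan_octets, decode_with_fallback) and
the wording of the warnings are made up for this example; this is not the
validator's actual code. It assumes the declared encoding, if any, has
already been extracted from the HTTP headers or the meta tag.

```python
from typing import List, Optional, Tuple


def scan_octets(data: bytes) -> Tuple[List[int], List[int]]:
    """Return offsets of octets in 0x80..0x9F and of other non-ASCII octets."""
    c1_range = [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]
    other_non_ascii = [i for i, b in enumerate(data) if b > 0x9F]
    return c1_range, other_non_ascii


def decode_with_fallback(
    data: bytes, declared: Optional[str] = None
) -> Tuple[str, List[str]]:
    """Decode with the declared encoding, else assume windows-1252 and warn."""
    warnings: List[str] = []
    if declared is not None:
        # An encoding was declared: use it and issue no fallback warnings.
        return data.decode(declared, errors="replace"), warnings

    warnings.append("no encoding declared; assuming windows-1252")
    c1_range, other_non_ascii = scan_octets(data)
    if c1_range:
        warnings.append(f"octets in 80..9F at offsets {c1_range}")
    if other_non_ascii:
        warnings.append(f"other non-ASCII octets at offsets {other_non_ascii}")
    # errors="replace" because a few octets (e.g. 0x81) are unassigned
    # in Python's windows-1252 codec and would otherwise raise an error.
    return data.decode("windows-1252", errors="replace"), warnings


if __name__ == "__main__":
    text, warnings = decode_with_fallback(b"caf\xe9 \x93quoted\x94")
    print(text)
    for w in warnings:
        print("warning:", w)
```

Whether the stricter check (any non-ASCII octet) is reported separately
from the 80..9F check, as done here, is a design choice of the sketch.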

For XML-based validation, the default is the XML default of utf-8 or 
utf-16 depending on the presence of a byte order mark at the start.
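
The XML default described above could be sketched roughly as follows. The
function name xml_default_encoding is hypothetical, and the sketch is
deliberately simplified: it only looks at the byte order mark and ignores
any encoding declaration in the XML prolog as well as other encodings an
XML processor might detect.

```python
import codecs


def xml_default_encoding(data: bytes) -> str:
    """Guess the default encoding of an XML document from its first bytes."""
    if data.startswith(codecs.BOM_UTF8):
        # A UTF-8 byte order mark (EF BB BF) is allowed but not required.
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        # FE FF or FF FE at the start signals utf-16.
        return "utf-16"
    # No byte order mark: fall back to utf-8.
    return "utf-8"


if __name__ == "__main__":
    print(xml_default_encoding(b"<?xml version='1.0'?><a/>"))
    print(xml_default_encoding(b"\xff\xfe<\x00?\x00"))
```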

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Monday, 7 May 2007 18:21:32 GMT
