- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Mon, 7 May 2007 21:21:23 +0300 (EEST)
- To: www-validator@w3.org
On Mon, 7 May 2007, olivier Thereaux wrote:

>> Isn't "us-ascii" the default value for "charset"?
>
> utf-8 is.

Only for XML-based documents. For classic HTML, there is controversy (conflict of specifications). The HTML 4.01 specification says that no default shall be assumed (which is a somewhat odd position, but not very odd if you think about it).

I think that for nominally SGML-based validation, a warning should be issued if the encoding is not specified either in HTTP headers or in a meta tag, and validation should be carried out assuming the windows-1252 encoding, since this covers the most common cases. You might in that case issue a warning about any octet in the 80..9F range, or perhaps even about any octet not in the ASCII range. The practical reason is that the rendering of the page _will_ vary by browser settings, since browsers will often use the encoding that was _last_ selected, and this might be just about anything.

For XML-based validation, the default is the XML default of utf-8 or utf-16 depending on the presence of a byte order mark at the start.

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
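The fallback policy proposed in the message above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the validator's actual code: the function names `guess_xml_encoding` and `decode_undeclared_html` and the `warn_non_ascii` flag are hypothetical, the XML check looks only at a UTF-16 byte order mark (it ignores the XML encoding declaration), and the caller is assumed to have already found no charset in the HTTP headers or a meta tag.

```python
import codecs


def guess_xml_encoding(data: bytes) -> str:
    """Default for XML when nothing is declared: utf-16 if a UTF-16
    byte order mark is present, otherwise utf-8 (simplified)."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return "utf-8"


def decode_undeclared_html(data: bytes, warn_non_ascii: bool = False):
    """Decode classic HTML with no declared encoding as windows-1252
    and collect the warnings suggested in the message (hypothetical helper)."""
    warnings = ["No encoding specified; assuming windows-1252."]

    # Octets 0x80..0x9F are the ones whose meaning differs most
    # between windows-1252 and other common 8-bit encodings.
    c1_offsets = [i for i, b in enumerate(data) if 0x80 <= b <= 0x9F]
    if c1_offsets:
        warnings.append(f"Octets in the 80..9F range at offsets {c1_offsets}.")

    # Stricter option: flag every octet outside the ASCII range.
    if warn_non_ascii:
        high_offsets = [i for i, b in enumerate(data) if b > 0x7F]
        if high_offsets:
            warnings.append(f"Non-ASCII octets at offsets {high_offsets}.")

    # A few octets (such as 0x81) are unassigned in windows-1252,
    # so replace them instead of raising an error.
    text = data.decode("windows-1252", errors="replace")
    return text, warnings


if __name__ == "__main__":
    text, warnings = decode_undeclared_html(b"caf\xe9 \x93hi\x94")
    print(text)
    for message in warnings:
        print("warning:", message)
```

Whether the stricter non-ASCII warning should be on by default is a policy choice; the sketch leaves it off so that the 80..9F warning stands out.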
Received on Monday, 7 May 2007 18:21:32 UTC