- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Thu, 24 Apr 2008 22:08:25 +0300
- To: W3C Validator Community <www-validator@w3.org>
On Apr 24, 2008, at 20:09 , Jukka K. Korpela wrote:

> David Dorward wrote:
>
>> Looking at the HTML spec, it says 'user agents must not assume any
>> default value for the "charset" parameter'
>> (http://www.w3.org/TR/html4/charset.html).
>> So, following that guidance, the validator shouldn't guess at all
>> and should just state that no encoding was found and that it can't
>> continue until one is specified.
>
> I don't think that's quite the idea. Rather, that no default for the
> parameter (US-ASCII, ISO-8859-1, UTF-8, or any other default) should be
> assumed.
[...]
> In the absence of any particular reason to guess anything else, I think
> a user agent should assume a hypothetical generic encoding (we could
> give it a name, but that's not important right now) that uses 8 bits for
> one character so that octets 0 - 127 have their ASCII values and other
> octets denote undefined graphic characters.

Considering real Web content, it is better to pick Windows-1252 than a
hypothetical generic encoding.

For what it's worth, HTML5 makes it conforming to have ASCII-only pages
without declaring the character encoding, but having non-ASCII characters
without an encoding declaration either on the HTTP level or on the HTML
level makes a document invalid.

Validator.nu emits a warning when the encoding is not declared, even if
the content is ASCII-only. The point is to flag content management
systems that lack encoding declarations, even when the particular page
being validated happens to be ASCII-only.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
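
The rule described in the message above can be sketched roughly as follows. This is only an illustration, not Validator.nu's actual code: the function name `check_encoding_declaration`, the severity labels, and the idea of passing the declared encoding as a single optional argument (standing in for either an HTTP-level or an HTML-level declaration) are assumptions made for the example.

```python
from typing import Optional, Tuple

# Fallback suggested in the message for undeclared content (an assumption
# of this sketch, not a statement about any particular validator).
FALLBACK_ENCODING = "windows-1252"


def check_encoding_declaration(
    body: bytes, declared: Optional[str]
) -> Tuple[str, str]:
    """Return (severity, encoding_to_use) for a document body.

    `declared` is the encoding from the HTTP or HTML level, or None if
    neither level declares one.
    """
    if declared is not None:
        # An encoding was declared somewhere; nothing to report.
        return ("ok", declared)
    if body.isascii():
        # Conforming under the rule described above, but still worth a
        # warning so that systems omitting declarations get flagged.
        return ("warning", FALLBACK_ENCODING)
    # Non-ASCII bytes with no declaration: the document is invalid.
    return ("error", FALLBACK_ENCODING)
```

For instance, calling `check_encoding_declaration(b"<p>hello</p>", None)` takes the ASCII-only branch and yields a warning, while the same call on a body containing any byte above 127 yields an error.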
Received on Thursday, 24 April 2008 19:09:04 UTC