Re: Fallback to UTF-8 from Jukka K. Korpela on 2008-04-25 (www-validator@w3.org from April 2008)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Fri, 25 Apr 2008 13:19:05 +0300
To: "W3C Validator Community" <www-validator@w3.org>
Message-ID: <024501c8a6c0$e111f040$0500000a@DOCENDO>

Henri Sivonen wrote:

> My point is that while HTML 4.01
> doesn't specify this properly, this is a solved problem (by HTML 5)

You're joking, right? "HTML 5" is a collection of incomplete sketches.

HTML 4.01 rather properly specifies how the encoding shall be specified. 
Data that does not do that is outside the scope of the specification.

If you wish to add _pragmatic_ notes to that, then you should say that 
in the absence of encoding information, browsers usually assume _some_ 
encoding.

Did you notice the press news telling that there are now more 
Internet users in China than in the US? Would it make sense for a 
browser used in China to assume windows-1252?

> How about "The character encoding of the document was not explicit
> (assumed windows-1252) but the document contains non-ASCII."

Everything from the "(" onwards is gibberish to most authors and also 
fairly misleading. There's no "ASCII" or "non-ASCII" when the encoding 
has not been specified.
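
To illustrate the point, here is a minimal Python sketch that decodes the 
same bytes under several candidate encodings. The helper name decode_under 
and the choice of sample bytes and encodings are just assumptions made for 
this example; no browser or validator is claimed to work this way.

```python
def decode_under(raw, encodings):
    """Decode the same raw bytes under each candidate encoding.

    Returns a dict mapping encoding name to the decoded text, or to None
    when the bytes are not valid in that encoding.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Two bytes with the high bit set: one letter in UTF-8, two characters in
# windows-1252 (Python codec name "cp1252"), one ideograph in GBK.
high = decode_under(b"\xc3\xa4", ["utf-8", "cp1252", "gbk"])

# Two bytes that "look like ASCII": two letters if the encoding is ASCII,
# but a single character (U+4241) if the encoding is UTF-16LE.
low = decode_under(b"AB", ["ascii", "utf-16-le"])

for label, table in (("C3 A4", high), ("41 42", low)):
    for name, text in table.items():
        print(label, name, repr(text))
```

Whether a byte counts as "ASCII" thus depends on the encoding that is 
declared or assumed, not on the byte value alone.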

Jukka K. Korpela ("Yucca")
http://www.cs.tut.fi/~jkorpela/

Received on Friday, 25 April 2008 10:42:01 UTC