- From: David Dorward <david@dorward.me.uk>
- Date: Thu, 24 Apr 2008 17:47:24 +0100
- To: www-validator@w3.org
On 24 Apr 2008, at 17:11, Andreas Prilop wrote:

> On Thu, 24 Apr 2008, David Dorward wrote:
>
>> If it assumes ISO-8859-1 and the document is UTF-8,
>> how is that any improvement?
>
> - First it assumes "charset=utf-8".
> - Then it immediately states that this is impossible.
>
> Where is the logic in this behaviour?

I agree that it is not ideal, but assuming ISO-8859-1 for UTF-8 documents is no better than assuming UTF-8 for ISO-8859-1 documents (or either for Shift_JIS documents, etc.).

Your argument appears to be "I use ISO-8859-1, therefore the validator should default to ISO-8859-1", which isn't, IMO, a very convincing one. Am I interpreting you incorrectly?

Looking at the HTML spec, it says 'user agents must not assume any default value for the "charset" parameter' (http://www.w3.org/TR/html4/charset.html). So, following that guidance, the validator shouldn't guess at all and should just state that no encoding was found and that it can't continue until one is specified.

My preference would be to try to validate the document by assuming a number of different encodings in turn until one was successfully parsed, but this would be significantly more work than just changing the default. In that event, I might also be tempted to recommend making the warning about guessing even more prominent than it is at present (a fat red border perhaps?).

-- 
David Dorward
http://dorward.me.uk/
http://blog.dorward.me.uk/
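To illustrate the "try several encodings in turn" idea mentioned above, here is a minimal Python sketch. It is not how the validator works; the function name `guess_encoding` and the candidate list are assumptions made up for this example. Note that ISO-8859-1 maps every byte to a character, so it never fails to decode and has to come last. That is also why a successful decode is only a guess, not proof of the real encoding.

```python
# Hypothetical helper, not part of the W3C validator.
# The candidate list and its order are assumptions for illustration.
CANDIDATE_ENCODINGS = ["utf-8", "shift_jis", "iso-8859-1"]


def guess_encoding(raw, candidates=CANDIDATE_ENCODINGS):
    """Return the first candidate that decodes `raw` without error, or None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
        return name
    return None
```

A caller would still want to warn prominently that the result was guessed, for example `guess_encoding(document_bytes)` followed by a visible notice naming the encoding that was assumed.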
Received on Thursday, 24 April 2008 16:48:10 UTC