- From: Terje Bless <link@pobox.com>
- Date: Sun, 8 Jun 2003 07:54:48 +0200
- To: W3C Validator <www-validator@w3.org>
[ no longer CCing as you're subscribed IIRC ]

Karl Ove Hufthammer <karl@huftis.org> wrote:

> But in my opinion the main problem is that the validator is labeling
> perfectly valid documents as invalid. I think this is more serious than
> not labeling invalid documents as invalid because of character encoding
> issues.

I disagree. I think the worst thing we could do is label something as valid
if there is a chance that it isn't. And as mentioned, the validator is
saying "I cannot pronounce this to be valid because you did not give me
enough information to reliably test it," not "This document isn't valid."
I'll accept a charge that the way this is presented needs work, though, if
this distinction isn't already clear?

>> In particular, if we allow for your interpretation above, we would in
>> effect default to ISO-8859-1 not only for pages such as Kjetil's (who
>> are most certainly correct and the author very aware of what he is
>> doing), but also for Joe Web-duh-signer and his clueless little hosting
>> company where there is _no_ conscious decision involved and ISO-8859-1
>> is the _wrong_ value more often than not.
>
> 'More often than not'? Isn't ISO-8859-1 the most used encoding for valid
> documents?

ISO-8859-1 is presumably the most widely used encoding in Europe and North
America, and I would assume that holds for both valid and invalid documents
(modulo the Windows-1252 and MacRoman documents, which I lump in with
ISO-8859-1 for simplicity). But if you look at the __World_Wide__ Web, I
think it is highly unlikely that ISO-8859-1 is the correct encoding for the
majority of pages in general, and it grows less likely by the minute as
Asia, Latin America, and Africa come on-line.

> And if there are '_no_ conscious decision involved', I doubt the
> Web pages would be valid even with an explicit character encoding
> declaration.

Character encoding is far more esoteric and obscure -- i.e. harder to make
people aware of -- than markup validity. I see a lot of people who obviously
have little technical understanding of the markup flavour they're using, yet
are trying to pass validation (because that goal is obvious), and who I
highly doubt have ever considered, or will ever consider, what encoding they
are using unless the validator tells them there is a problem.

BTW, I'm somewhat playing devil's advocate in this thread. If I were
developing the validator as an in-house tool, I would have implemented the
HTTP defaulting rules -- and bugger the HTML 4.0 Rec -- and just noted the
lack of an explicit encoding, and possibly checked for signs of
Windows-1252/MacRoman (a rough sketch of that approach follows below). The
above is more or less where we ended up after discussion, and not where I
originally started out. You may detect artifacts of this scattered around
my arguments. :-)

--
"I don't want to learn to manage my anger; I want to FRANCHISE it!"
                                                        -- Kevin Martin
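[ Editor's note: a minimal sketch of the in-house approach described in the
message above, assuming a hypothetical resolve_encoding helper rather than
the validator's actual code. It applies the HTTP/1.1 defaulting rule
(ISO-8859-1 for text/* responses with no charset parameter), notes the lack
of an explicit declaration, and treats bytes in the 0x80-0x9F range -- C1
control characters in ISO-8859-1, but printable punctuation in Windows-1252
and MacRoman -- as a sign of a mislabelled document. Names and warning
strings are illustrative only. ]

    import re

    # Bytes 0x80-0x9F are C1 control characters in ISO-8859-1; real-world
    # documents containing them are usually Windows-1252 (or MacRoman).
    C1_RANGE = re.compile(rb"[\x80-\x9f]")

    def resolve_encoding(content_type, body):
        """Return (encoding, warnings) for a text/* HTTP response.

        Hypothetical helper illustrating HTTP defaulting plus a
        Windows-1252/MacRoman heuristic; not the W3C validator's code.
        """
        warnings = []

        # Look for an explicit charset parameter on the Content-Type header.
        match = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', content_type, re.I)
        if match:
            encoding = match.group(1)
        else:
            # HTTP/1.1 defaulting rule: text/* defaults to ISO-8859-1.
            encoding = "ISO-8859-1"
            warnings.append("No explicit character encoding; "
                            "assuming ISO-8859-1 per the HTTP default.")

        # Heuristic: C1 bytes suggest the document is really Windows-1252.
        if encoding.upper() == "ISO-8859-1" and C1_RANGE.search(body):
            warnings.append("Bytes in the 0x80-0x9F range found; the document "
                            "may actually be Windows-1252 or MacRoman.")

        return encoding, warnings

    if __name__ == "__main__":
        enc, notes = resolve_encoding("text/html",
                                      b"<p>Smart \x93quotes\x94 ahead</p>")
        print(enc)          # ISO-8859-1 (the HTTP default)
        for note in notes:
            print("-", note)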
Received on Sunday, 8 June 2003 01:54:52 UTC