Re: charset parameter

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sat, 28 Jul 2001 00:09:44 +0200
To: Terje Bless <link@pobox.com>
Cc: W3C Validator <www-validator@w3.org>
Message-ID: <0qn3mtsd1kk6mu2i8oc4j19u8cticdfj2i@4ax.com>
* Terje Bless wrote:
>>A conforming HTML user agent must adhere to all "must"s in the HTML 4
>>recommendation. Assuming no default value for the charset parameter is a
>>must. Applications that do something different, i.e. assume some default
>>value or don't check whether an explicit charset was given, aren't conforming
>>user agents.
>You fail to distinguish between a "HTML 4 User Agent" and a "HTTP Client

It's not I who fails; HTML 4 fails, and it fails for a good reason but
with a bad solution. If I had written that section, I think I'd have
recommended that, if the document contains nothing but valid UTF-8
sequences, it be treated as UTF-8; otherwise it be treated as
ISO-8859-1, as HTTP/1.1 demands. Assuming nothing and then parsing the
document doesn't work; it's just nonsense.
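
A minimal sketch of that fallback, for illustration only: the function
name guess_charset is hypothetical, and it assumes the whole document is
available as bytes and that no charset was given explicitly anywhere.

```python
def guess_charset(data: bytes) -> str:
    """Guess a charset for an unlabelled document (hypothetical helper).

    Returns "utf-8" if the bytes are entirely valid UTF-8, otherwise
    falls back to "iso-8859-1", the HTTP/1.1 default mentioned above.
    """
    try:
        # Strict decoding rejects any malformed UTF-8 sequence.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "iso-8859-1"
    return "utf-8"
```

Note that pure US-ASCII input is also valid UTF-8, so it is reported as
UTF-8 here; for such input the two guesses decode identically anyway.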

>At least the XML Rec. seems to have solved some of my problems for XML; it
>describes fairly well the expected behaviour when faced with various
>encoding variants and labellings.

Not so. E.g. XML 1.0 Second Edition reads in section 4.3.3:

  "[...] It is also a fatal error if an XML entity contains no encoding
   declaration and its content is not legal UTF-8 or UTF-16."

That doesn't take higher-level protocol information into account,
though it should. John Cowan said on xml-dev regarding this issue: "I
personally expect that the Core WG will act on it soon in the direction
suggested". Not that I like that...
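
To make the quoted rule concrete, here is a rough sketch of how a
processor might apply it to an entity that has no encoding declaration.
The names FatalXMLError and decode_undeclared_entity are made up for
this example. It relies on the same section's requirement that UTF-16
entities begin with a byte order mark, and, like the quoted sentence, it
ignores any charset supplied by a higher-level protocol.

```python
import codecs


class FatalXMLError(Exception):
    """Hypothetical exception standing in for an XML fatal error."""


def decode_undeclared_entity(data: bytes) -> str:
    """Decode an entity that carries no encoding declaration.

    Assumption: UTF-16 is recognised only by its byte order mark;
    everything else must be legal UTF-8.
    """
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        codec = "utf-16"     # the BOM selects the byte order and is consumed
    else:
        codec = "utf-8-sig"  # plain UTF-8, tolerating an optional UTF-8 BOM
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        raise FatalXMLError("content is not legal UTF-8 or UTF-16") from exc
```

An ISO-8859-1 document sent with a correct charset parameter in its
Content-Type, but without an encoding declaration, would be rejected by
this function as soon as it contains a non-ASCII byte sequence that is
not valid UTF-8, which is the problem described above.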

>Björn, Nick, Martin (and anyone else with an opinion ;D)[0]: could you take
>a look at the pseudo-algorithm I posted the other day and tell me of any
>problems you see with it? What _exactly_ would you say is the "correct"
>behaviour for the Validator? Did I leave out anything?

I'll have a look, it's just another unread article ;-)

Btw., ISO/IEC 8859-16:2001 has just been published; we should add
support to the validator.
Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/
Received on Friday, 27 July 2001 18:10:53 UTC
