Re: charset parameter

* Terje Bless wrote:
>>A conforming HTML user agent must adhere to all "must"s in the HTML 4
>>recommendation. Assuming no default value for the charset parameter is a
>>must. Applications that do something different, i.e. assuming some default
>>value or don't check if an explicit charset was given, aren't conforming
>>user agents.
>You fail to distinguish between a "HTML 4 User Agent" and a "HTTP Client

It is not I who fails; HTML 4 fails, and it fails for a good reason but
with a bad solution. Had I written that section, I think I would have
recommended this: if the document contains nothing but valid UTF-8
sequences, treat it as UTF-8; otherwise treat it as ISO-8859-1, as
HTTP/1.1 demands. Assuming nothing and then parsing the document anyway
doesn't work; it's just nonsense.
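
To make that recommendation concrete, here is a minimal Python sketch of
the fallback rule. The function name guess_charset is made up for
illustration and is not taken from any validator code; it is only meant
for the case where no explicit charset was given at all.

```python
def guess_charset(body: bytes) -> str:
    """Guess a charset for a document that carries no explicit charset.

    Hypothetical helper: treat the document as UTF-8 if every byte
    sequence in it is valid UTF-8, otherwise fall back to ISO-8859-1.
    """
    try:
        # Strict decoding raises on the first invalid UTF-8 sequence.
        body.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "ISO-8859-1"
    # Note: pure US-ASCII documents also end up here, which is harmless
    # because ASCII is a subset of both encodings.
    return "UTF-8"
```

For example, guess_charset("Björn".encode("utf-8")) gives "UTF-8", while
the same name encoded as ISO-8859-1 contains a lone 0xF6 byte, which is
not valid UTF-8, so the fallback applies.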

>At least the XML Rec. seems to have solved some of my problems for XML; it
>describes fairly well the expected behaviour when faced with various
>encoding variants and labellings.

Not so. For example, XML 1.0 Second Edition says in section 4.3.3:

  "[...] It is also a fatal error if an XML entity contains no encoding
   declaration and its content is not legal UTF-8 or UTF-16."

That doesn't take higher-level protocol information into account,
although it should. Regarding this issue, John Cowan said on xml-dev:
"I personally expect that the Core WG will act on it soon in the
direction suggested". Not that I like that...
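
As a rough illustration of the difference, the following Python sketch
applies the quoted rule but first consults a charset supplied by a
higher-level protocol (e.g. an HTTP Content-Type header). Everything
here is an assumption for illustration: the names entity_encoding and
external_charset are invented, the precedence given to the external
charset is my reading of what "taking it into account" would mean, and
the declaration check only handles ASCII-compatible encodings.

```python
import re
from typing import Optional

# Hypothetical, simplified pattern for an encoding declaration at the
# very start of an entity; only works for ASCII-compatible encodings.
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def _is_legal(body: bytes, codec: str) -> bool:
    """Return True if body decodes without error in the given codec."""
    try:
        body.decode(codec)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


def entity_encoding(body: bytes, external_charset: Optional[str] = None) -> str:
    """Pick an encoding for an XML entity, or raise on the fatal error."""
    # Assumption: a charset from the higher-level protocol wins.
    if external_charset:
        return external_charset
    match = _ENCODING_DECL.match(body)
    if match:
        return match.group(1).decode("ascii")
    # No declaration: the content must be legal UTF-16 (with BOM) or UTF-8.
    if body[:2] in (b"\xff\xfe", b"\xfe\xff") and _is_legal(body, "utf-16"):
        return "UTF-16"
    if _is_legal(body, "utf-8"):
        return "UTF-8"
    raise ValueError(
        "fatal error: no encoding declaration and content is not "
        "legal UTF-8 or UTF-16"
    )
```

With this sketch, an undeclared ISO-8859-1 entity is a fatal error on
its own, but is accepted when the caller passes
external_charset="ISO-8859-1".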

>Björn, Nick, Martin (and anyone else with an opinion ;D)[0]: could you take
>a look at the pseudo-algorithm I posted the other day and tell me of any
>problems you see with it? What _exactly_ would you say is the "correct"
>behaviour for the Validator? Did I leave out anything?

I'll have a look; it's just another unread article ;-)

Btw., ISO/IEC 8859-16:2001 has just been published; we should add
support for it to the validator.

Björn Höhrmann { }
am Badedeich 7 } Telefon: +49(0)4667/981028 {
25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 }

Received on Friday, 27 July 2001 18:10:53 UTC