- From: Terje Bless <link@pobox.com>
- Date: Fri, 27 Jul 2001 10:54:14 +0200
- To: W3C Validator <www-validator@w3.org>
On 27.07.01 at 00:05, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:

>* Terje Bless wrote:
>>>>When it comes time to parse the markup, you already have a charset; the
>>>>XML/HTML rules do not govern HTTP.
>>>They do for conforming applications.
>>
>>No they don't!
>
>A conforming HTML user agent must adhere to all "must"s in the HTML 4
>recommendation. Assuming no default value for the charset parameter is a
>must. Applications that do something different, i.e. assuming some default
>value or don't check if an explicit charset was given, aren't conforming
>user agents.

You fail to distinguish between an "HTML 4 User Agent" and an "HTTP Client Application". By the time the HTTP Application has finished processing the response and hands it over to the HTML 4 User Agent, the character encoding is "ISO-8859-1", with no way of knowing whether that is by implicit assumption (HTTP default parameter) or by explicit definition. The HTML 4 User Agent does not and cannot know whether the charset parameter was present or not. Just because most browsers today are hybrid HTTP/HTML combinations does not mean the distinction does not exist.

BTW, I don't suppose there is any chance we could get an errata on the HTML Rec. that sez "meta" MAY be interpreted by the server but MUST be ignored by UAs? Pretty please with sugar on top? :-) Then we could drop this whole issue and just point at the errata. :-)

>Please note that I don't comment on how applications should behave, nor if
>I like this definition.

Me neither. I'm not arguing what I think is the proper behaviour; I'm arguing about what is the correct interpretation of the relevant specs. If I were writing a browser-equivalent application, I would probably assume UTF-8 if no charset was given in the HTTP response, and complain -- loudly! -- if the result was not valid UTF-8 (or valid HTML, for that matter ;D). If I were writing a spec, I would probably mandate UTF-8 for unlabelled docs and strongly discourage the use of other encodings.
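A minimal sketch of the layering described above (hypothetical code, not the Validator's actual implementation): the HTTP client layer resolves the charset parameter, applying the HTTP default of ISO-8859-1, so the HTML processor downstream can never tell an explicit label from a defaulted one.

```python
def http_charset(content_type):
    """Return the charset from a Content-Type header value, applying
    the HTTP/1.1 default of ISO-8859-1 when no charset parameter is
    present.  The HTML layer only ever sees the return value, so it
    cannot distinguish an explicit label from the implicit default."""
    for param in content_type.split(";")[1:]:
        name, _, value = param.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip('"').lower()
    return "iso-8859-1"  # HTTP default; indistinguishable downstream

print(http_charset("text/html; charset=utf-8"))  # utf-8
print(http_charset("text/html"))                 # iso-8859-1
```

Both an explicitly labelled `charset=iso-8859-1` response and an unlabelled one yield the same string here, which is exactly the information loss being argued about.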
Unfortunately, for the Validator, correctness is the goal rather than convenience. I simply don't know what the correct behaviour is here; and the fact that we need to take deployed browser behaviour and user expectations into account, in how we respond to whatever we decide the correct behaviour is, does not make the issue any clearer to me.

At least the XML Rec. seems to have solved some of my problems for XML; it describes fairly well the expected behaviour when faced with various encoding variants and labellings. It's not quite unambiguous, but it's close enough to split hairs on. :-)

Björn, Nick, Martin (and anyone else with an opinion ;D)[0]: could you take a look at the pseudo-algorithm I posted the other day and tell me of any problems you see with it? What _exactly_ would you say is the "correct" behaviour for the Validator? Did I leave anything out?

[0] - BTW, where is Liam these days? I haven't seen him around in a while.
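For reference, the kind of detection the XML Rec. describes (Appendix F, non-normative autodetection) can be sketched roughly like this; the function name and the exact set of cases handled are illustrative assumptions, not a complete implementation of the appendix.

```python
def sniff_xml_encoding(data: bytes) -> str:
    """Rough sketch of XML 1.0 Appendix F autodetection: look for a
    byte order mark or the byte pattern of '<?xml' to pick an encoding
    family before reading the encoding declaration itself."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"         # UTF-8 BOM
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"     # UTF-16 big-endian BOM
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"     # UTF-16 little-endian BOM
    if data.startswith(b"\x00\x3c\x00\x3f"):
        return "utf-16-be"     # '<?' as big-endian UTF-16, no BOM
    if data.startswith(b"\x3c\x00\x3f\x00"):
        return "utf-16-le"     # '<?' as little-endian UTF-16, no BOM
    if data.startswith(b"<?xml"):
        return "ascii-family"  # now parse the encoding declaration
    return "utf-8"             # XML's default absent any other label

print(sniff_xml_encoding(b"\xfe\xff\x00<"))  # utf-16-be
```

This is about as close to unambiguous as encoding detection gets, which is why the XML side of the problem feels more tractable than the HTML/HTTP side.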
Received on Friday, 27 July 2001 05:02:00 UTC