text/* types and charset defaults

"If we couldn't fix it then, why do you imagine you can fix it now?"

I'm arguing for documenting current practice, making some recommendations
for safe behavior, and moving on.

We certainly *wanted* to change the default charset for HTTP when working on
RFC 2616 but couldn't find a way around the impasse between backward
compatibility, client sniffing, server misconfiguration, and so on.

I think the main thing to do is to document the actual situation
well enough that new HTTP implementations don't break things:

1) servers (senders): don't make up a charset if you don't know what it is
(this is a good rule for any kind of descriptive information, isn't it?). 
2) clients (receivers): servers (senders) are unfortunately often
misconfigured and will label things with the wrong charset. (This is often
because lots of software uses 'mime type' where what's wanted is the full
'content type', so the parameters get lost.) But guessing blindly and
ignoring what the server sent seems like a bad idea, and even has security
implications. So "beware".
3) everybody: even if you agree about charset, accept other end-of-line
terminations (not just CRLF, which MIME required).

(It's really receiver & sender, not client & server, since the rules should
apply for file upload as well as anything else.)
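
For what it's worth, here is a rough Python sketch of how the three rules
might look in code. It is only an illustration, not a recommendation for any
particular implementation: the function names are made up for this example,
and the "utf-8" fallback is a placeholder assumption, not a default that
anything above settles on. The "trusted" flag is just one way to make the
"beware" visible to the caller.

```python
from email.message import Message


def content_type_header(media_type, charset=None):
    """Sender, rule 1: only label a charset when it is actually known."""
    if charset is None:
        return media_type  # say nothing rather than make something up
    return f"{media_type}; charset={charset}"


def declared_charset(header_value):
    """Parse the whole Content-Type value so the parameters are not lost."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def decode_body(body, header_value, fallback="utf-8"):
    """Receiver, rule 2: try what the sender declared before anything else.

    Returns (text, trusted). trusted is False when no charset was declared
    or when the declared one could not decode the bytes.
    The fallback value is an assumption made for this sketch.
    """
    _, charset = declared_charset(header_value)
    try:
        return body.decode(charset or fallback), charset is not None
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong label: decode lossily and flag it as untrusted.
        return body.decode(fallback, errors="replace"), False


def split_lines(text):
    """Everybody, rule 3: accept CRLF, bare LF, and bare CR."""
    return text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
```

The same functions would apply unchanged to a file upload, since nothing in
them depends on which end is the client and which is the server.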

Larry

Received on Sunday, 13 January 2008 16:53:38 UTC