Default charsets for text media types [i20] from Mark Nottingham on 2008-03-25 (ietf-http-wg@w3.org from January to March 2008)

From: Mark Nottingham <mnot@mnot.net>
Date: Wed, 26 Mar 2008 00:15:38 +1100
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <5565932F-C73D-4183-A09B-46993DD63F88@mnot.net>

Trying to summarise, I think there are two separable issues here:

* 1. Should HTTPBIS continue to accommodate historical clients that  
assume that an unlabeled text/* type is iso-8859-1, rather than MIME's  
default ASCII?

Roy has argued that this is an important distinction that we should  
continue to make, as otherwise existing implementations will become  
non-conformant. Frank points out that no such implementations are in  
common use today, and that those implementations which did make this  
assumption have greater problems (e.g., lacking Host headers).

It would be good to hear if anyone else has an opinion, especially if
they have experience with, or information about, such clients or content
that relies upon this default.

The conservative thing to do seems to be to keep the status quo. If we  
do that, rather than just close the issue as WONTFIX, we could modify  
the current text to clarify the defaulting (the original question was  
one of precedence between HTTP defaulting and that defined by the  
media type in question), and perhaps give a bit of the history.
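To make the precedence question in (1) concrete, here is a minimal Python sketch of how a recipient might pick the charset for a response body under each rule. The helper `effective_charset`, its `http_default_wins` switch and the `media_type_defaults` mapping are hypothetical, not taken from any spec or library; only `email.message.Message` is real (Python standard library). It assumes the HTTP/1.1 default of ISO-8859-1 for unlabeled text/* (RFC 2616, section 3.7.1) and the MIME default of US-ASCII (RFC 2046, section 4.1.2).

```python
from email.message import Message
from typing import Mapping, Optional

# HTTP/1.1 (RFC 2616, 3.7.1): unlabeled text/* received via HTTP is ISO-8859-1.
HTTP_TEXT_DEFAULT = "iso-8859-1"
# MIME (RFC 2046, 4.1.2): unlabeled text/* is US-ASCII.
MIME_TEXT_DEFAULT = "us-ascii"


def effective_charset(content_type: str,
                      media_type_defaults: Optional[Mapping[str, str]] = None,
                      http_default_wins: bool = True) -> Optional[str]:
    """Hypothetical helper: pick the charset a recipient would decode with.

    media_type_defaults maps a media type (e.g. "text/foo") to the default
    its own registration defines; http_default_wins selects which rule
    takes precedence when the two disagree.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_content_charset()  # lower-cased value, or None
    if explicit is not None:
        return explicit  # an explicit label always takes precedence
    if msg.get_content_maintype() != "text":
        return None  # these defaulting rules only cover text/*
    if http_default_wins:
        return HTTP_TEXT_DEFAULT  # status quo: the HTTP-specific default
    # Otherwise defer to the media type's own default, then to MIME's.
    defaults = media_type_defaults or {}
    return defaults.get(msg.get_content_type(), MIME_TEXT_DEFAULT)
```

Under the status quo a bare `text/plain` decodes as iso-8859-1; dropping the HTTP-specific default makes it us-ascii unless the media type's own registration says otherwise.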


* 2. Should HTTPBIS countenance sniffing for character set on text/*  
types?
    a. ...when the charset parameter is not present?
    b. ...when the charset parameter is iso-8859-1?
    c. ...at other times?
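Once the charset parameter has been read, the three cases in (2) are disjoint, which a small Python sketch can make explicit. The function `may_sniff` and the option labels are hypothetical; reading (c) as "any other explicit charset" is an assumption, and charset aliases such as "latin1" are deliberately not matched.

```python
from typing import AbstractSet, Optional

# Hypothetical option labels mirroring (a), (b) and (c) in issue 2.
SNIFF_WHEN_UNLABELED = "a"
SNIFF_WHEN_LATIN1 = "b"
SNIFF_OTHERWISE = "c"


def may_sniff(media_type: str, charset: Optional[str],
              allowed: AbstractSet[str]) -> bool:
    """Return True if a recipient may sniff the charset of this body.

    charset is the Content-Type charset parameter, or None if absent;
    allowed is the set of options (a/b/c) the spec would permit.
    """
    if not media_type.strip().lower().startswith("text/"):
        return False  # the question is only about text/* types
    if charset is None:
        return SNIFF_WHEN_UNLABELED in allowed  # case (a)
    # Simplification: only the exact label is matched, not aliases like latin1.
    if charset.strip().lower() == "iso-8859-1":
        return SNIFF_WHEN_LATIN1 in allowed  # case (b)
    return SNIFF_OTHERWISE in allowed  # case (c), read as "any other label"
```

For instance, a recipient permitted only (b) would sniff `text/html; charset=iso-8859-1` but not an unlabeled `text/html`.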

A few people have noted a security issue in a widely-used browser that  
requires (b). However, I haven't seen a reference to a vulnerability  
report, etc. yet; is anyone aware of one?

Some people have spoken in favour of (a), but I note with interest
the following text in p3, section 3.1.1:
> Some HTTP/1.0 software has interpreted a Content-Type header without  
> charset parameter incorrectly to mean "recipient should guess."  
> Senders wishing to defeat this behavior MAY include a charset  
> parameter even when the charset is ISO-8859-1 ([ISO-8859-1]) and  
> SHOULD do so when it is known that it will not confuse the recipient.

If we allow either (a) or (b), this will have to be re-worked.
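For reference, the sender-side advice in the quoted p3 text amounts to labeling text/* types explicitly, even when the charset is ISO-8859-1. A minimal Python sketch, assuming a hypothetical helper `label_text_type` and a naive split on ";" (quoted parameter values that contain ";" are not handled):

```python
def label_text_type(content_type: str, charset: str = "iso-8859-1") -> str:
    """Hypothetical helper: add an explicit charset to an unlabeled text/* type."""
    parts = content_type.split(";")
    media_type = parts[0].strip().lower()
    if not media_type.startswith("text/"):
        return content_type  # only text/* types carry the HTTP default
    for param in parts[1:]:
        name = param.split("=", 1)[0].strip().lower()
        if name == "charset":
            return content_type  # already labeled; leave it alone
    # Label explicitly, even for ISO-8859-1, so no recipient has to guess.
    return f"{content_type}; charset={charset}"
```

Whether to apply this unconditionally is exactly the MAY/SHOULD judgment the quoted text leaves to the sender.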

Also, it's notable that allowing (a) may make (1) easier to resolve in  
favour of dropping the HTTP-specific default; i.e., the default would  
shift from iso-8859-1 to "sniff".

Does anyone think that these are so intertwined that (2) should not be  
a separate issue? If we can resolve it, (1) should follow.

Cheers,

--
Mark Nottingham     http://www.mnot.net/

Received on Tuesday, 25 March 2008 13:16:27 UTC