RE: Unknown text/* subtypes [i20]

Mark Nottingham wrote:
> Personally -- I agree; the only sane thing to do here seems to be to
> remove HTTP defaulting.
> 
> The simplest thing seems to be to remove this text;
> 
> > When no explicit charset parameter is provided by the sender, media
> > subtypes of the "text" type are defined to have a default charset
> > value of "ISO-8859-1" when received via HTTP.
> 
> BUT, note the following text:
> 
> > Data in character sets other than "ISO-8859-1" or its subsets MUST
> > be labeled with an appropriate charset value.
> 
> 
> Depending on how you read the context, this would need to be restated
> as something like:
> 
> "Media subtypes of the "text" type MUST be labeled with an appropriate
> charset value."
> 
> As I think I've said before, requiring this often leads to
> mislabelling, because (for example) a Web server administrator will
> set an unrealistic policy like "all of our content is UTF-8",
> configure headers to suit, forgetting some legacy content on the site
> that's in a different encoding.
> 
> My preference would be to soften this to a SHOULD, so that in cases
> where it's administratively difficult for people to set a charset
> value, conflicting statements aren't made. I'd rather have the
> metadata be explicitly missing than wrong.

Except that administrative difficulties lead to complaints, and
complaints lead to fixed software when nothing else seems to. As a
server implementor, I'm tired of guessing charsets on request bodies
[1]. I'd much rather see a MUST, so that the conversation fails in a way
that clearly fixes blame, even though I may choose to work around it.
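
To make "fails in a way that clearly fixes blame" concrete, here is a
minimal sketch of a server-side check that refuses to guess. It is not
the CherryPy code referenced in [1]; the names decode_text_body and
MissingCharset are hypothetical, and it assumes the raw Content-Type
header value is already at hand. It uses only the standard library.

```python
from email.message import Message


class MissingCharset(ValueError):
    """Hypothetical error: a text/* body arrived with no charset parameter."""


def decode_text_body(body, content_type):
    """Decode a text/* request body using only the charset the sender declared.

    No default (such as ISO-8859-1) is assumed and nothing is guessed:
    an unlabelled body is rejected, so the fault lies with the sender.
    """
    msg = Message()
    msg["Content-Type"] = content_type  # reuse the stdlib header parser
    if msg.get_content_maintype() != "text":
        raise ValueError("not a text/* media type: %r" % content_type)
    charset = msg.get_param("charset")
    if not charset:
        raise MissingCharset("text/* body sent without a charset parameter")
    return body.decode(charset)
```

A server could map MissingCharset to a 400 response, or catch it and
fall back to guessing if it chooses to work around the sender.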


Robert Brewer
CherryPy Team
fumanchu@aminus.org

[1] http://www.cherrypy.org/browser/trunk/cherrypy/lib/encoding.py#L25

Received on Wednesday, 9 January 2008 17:47:38 UTC