Re: Unknown text/* subtypes [i20]

From: Mark Nottingham <mnot@mnot.net>
Date: Wed, 9 Jan 2008 14:16:42 +1100
Cc: ietf-http-wg@w3.org
Message-Id: <9713B04D-7394-4DFD-A08F-B3EBEAEAA50D@mnot.net>
To: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>

Personally, I agree; the only sane thing to do here seems to be to
remove the HTTP-specific charset default.

The simplest thing seems to be to remove this text:

> When no explicit charset parameter is provided by the sender, media  
> subtypes of the "text" type are defined to have a default charset  
> value of "ISO-8859-1" when received via HTTP.

BUT, note the following text:

> Data in character sets other than "ISO-8859-1" or its subsets MUST  
> be labeled with an appropriate charset value.

Depending on how you read the context, this would need to be restated  
as something like:

"Media subtypes of the "text" type MUST be labeled with an appropriate  
charset value."

As I think I've said before, requiring this often leads to
mislabelling. For example, a Web server administrator will set an
unrealistic policy like "all of our content is UTF-8" and configure
headers to suit, forgetting about legacy content on the site that's
in a different encoding.

My preference would be to soften this to a SHOULD, so that in cases  
where it's administratively difficult for people to set a charset  
value, conflicting statements aren't made. I'd rather have the  
metadata be explicitly missing than wrong.
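
To make the "missing rather than wrong" idea concrete, here is a rough
Python sketch (not proposed spec text). The function names
declared_charset and content_type_header are made up for illustration.
The recipient side applies no HTTP-level default and reports None when
the sender gave no charset; the sender side adds a charset only when it
actually knows one.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None.

    No ISO-8859-1 default is applied here: a missing parameter is
    reported as missing, which is the behaviour argued for above.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased name, or None


def content_type_header(subtype, charset=None):
    """Build a text/* Content-Type value, labelling only when known."""
    if charset is None:
        # Administratively unknown: leave the label off rather than guess.
        return f"text/{subtype}"
    return f"text/{subtype}; charset={charset}"
```

With this, content_type_header("html") yields a bare "text/html", and
declared_charset() on that value returns None instead of a guess.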

On 08/01/2008, at 3:04 PM, Frank Ellermann wrote:

> Julian Reschke wrote:
> [I've removed the types list, feel free to reinsert it]
>> 1) Do we want HTTP to override RFC2046's defaults at all?
> No.  Overriding it with UTF-8 would make sense (later, not
> in 2616bis).  Let's go back to the 2046 defaults for now.
>> browsers (just tested Opera/Safari/Mozilla/IE7) ignore all
>> three RFCs for at least text/xml (they all look at the
>> content).
> Of course, authors might know what they do, besides browsers
> also have to work with the content behind file and ftp URLs.
> And it's tricky to get this right for HTTP server admins...
>> we can state "in absence of charset parameter recipient MAY
>> do charset sniffing (BOM, XML decl, HTML meta tag, ...),
>> which would probably match what's actually implemented.
> ...HTTP offers a sound mechanism, what browsers do when that
> mechanism is not used could be "out of scope" for this WG.
> Let's say default ASCII, better guesses are no HTTP problem.
> Frank
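
For reference, the BOM part of the sniffing Julian mentions could look
roughly like the Python sketch below. It is only an illustration under
my own assumptions: the names sniff_bom and effective_charset are
hypothetical, it covers the BOM check only (no XML declaration or HTML
meta handling), and it says nothing about what browsers really do. An
explicit charset from the header always wins; sniffing is only a
fallback.

```python
import codecs

# UTF-32 must be checked before UTF-16: the UTF-32-LE BOM (FF FE 00 00)
# starts with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(body):
    """Return an encoding name if the body starts with a known BOM."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    return None


def effective_charset(declared, body):
    """Prefer the sender's label; sniff only when no label was sent."""
    if declared is not None:
        return declared
    return sniff_bom(body)
```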

Mark Nottingham     http://www.mnot.net/
Received on Wednesday, 9 January 2008 03:16:52 UTC
