- From: Mark Nottingham <mnot@mnot.net>
- Date: Wed, 9 Jan 2008 14:16:42 +1100
- To: Frank Ellermann <hmdmhdfmhdjmzdtjmzdtzktdkztdjz@gmail.com>
- Cc: ietf-http-wg@w3.org
Personally -- I agree; the only sane thing to do here seems to be to
remove HTTP defaulting.

The simplest thing seems to be to remove this text;

> When no explicit charset parameter is provided by the sender, media
> subtypes of the "text" type are defined to have a default charset
> value of "ISO-8859-1" when received via HTTP.

BUT, note the following text:

> Data in character sets other than "ISO-8859-1" or its subsets MUST
> be labeled with an appropriate charset value.

Depending on how you read the context, this would need to be restated
as something like:

"Media subtypes of the "text" type MUST be labeled with an appropriate
charset value."

As I think I've said before, requiring this often leads to
mislabelling, because (for example) a Web server administrator will set
an unrealistic policy like "all of our content is UTF-8", configure
headers to suit, forgetting some legacy content on the site that's in a
different encoding.

My preference would be to soften this to a SHOULD, so that in cases
where it's administratively difficult for people to set a charset
value, conflicting statements aren't made. I'd rather have the metadata
be explicitly missing than wrong.

On 08/01/2008, at 3:04 PM, Frank Ellermann wrote:

>
> Julian Reschke wrote:
>
> [I've removed the types list, feel free to reinsert it]
>> 1) Do we want HTTP to override RFC2046's defaults at all?
>
> No. Overriding it with UTF-8 would make sense (later, not
> in 2616bis). Let's go back to the 2046 defaults for now.
>
>> browsers (just tested Opera/Safari/Mozilla/IE7) ignore all
>> three RFCs for at least text/xml (they all look at the
>> content).
>
> Of course, authors might know what they do, besides browsers
> also have to work with the content behind file and ftp URLs.
> And it's tricky to get this right for HTTP server admins...
>
>> we can state "in absence of charset parameter recipient MAY
>> do charset sniffing (BOM, XML decl, HTML meta tag, ...),
>> which would probably match what's actually implemented.
>
> ...HTTP offers a sound mechanism, what browsers do when that
> mechanism is not used could be "out of scope" for this WG.
> Let's say default ASCII, better guesses are no HTTP problem.
>
> Frank
>
>

--
Mark Nottingham http://www.mnot.net/
Received on Wednesday, 9 January 2008 03:16:52 UTC
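
As a rough illustration of the recipient behaviour discussed in this message (no HTTP-level default, an explicit charset label wins, and an unlabeled body is treated as "explicitly missing" unless the recipient chooses to sniff), here is a minimal Python sketch. The function names `labeled_charset` and `effective_charset` are hypothetical, only byte-order-mark sniffing is shown (not the XML declaration or HTML meta tag), and none of this is text proposed for 2616bis.

```python
import codecs
from email.message import Message
from typing import Optional

# BOMs are checked longest-first, because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def labeled_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None.

    No default is applied here: a missing label stays missing.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, None if absent


def effective_charset(content_type: str, body: bytes,
                      sniff: bool = True) -> Optional[str]:
    """Hypothetical recipient policy: label first, optional BOM sniffing."""
    explicit = labeled_charset(content_type)
    if explicit is not None:
        return explicit
    if sniff:
        for bom, name in _BOMS:
            if body.startswith(bom):
                return name
    # Nothing labeled, nothing sniffed: report "unknown" rather than guess.
    return None


if __name__ == "__main__":
    print(effective_charset("text/html; charset=UTF-8", b"<p>hi</p>"))
    print(effective_charset("text/plain", b"hello"))
    print(effective_charset("text/xml", codecs.BOM_UTF8 + b"<a/>"))
```

Returning `None` instead of a fallback such as ISO-8859-1 mirrors the preference stated above for metadata that is explicitly missing rather than wrong; what a caller does with `None` is left open in this sketch.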