- From: Henrik Nordström <henrik@henriknordstrom.net>
- Date: Tue, 05 Feb 2008 15:45:22 +0100
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-Id: <1202222722.17924.101.camel@hlaptop>
On Sat, 2008-02-02 at 12:46 +0100, Julian Reschke wrote:

> The spec currently requires sniffing for "text/plain;
> charset=iso-8859-1" and "text/plain; charset=ISO-8859-1", assuming that
> those servers that do send an incorrect default content type always send
> it with a very specific character set name. It appears that some servers
> sometimes ship with other defaults, thus more character sets would need
> to be considered
> (<http://lists.w3.org/Archives/Public/public-html/2008Jan/0239.html>).
> Where do you draw the line?

I follow the small crowd who like what the HTTP RFC says. If the server says something then it's explicit, and sniffing should only be allowed when there is no explicit information to go on.

I.e. there are perfectly valid reasons for a server to say that a file is of type text/plain instead of text/html. Having a user agent guess that the content should be displayed as HTML only because it looks like HTML is plain wrong.

It's equally valid for a server to say it's ISO-8859-1 even if it looks like the content may be UTF-8, as there may be ISO-8859-1 code sequences in there, or perhaps the purpose is simply to let the user know how odd it looks when rendered in the wrong character set.

Having user agents actively work around server misconfigurations is just wrong. All this does is delay getting the actual problem fixed, and move the burden of fixing it from the server / webapp maintainer who caused the problem to the user agent vendors.

Once you start routinely second-guessing the intentions of the server or webapp, you enter a never-ending road, ensuring that these problems stay forever and never get fixed.

So to summarise my preferences:

Content-Type guessing MAY be performed IF and ONLY IF there is no Content-Type specified. (already a MUST-level criterion in the RFC)

charset parameter guessing MAY be performed IF and ONLY IF there is no charset parameter specified. (currently a MUST NOT in the RFC; charset guessing is currently never allowed)

Related to this, I also support removing the strict default ISO-8859-1 charset from HTTP text/* types, downgrading it to a mere suggestion that, if there is no charset information available, a good guess for the text/* types is ISO-8859-1 for historical reasons.

> 4) other type of sniffing
>
> HTML5 defines other types of sniffing (such as unknown -> PDF) that
> aren't covered by these tests, and haven't been discussed within this
> thread.

Already covered in the definition of Type. Not much to discuss.

"If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource."

Regards
Henrik
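
The two preferences summarised in this message can be sketched as a small decision function. This is only an illustration of the proposed rule, not of what the RFC currently permits (charset guessing is described above as a MUST NOT today). The function name `sniffing_allowed` is hypothetical, the parameter parsing is deliberately naive, and treating an empty header value the same as a missing header is an assumption.

```python
def sniffing_allowed(content_type):
    """Return (may_guess_type, may_guess_charset) for a Content-Type value.

    content_type is the raw header value, or None if the header is absent.
    Hypothetical helper: it encodes the preferences stated in the message,
    not the current RFC text.
    """
    # Assumption: an empty or whitespace-only value counts as "not specified".
    if content_type is None or not content_type.strip():
        return True, True

    # Naive split on ';' (does not handle ';' inside quoted parameter values).
    params = content_type.split(";")[1:]
    has_charset = any(
        p.split("=", 1)[0].strip().lower() == "charset" for p in params
    )

    # An explicit media type is never second-guessed; the charset may be
    # guessed only when the server did not state one.
    return False, not has_charset


if __name__ == "__main__":
    for value in (None, "text/plain", "text/plain; charset=ISO-8859-1"):
        print(repr(value), sniffing_allowed(value))
```

Under this sketch, a response labelled "text/plain; charset=ISO-8859-1" is taken at its word on both counts, which is the opposite of the sniffing behaviour quoted at the top of the message.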
Received on Tuesday, 5 February 2008 14:47:29 UTC