- From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
- Date: Sat, 10 Feb 1996 22:10:07 -0800
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> Are we converging?

As far as I know.

> I don't like 'treat the media type as if the unrecognized parameter
> and its value were not present' any better, for the same reason: a
> forwarding agent probably shouldn't toss unrecognized parameters. I
> don't think it's really clear what MIME-IMB means by it, and I don't
> see what it adds.

It is a "you should've figured this one out already, but too many
implementors have already screwed up on it" kind of sentence. Given
that Mosaic was the worst culprit, I'd like to see it in the spec.
How about:

   Upon receipt of a media type with an unrecognized parameter, a
   user agent should treat the media type as if the unrecognized
   parameter and its value were not present.

> ================================================================
> Re the ASCII vs 8859-1 default: I think we've just been browsing
> different pages. It seems the problem was the escape of Mosaic-l10n,
> and the use of different font code sets for Russian, Greek, etc.

Yes, I know about the problem, but we went to great lengths to convince
the Mosaic-l10n crowd and all other developers that an explicit charset
parameter is the right way to do what they were doing. They agreed,
and now current practice is that no charset again implies ISO-8859-1
and, aside from closed systems where some other default is known to
the user, a charset is needed to switch character sets.

>> I don't see how. All WWW software defaults to ISO-8859-1 as per the
>> original design of the Web.
>
> ?? Mosaic-l10n was pretty popular.

Yes, but not as popular as all the systems that assume ISO-8859-1 and
do not allow any changing of that default.

>> That is true of libwww, libwww-perl, the
>> Python libraries, Mosaic, NCSA httpd, Apache httpd, Spyglass Mosaic,
>> MS Internet Explorer, and Netscape Navigator.
>
> I was thinking of 'servers' not 'clients'. Yes, most ISO-8859-1
> clients assume ISO-8859-1.
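The proposed rule for unrecognized media type parameters, together with the ISO-8859-1 default for "text" types, can be sketched from the receiving user agent's side. This is a minimal illustration, not code from any of the implementations named in the thread; the function names and the assumption that this hypothetical agent only understands "charset" on text/* types are invented for the example. Note that the parameters are dropped only when interpreting the value; a forwarding agent would pass the original header through untouched, which addresses the concern about tossing parameters.

```python
from typing import Dict, Optional, Tuple


def parse_media_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type value into a lowercased type/subtype and its parameters.

    Simplified sketch: quoted strings containing ';' or '=' are not handled.
    """
    parts = value.split(";")
    media_type = parts[0].strip().lower()
    params: Dict[str, str] = {}
    for part in parts[1:]:
        name, sep, val = part.partition("=")
        if not sep:
            continue  # skip a malformed parameter with no '='
        params[name.strip().lower()] = val.strip().strip('"')
    return media_type, params


def is_recognized(media_type: str, name: str) -> bool:
    # Assumption: this hypothetical agent understands only "charset", and only on text/*.
    return name == "charset" and media_type.startswith("text/")


def interpret(value: str) -> Tuple[str, Dict[str, str]]:
    """User-agent view: treat unrecognized parameters as if they were not present."""
    media_type, params = parse_media_type(value)
    return media_type, {k: v for k, v in params.items() if is_recognized(media_type, k)}


def effective_charset(value: str) -> Optional[str]:
    """Explicit charset wins; otherwise text/* defaults to ISO-8859-1 when received via HTTP."""
    media_type, params = interpret(value)
    if "charset" in params:
        return params["charset"].lower()
    if media_type.startswith("text/"):
        return "iso-8859-1"
    return None  # no charset default applies to non-text types in this sketch
```

For example, `interpret('text/html; level=3; charset="KOI8-R"')` keeps only the charset, so an unknown `level` parameter cannot change how the agent handles the document, while `effective_charset('text/html')` falls back to `"iso-8859-1"`.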
When the Apache server sends text/html, it is intending to say
text/html;charset="iso-8859-1". Users are capable of changing the
charset parameter on documents served by Apache. What spawned
Mosaic-l10n was the belief that it was easier to hack the clients to
guess a charset than it was to both fix the clients to recognize charset
and fix the older NCSA server (which had no knowledge of media type
parameters of any kind) to send the charset parameter. That reasoning
is no longer valid. Since we have fixed both the clients and the
servers to do the right thing, there's no point in saying that doing
the wrong thing is current practice.

>> I disagree -- all current practice that is not known to be broken
>> is saying charset="ISO-8859-1" if no charset is present. That is
>> in the definition of the HTTP protocol.
>
> It's really hard to find a web server there that *doesn't* use
> whatever-ya-got as the character encoding. If it were just a few here
> and there, I'd go along with you, but when it's all over, it's hard to
> swallow saying "oh, they're just broken" and have it apply to
> www.*.ru, www.*.jp, www.*.gr, www.*.kr, www.*.cn etc.
>
> How about:
>
> # The "charset" parameter is used with some media types to define
> # the character set (Section 3.4) of the data. When no explicit
> # charset parameter is provided by the sender, media subtypes of the
> # "text" type are defined to have a default charset value of
> # "ISO-8859-1" when received via HTTP. However, currently many web
> # servers have ignored this specification, and provide data
> # using other charsets but without proper labelling. To compensate
> # for this, some HTTP user agents provide a configuration option to
> # allow the user to change the default interpretation of the media
> # type character set when no charset parameter is given. This
> # situation reduces interoperability. It is recommended servers that
> # provide text in character streams other than ISO-8859-1 should
> # label the data appropriately.
>
> This both promotes the 'recommended' behavior and also tells the
> situation.

No, it mixes things up. If you want to say that, the correct thing
to do is:

   The "charset" parameter is used with some media types to define the
   character set (Section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1"
   when received via HTTP. Data in character sets other than
   "ISO-8859-1" or its subsets must be labelled with an appropriate
   charset value in order to be consistently interpreted by user agents.

      Note: Many current HTTP servers provide data using charsets other
      than "ISO-8859-1" without proper labelling. This situation
      reduces interoperability and is not recommended. To compensate
      for this, some HTTP user agents provide a configuration option to
      allow the user to change the default interpretation of the media
      type character set when no charset parameter is given.

Which makes it clear what the protocol is versus what implementation
kludges are used.

> ================================================================
>
> I don't really care about most of the rest of the issues. I still
> don't know why you want to say "origin server" when "server" will do,

Because the security note doesn't apply to all servers -- only origins.

> or [tm] on Unix and Windows when no one else does, but I don't care
> much.

Well, I don't care about that either -- I just wanted to bring it to
your attention.

> And I'll go along with calling out "content-" headers specially.
> Are we converging?

Yep.

 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/
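Roy's proposed wording separates the protocol rule (text in charsets other than ISO-8859-1 or its subsets must be labelled) from the user-agent kludge (a configurable default when no label is present). A minimal sketch of both sides follows, assuming hypothetical helper names and treating only US-ASCII as a subset of ISO-8859-1; it is an illustration of the distinction, not any particular server's or browser's behavior.

```python
from typing import Optional

# Assumption: charsets a sender may leave unlabelled on text/* because the
# HTTP default (ISO-8859-1) already covers them.
COVERED_BY_DEFAULT = {"iso-8859-1", "us-ascii"}


def content_type_for(media_type: str, charset: str) -> str:
    """Sender side: label any charset that the ISO-8859-1 default does not cover."""
    if media_type.lower().startswith("text/") and charset.lower() in COVERED_BY_DEFAULT:
        return media_type  # the receiver's default already yields the right charset
    return f'{media_type}; charset="{charset}"'


def receiver_charset(media_type: str, charset: Optional[str],
                     user_default: Optional[str] = None) -> Optional[str]:
    """Receiver side: the protocol rule first, then the non-standard user override.

    user_default models the configuration option described in the Note; it is
    an implementation kludge, not part of the protocol.
    """
    if charset:
        return charset.lower()  # an explicit label always wins
    if media_type.lower().startswith("text/"):
        return (user_default or "iso-8859-1").lower()
    return None
```

With this split, a server sending KOI8-R text produces `text/html; charset="KOI8-R"` and is interpreted correctly whether or not the user has changed a default; only unlabelled data depends on the override. Always labelling, including ISO-8859-1, is equally valid and arguably simpler.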
Received on Saturday, 10 February 1996 22:18:11 UTC