- From: Larry Masinter <masinter@parc.xerox.com>
- Date: Thu, 27 Jun 1996 03:05:23 PDT
- To: fielding@liege.ICS.UCI.EDU
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Roy,

The rough draft minutes didn't cover the full discussion.

1. WHY CHANGE draft...-05?

The primary observation was that draft-05 introduced an INCOMPATIBILITY with HTTP/1.0: it changed the *meaning* of a response in an incompatible way, with a severe loss of functionality. In HTTP/1.0, in order to reflect current practice, untagged text <<content-type: text/html>> is interpreted as "charset is unspecified, recipient must guess". We added language that changed this meaning, and that language was incompatible with 1.0:

> The "charset" parameter is used with some media types to define the
> character set (section 3.4) of the data. When no explicit charset
> parameter is provided by the sender, media subtypes of the "text" type
> are defined to have a default charset value of "ISO-8859-1" when
> received via HTTP. Data in character sets other than "ISO-8859-1" or its
> subsets MUST be labeled with an appropriate charset value.

This language is not only incompatible with HTTP/1.0, it also does not conform to what we believe will be the future direction of other Internet protocols; there is no reason to give ISO-8859-1 this privileged position in HTTP. Furthermore, there is no recommended way to state explicitly what is the default situation in HTTP/1.0, namely that the charset is not known. These are sufficient reasons to consider a change to the -05 specification.

2. COMPATIBILITY WITH HTTP/1.0

The issue concerns the labelling of the charset of text/ entity bodies in HTTP/1.1 messages. For HTTP/1.1 _response_ messages, a server may respond differently to an HTTP/1.0 request than to an HTTP/1.1 request, and doing so will be recommended implementation advice for graceful deployment. As you say, "there is nothing in HTTP that prevents a site, if it so desires, from tagging all text types with an appropriate charset parameter". However, HTTP/1.1 implementations must be prepared to deal with an explicit charset parameter.
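As a rough illustration of version-dependent response labelling, here is a minimal Python sketch. The function name `response_content_type` is hypothetical, and the specific policy (leave an unknown charset untagged for HTTP/1.0 requests, send "charset=x-unknown" for HTTP/1.1 requests) is an assumption that combines this section with choice (b) below; it is not something the group specified.

```python
from typing import Optional, Tuple


def response_content_type(media_type: str, charset: Optional[str],
                          request_version: Tuple[int, int]) -> str:
    """Hypothetical helper: build the Content-Type value for a response.

    `charset` is None when the server does not know the character set.
    """
    if not media_type.lower().startswith("text/"):
        # The charset discussion only concerns text/* entity bodies.
        return media_type
    if charset is not None:
        # A known charset may be tagged for any client.
        return f"{media_type}; charset={charset}"
    if request_version < (1, 1):
        # Assumed policy: keep the HTTP/1.0 meaning of an untagged body
        # ("charset is unspecified, recipient must guess").
        return media_type
    # HTTP/1.1 request under choice (b): say explicitly that it is unknown.
    return f"{media_type}; charset=x-unknown"


print(response_content_type("text/html", None, (1, 0)))  # text/html
print(response_content_type("text/html", None, (1, 1)))  # text/html; charset=x-unknown
```

The request version is compared as a tuple so that any pre-1.1 request keeps the old untagged form.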
In the case of labelling HTTP requests, as opposed to responses, the version of the server may not be known. However, the issue concerns only the charset label on a "text" entity body in a request, and in HTTP/1.1 generally only PUT and POST requests carry entity bodies. POST requests are generally not sent with a text content-type (application/x-www-form-urlencoded being most common), and PUT is generally practiced only between proprietary clients and their corresponding servers. So it was believed that requiring all entity bodies to be labelled with their charset would not create a compatibility problem with current practice.

3. HTTP/1.1 <-> HTTP/1.0 gateways

We discussed what an HTTP/1.1 proxy might do with an entity body that was received from an HTTP/1.0 server without a charset label. In general, it is deemed more reliable not to give "no label" a special meaning that cannot be represented any other way. Other Internet protocols use "charset=x-unknown" to represent the situation where the character set is otherwise unknown. This seemed like a reasonable practice to recommend to gateways.

4. Upgrading CGI & programs to HTTP/1.1

We discussed how current servers that implement HTTP/1.1 but have not upgraded their CGI programs might label their data. It seemed reasonable to assume that at a given site, if a CGI program does not itself supply a charset parameter for the content-type of its output, the server might supply one based on the system default.

5. MUST vs. SHOULD

In the end, there was a choice:

a) charset SHOULD be supplied with all responses; no label means "US-ASCII superset, you guess" (I think this would be equivalent to changing "ISO-8859-1" to "US-ASCII" in the draft)

b) charset MUST be supplied with all responses, with an explicit "charset=x-unknown" if the charset is unknown.

I believe choice (b) was acceptable to everyone in the room, including HTTP/1.1 client and server implementors.
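To make the difference between the two choices concrete, the following Python sketch shows how a recipient might interpret the charset label of a received text/* body under each. The function names are hypothetical, and treating an unlabelled body under choice (b) the same as an explicit "charset=x-unknown" is an assumption (such a body could only come from an HTTP/1.0 sender or a non-conforming HTTP/1.1 one).

```python
from typing import Dict, Optional, Tuple


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type value into a lowercased media type and its parameters."""
    media_type, *param_parts = value.split(";")
    params: Dict[str, str] = {}
    for part in param_parts:
        key, sep, val = part.partition("=")
        if sep:  # ignore malformed parameters without "="
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


def charset_hint(content_type: str, choice: str) -> Tuple[Optional[str], bool]:
    """Return (charset, must_guess) for a received text/* entity body.

    `choice` is "a" or "b", the two options listed in section 5.
    """
    if choice not in ("a", "b"):
        raise ValueError("choice must be 'a' or 'b'")
    media_type, params = parse_content_type(content_type)
    if not media_type.startswith("text/"):
        raise ValueError("this sketch only covers text/* media types")
    label = params.get("charset")
    if label is not None and label.lower() != "x-unknown":
        # Explicit charset: no guessing needed under either choice.
        return label.lower(), False
    if label is None and choice == "a":
        # Choice (a): no label means "US-ASCII superset, you guess".
        return "us-ascii", True
    # Explicit x-unknown (either choice), or no label under choice (b):
    # nothing is known about the charset, so the recipient must guess.
    return None, True
```

Under both choices an explicit label wins; the only behavioural difference is what the recipient may assume when the label is absent.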
The two choices are practically the same except that choice (b) will promote the more frequent use of an explicit "charset=x-unknown" for content where that is the case. Neither choice would seem to cause compatibility difficulties with HTTP/1.0 clients or servers given a few precautions in servers and version gateways.
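The labelling precautions from sections 3 and 4 can be sketched in Python as follows. The function names and the `site_default_charset` parameter are hypothetical; how a server actually determines its system default charset is left open here.

```python
from typing import Dict, Tuple


def parse_content_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a Content-Type value into a lowercased media type and its parameters."""
    media_type, *param_parts = value.split(";")
    params: Dict[str, str] = {}
    for part in param_parts:
        key, sep, val = part.partition("=")
        if sep:  # ignore malformed parameters without "="
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params


def ensure_charset(content_type: str, fallback: str) -> str:
    """Append charset=<fallback> to a text/* Content-Type that has no charset."""
    media_type, params = parse_content_type(content_type)
    if not media_type.startswith("text/") or "charset" in params:
        # Non-text types and already-labelled bodies pass through unchanged.
        return content_type
    return f"{content_type.strip()}; charset={fallback}"


def label_cgi_output(content_type: str, site_default_charset: str) -> str:
    """Section 4: the server supplies the site default for unlabelled CGI output."""
    return ensure_charset(content_type, site_default_charset)


def label_from_http10_upstream(content_type: str) -> str:
    """Section 3: a gateway marks an unlabelled HTTP/1.0 body as x-unknown."""
    return ensure_charset(content_type, "x-unknown")
```

Both paths converge on the same rule under choice (b): a text/* body leaves the server or gateway with an explicit charset, which is "x-unknown" only when nothing better is known.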
Received on Thursday, 27 June 1996 03:14:41 UTC