- From: Roy T. Fielding <fielding@liege.ICS.UCI.EDU>
- Date: Wed, 03 Jul 1996 12:43:31 -0700
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: jg@w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
> I suggest making the following change, which is less controversial
> than the "charset=unknown" proposal:
>
> Current HTTP/1.1 spec:
>
>> The "charset" parameter is used with some media types to define the
>> character set (section 3.4) of the data. When no explicit charset
>> parameter is provided by the sender, media subtypes of the "text" type
>> are defined to have a default charset value of "ISO-8859-1" when
>> received via HTTP. Data in character sets other than "ISO-8859-1" or its
>> subsets MUST be labeled with an appropriate charset value.
>
> My proposal:
>
> < The "charset" parameter is used with some media types to define the
> < character set (section 3.4) of the data. Origin servers SHOULD
> < include an appropriate charset parameter for those media types which
> < allow one (including text/html and text/plain) to avoid ambiguity.
> < In the absence of a charset parameter, the default charset value MAY
> < be assumed to be "ISO-8859-1" when received from a HTTP/1.1 server.
>
> < Unfortunately, some HTTP/1.0 clients do not properly deal with
> < explicit charset parameters for text/html data, and some HTTP/1.0
> < server sites send no charset parameter, even when the charset of the
> < data is not ISO-8859-1. For compatibility with older clients and
> < servers, implementations may need to be careful when communicating
> < with older versions, by not sending a charset parameter when the
> < data is ISO-8859-1, and by allowing local configuration when
> < receiving unlabelled data from HTTP/1.0 servers.
>
> This establishes a convention that charset SHOULD be sent, but lays
> out some of the compatibility constraints during the transition
> period. Is this sufficient?

I would not loosen the existing requirement for HTTP/1.1 that character
sets other than "ISO-8859-1" or its subsets MUST be labeled. In other
words:

====================
The "charset" parameter is used with some media types to define the
character set (section 3.4) of the data. To avoid ambiguity, an origin
server SHOULD include an appropriate charset parameter for those media
types which allow one (including text/html and text/plain), and MUST do
so if the character set is other than "ISO-8859-1" or its subsets. In
the absence of a charset parameter, the default charset value MAY be
assumed to be "ISO-8859-1".

   Note: Some older HTTP/1.0 user agents do not properly understand
   explicit charset parameters for text/html data, and some HTTP/1.0
   server sites send no charset parameter even when the charset of the
   data is not ISO-8859-1. Although such applications are considered
   to be broken and should be replaced, HTTP/1.1 implementations may
   need to adjust their behavior to compensate for such older systems.
   For example, an origin server may wish to avoid sending the charset
   parameter when the data is US-ASCII or ISO-8859-1 and the User-Agent
   request-header field indicates that such an older user agent made
   the request. Likewise, user agents may wish to allow local
   configuration to override HTTP's default charset when no charset
   parameter is present in received data.

   Note: The reason for "ISO-8859-1" being the default value when no
   charset parameter is provided is due to current practice and should
   not be interpreted as any sort of preference for that character set.
====================

However, I still believe that the current specification is more
accurate, and less likely to be misinterpreted, than the above
compromise. The problem is that the above is an implicit recommendation
that applications avoid being unconditionally compliant with the
protocol, at least for the next year or so.

 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/
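
For illustration only, the rules in the compromise text above could be
sketched in Python roughly as follows. This is a reading of the proposed
wording, not part of any specification. The function names, the
local_override parameter, and the legacy_user_agent flag are hypothetical;
how a server decides that a User-Agent is an "older" one is assumed to
happen elsewhere.

```python
from email.message import Message

HTTP_DEFAULT_CHARSET = "ISO-8859-1"
# Charsets treated here as "ISO-8859-1 or its subsets" (assumed list).
LATIN1_OR_SUBSET = {"iso-8859-1", "us-ascii"}


def effective_charset(content_type, local_override=None):
    """Charset a recipient might assume for a Content-Type header value.

    An explicit charset parameter always wins. For unlabelled "text"
    subtypes, a locally configured override (hypothetical) is used if
    set, otherwise the HTTP default. Other media types yield None.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_param("charset")
    if explicit is not None:
        return str(explicit)
    if msg.get_content_maintype() == "text":
        return local_override or HTTP_DEFAULT_CHARSET
    return None


def server_sends_charset(charset, legacy_user_agent=False):
    """Whether an origin server labels the data with a charset parameter.

    Data outside ISO-8859-1 and its subsets MUST be labeled. Otherwise
    the label SHOULD be sent, but may be omitted to compensate for an
    older user agent that mishandles explicit charset parameters.
    """
    if charset.lower() not in LATIN1_OR_SUBSET:
        return True
    return not legacy_user_agent
```

Under these assumptions, an unlabelled text/html entity falls back to
ISO-8859-1 unless the user has configured an override, while a server
holding Shift_JIS data labels it regardless of which user agent asked.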
Received on Wednesday, 3 July 1996 13:17:39 UTC