Re: proposed HTTP changes for charset

> I suggest making the following change, which is less controversial
> than the "charset=unknown" proposal:
> 
> Current HTTP/1.1 spec:
> 
>> The "charset" parameter is used with some media types to define the
>> character set (section 3.4) of the data. When no explicit charset
>> parameter is provided by the sender, media subtypes of the "text" type
>> are defined to have a default charset value of "ISO-8859-1" when
>> received via HTTP. Data in character sets other than "ISO-8859-1" or its
>> subsets MUST be labeled with an appropriate charset value.
> 
> My proposal:
> 
> < The "charset" parameter is used with some media types to define the
> < character set (section 3.4) of the data. Origin servers SHOULD
> < include an appropriate charset parameter for those media types which
> < allow one (including text/html and text/plain) to avoid ambiguity.
> < In the absence of a charset parameter, the default charset value MAY
> < be assumed to be "ISO-8859-1" when received from an HTTP/1.1 server.
> 
> < Unfortunately, some HTTP/1.0 clients do not properly deal with
> < explicit charset parameters for text/html data, and some HTTP/1.0
> < server sites send no charset parameter, even when the charset of the
> < data is not ISO-8859-1. For compatibility with older clients and
> < servers, implementations may need to be careful when communicating
> < with older versions, by not sending a charset parameter when the
> < data is ISO-8859-1, and by allowing local configuration when
> < receiving unlabelled data from HTTP/1.0 servers.
> 
> This establishes a convention that charset SHOULD be sent, but lays
> out some of the compatibility constraints during the transition
> period. Is this sufficient?

I would not loosen the existing requirement for HTTP/1.1 that
character sets other than "ISO-8859-1" or its subsets MUST be labeled.
In other words:

====================

  The "charset" parameter is used with some media types to define the
  character set (section 3.4) of the data. To avoid ambiguity, an origin
  server SHOULD include an appropriate charset parameter for those media
  types which allow one (including text/html and text/plain), and MUST
  do so if the character set is other than "ISO-8859-1" or its subsets.
  In the absence of a charset parameter, the default charset value MAY
  be assumed to be "ISO-8859-1".

     Note: Some older HTTP/1.0 user agents do not properly understand
     explicit charset parameters for text/html data, and some HTTP/1.0
     server sites send no charset parameter even when the charset of the
     data is not ISO-8859-1. Although such applications are considered
     to be broken and should be replaced, HTTP/1.1 implementations may
     need to adjust their behavior to compensate for such older systems.
     For example, an origin server may wish to avoid sending the charset
     parameter when the data is US-ASCII or ISO-8859-1 and the User-Agent
     request-header field indicates that such an older user agent made
     the request.  Likewise, user agents may wish to allow local
     configuration to override HTTP's default charset when no charset
     parameter is present in received data.

     Note: "ISO-8859-1" is the default value when no charset parameter
     is provided because of current practice; this should not be
     interpreted as any sort of preference for that character set.

====================
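
As a rough illustration of how the proposed text might translate into
behavior, here is a minimal sketch in Python. All names in it
(effective_charset, content_type_header, the legacy_client flag, the
local_override argument) are hypothetical and not taken from any
implementation; the parameter parsing is deliberately simplified, and
legacy_client merely stands in for whatever User-Agent check a server
might choose to perform.

```python
HTTP_DEFAULT_CHARSET = "ISO-8859-1"

# Charsets treated as "ISO-8859-1 or its subsets" in this sketch (assumption:
# only US-ASCII is listed as a subset here).
LATIN1_OR_SUBSET = {"iso-8859-1", "us-ascii"}


def charset_param(content_type):
    """Return the explicit charset parameter of a Content-Type value, or None."""
    # Simplified parsing: split on ';' and look for a charset=... parameter.
    for part in content_type.split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"') or None
    return None


def effective_charset(content_type, local_override=None):
    """User-agent side: pick the charset used to decode a received body."""
    explicit = charset_param(content_type)
    if explicit:
        # An explicit label always wins.
        return explicit
    # No label: local configuration may override HTTP's default.
    return local_override or HTTP_DEFAULT_CHARSET


def content_type_header(media_type, charset, legacy_client=False):
    """Origin-server side: build the Content-Type value to send."""
    is_latin1 = charset.lower() in LATIN1_OR_SUBSET
    if is_latin1 and legacy_client:
        # Compatibility case from the note: omit the label for older agents.
        return media_type
    # SHOULD label in general; MUST label anything outside ISO-8859-1.
    return f"{media_type}; charset={charset}"
```

Under these assumptions, effective_charset("text/html") falls back to
"ISO-8859-1", while content_type_header("text/html", "ISO-2022-JP",
legacy_client=True) still sends the label, since the MUST applies
regardless of which user agent made the request.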

However, I still believe that the current specification is more accurate,
and less likely to be misinterpreted, than the above compromise.  The
problem is that the above is an implicit recommendation that applications
avoid being unconditionally compliant with the protocol, at least for
the next year or so.

 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/

Received on Wednesday, 3 July 1996 13:17:39 UTC