Re: Accept-Charset support

On Dec 5, 12:42pm, Martin J. Duerst wrote:

> > Larry Masinter wrote:

> > > If no Accept-Charset header is present, the default is that any
> > > character set is acceptable. If an Accept-Charset header is present, and
> > > if the server cannot send a response which is acceptable according to
> > > the Accept-Charset header, then the server SHOULD send an error response
> > > with the 406 (not acceptable) status code, though the sending of an
> > > unacceptable response is also allowed.

That implies that sending

Accept-Charset: utf-8

should generate a 406 response if the document is only available in, say,
Latin-1 and the server cannot convert that to UTF-8. Using the
recently-introduced three level terminology, with

level 1 = unicode-1-1 (ie UCS-2) and utf-8 (ie unicode-1-1-utf-8)
level 2 = 8859-1, 8859-2, 8859-5, 8859-7 (etc, to be discussed)
level 3 = all the rest

This implies that the Accept-Charset header should list all the level 1
and level 2 charsets that the user wishes to accept. The server is then
allowed, according to HTTP/1.1, to send back the resource in one of
the listed charsets, or to send back an unacceptable response (does
that mean a 406 with a message body in an unacceptable charset, Larry,
or does it mean a 200 response with a message body in an unacceptable
charset?).
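
To make the server-side choice concrete, here is a rough sketch in Python.
It is only an illustration under stated assumptions: the function names are
hypothetical, q-values and any wildcard are ignored, and it takes the SHOULD
branch (406) when nothing matches rather than sending an unacceptable
response.

```python
def parse_accept_charset(header):
    """Return the lowercased charset names listed in an Accept-Charset value.

    Simplification (assumption): q-values and wildcards are ignored.
    """
    names = []
    for item in header.split(","):
        name = item.split(";", 1)[0].strip().lower()
        if name:
            names.append(name)
    return names


def negotiate(accept_charset, available):
    """Pick a (status, charset) pair for a response.

    accept_charset: the header value, or None if the header was absent.
    available: charsets the server can produce, in its own preference order.
    """
    if accept_charset is None:
        # No header: any character set is acceptable.
        return 200, available[0]
    wanted = parse_accept_charset(accept_charset)
    for charset in available:
        if charset.lower() in wanted:
            return 200, charset
    # Nothing acceptable: the SHOULD case. Sending an unacceptable
    # response anyway is also allowed, but is not modelled here.
    return 406, None
```

With this sketch, negotiate("utf-8", ["iso-8859-1"]) gives (406, None), which
is the Latin-1-only case described above.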

Thus for example I might send out an Accept-Charset of 8859-1 because
that covers all the languages I can actually read. A Russian user might
send out an Accept-Charset of 8859-5, KOI-8 and UTF-8, meaning that
documents in unusual languages like Japanese or English would be sent
out in UTF-8 (or, bizarrely but legally, in KOI-8 with numeric character
references ;-)
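
For the curious, the KOI-8-plus-references case can be sketched in Python
using the standard "xmlcharrefreplace" error handler. I am assuming here that
KOI8-R (Python's "koi8_r" codec) is the KOI-8 variant meant; the function
name is hypothetical.

```python
def encode_with_ncr(text, charset="koi8_r"):
    """Encode text, replacing unencodable characters with &#NNNN; references."""
    return text.encode(charset, errors="xmlcharrefreplace")


# Cyrillic text fits in KOI8-R and is encoded directly, one byte per letter.
russian = encode_with_ncr("\u0414\u0430")    # "Da" in Cyrillic

# Japanese text does not fit, so each character becomes a decimal reference.
japanese = encode_with_ncr("\u65e5\u672c")   # two kanji
print(japanese)  # b'&#26085;&#26412;'
```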

> Does the 406 response include some information on what "charset"s would
> indeed be acceptable?

It seems not, although another option is to send a 300 response code, which
can include such information.

Equally, another option is to send no Accept-Charset, get a 300
response, and always incur another round-trip delay to fetch a more
acceptable option.

(All charsets are equal, but some are more equal than others, to
paraphrase Orwell)

However, would it be wise to assume that all browsers that claim to
be using HTTP/1.1 also support UTF-8? Probably not.

10.3.1 300 Multiple Choices

The requested resource corresponds to any one of a set
of representations, each with its own specific
location, and agent-driven negotiation information
(section 12) is being provided so that the user (or
user agent) can select a preferred representation and
redirect its request to that location.

Unless it was a HEAD request, the response SHOULD
include an entity containing a list of resource
characteristics and location(s) from which the user or
user agent can choose the one most appropriate. The
entity format is specified by the media type given in
the Content-Type header field. Depending upon the
format and the capabilities of the user agent,
selection of the most appropriate choice may be
performed automatically. However, this specification
does not define any standard for such automatic
selection.

If the server has a preferred choice of representation,
it SHOULD include the specific URL for that
representation in the Location field; user agents MAY
use the Location field value for automatic redirection.
This response is cachable unless indicated otherwise.
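
On the user agent side, the Location behaviour in the section quoted above
might be sketched like this in Python. The function name and the shape of the
headers argument (a plain dict) are assumptions for illustration; since the
spec defines no standard for automatic selection from the entity body, the
sketch only covers the Location field.

```python
def next_request_url(status, headers):
    """Return the URL to fetch next, or None if the user must choose.

    status: the numeric response status code.
    headers: dict mapping response header names to values (hypothetical shape).
    """
    if status != 300:
        return None
    # Header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # The server's preferred choice, if it gave one; user agents MAY
    # use it for automatic redirection.
    return lowered.get("location")
```

Either way, following the returned URL costs the extra round trip mentioned
earlier.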


-- 
Chris Lilley, W3C                          [ http://www.w3.org/ ]
Graphics and Fonts Guy            The World Wide Web Consortium
http://www.w3.org/people/chris/              INRIA,  Projet W3C
chris@w3.org                       2004 Rt des Lucioles / BP 93
+33 (0)4 93 65 79 87       06902 Sophia Antipolis Cedex, France

Received on Thursday, 5 December 1996 13:11:33 UTC