Re: Non Latin1 charsets (draft-holtman-http-negotiation-00.txt) from Koen Holtman on 1996-03-02 (ietf-http-wg@w3.org from January to March 1996)

From: Koen Holtman <koen@win.tue.nl>
Date: Sat, 2 Mar 1996 12:50:43 +0100 (MET)
To: Nickolay Saukh <nms@nns.ru>
Cc: koen@win.tue.nl, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <199603021150.MAA03695@wsooti04.win.tue.nl>
Nickolay Saukh:
>
>4.2 Accept-Charset
>
>I think last sentence of first paragraph should be written as
>"The ISO-8859-1 character set can be assumed to be acceptable
>to all user agents.".

There was a long discussion about ISO-8859-1 versus US-ASCII recently,
and I must admit that I did not read all messages in that discussion.
My impression at the end was that most people wanted US-ASCII to stay
as the character set which can be assumed to be acceptable to all user
agents.

> Rationale: per HTTP/1.1 draft
>(section 3.7.1) entity body without explicit charset can be
>US-ASCII only or ISO-8859-1. 

Yes.

>Thus any conforming user agent must
>be able to handle ISO-8859-1.

No, that is not a correct inference.  It would make sense for every
user agent to be able to handle all entity bodies without an explicit
charset, but Section 3.7.1 does not require it.

>4.6 Alternates
>
>Can media-type contain charset? Is this a valid example?
>
>Alternates: {"TheProject.fr.html" 1.0
>      {type "text/html"} {language "fr"}},
>    {"TheProject.en.html" 1.0
>      {type "text/html"} {language "en"}},
>    {"TheProject.ru.html" 1.0
>      {type "text/html;charset=iso-8859-5"} {language "ru"}},
>    {"/cgi-bin/xlate?koi8-r+TheProject.ru.html" 1.0
>      {type "text/html;charset=koi8-r"} {language "ru"}}

Yes.  Contrary to what Daniel DuBois said in this thread, 

   {type "text/html;charset=iso-8859-5"} 

is indeed the way to denote the charset. 

This mirrors use of the Content-Type header, which specifies the MIME
type and optionally the charset.  Note that we do not have a
Content-Charset header, but that we _do_ have an Accept-Charset
header.  I believe that this asymmetry was caused by early versions of
HTTP trying to inherit as much of the MIME semantics as possible.
As far as I know, it is too late to fix it now.
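Because the charset rides along as a parameter of the type, an implementation can read an Alternates type value exactly as it reads a Content-Type header.  A minimal sketch, using Python's standard email.message.Message for the parsing; the helper name split_type is hypothetical and not part of any draft:

```python
from email.message import Message
from typing import Optional, Tuple


def split_type(type_value: str) -> Tuple[str, Optional[str]]:
    """Split an Alternates type value such as 'text/html;charset=iso-8859-5'
    into (media type, charset), parsing it the way a Content-Type header is
    parsed.  The charset is None when no charset parameter is present."""
    msg = Message()
    msg["Content-Type"] = type_value
    # get_content_type() lowercases "type/subtype";
    # get_content_charset() lowercases the charset parameter value.
    return msg.get_content_type(), msg.get_content_charset()


print(split_type("text/html;charset=iso-8859-5"))
print(split_type("text/html"))
```

The first call yields ('text/html', 'iso-8859-5') and the second ('text/html', None), mirroring how a missing charset parameter looks in Content-Type.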

Also, contrary to what Daniel DuBois said,

>    {"/cgi-bin/xlate?koi8-r+TheProject.ru.html" 1.0
>      {type "text/html;charset=koi8-r"} {language "ru"}}

is a legal alternate description.  What the anti-spoofing clause (the
origin server restriction) in Section 5.2 of draft-holtman says is
that origin servers may not return this alternate in a preemptive
negotiation response.  This means that, if this alternate is the best
one, the origin server should send a reactive negotiation response,
which causes the client to retrieve the best alternate with a direct
request on /cgi-bin/xlate?koi8-r+TheProject.ru.html.
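The server-side decision described above can be sketched as follows.  This is an illustration under an assumption: the hypothetical restricted_uris set stands in for whichever alternates the Section 5.2 origin server restriction excludes, and the exact rule that populates it is not modeled here.  The function name plan_response is likewise hypothetical.

```python
from typing import AbstractSet, Tuple


def plan_response(best_uri: str, restricted_uris: AbstractSet[str]) -> Tuple[str, str]:
    """Decide how the origin server answers when best_uri is the best alternate.

    ("preemptive", best_uri): the server may return this alternate directly.
    ("reactive", best_uri): the server must send a reactive negotiation
    response instead, and the client then retrieves best_uri itself with
    a direct request.
    """
    if best_uri in restricted_uris:
        # Restricted alternate: never returned in a preemptive response.
        return ("reactive", best_uri)
    return ("preemptive", best_uri)


xlate = "/cgi-bin/xlate?koi8-r+TheProject.ru.html"
print(plan_response(xlate, {xlate}))
```

Here the call yields ('reactive', '/cgi-bin/xlate?koi8-r+TheProject.ru.html'): the alternate stays legal in the Alternates header, and only the way it reaches the client changes.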

>5.1 Reactive negotiation
>
>If two alternates differ by charset only, how can the
>preferred one be specified?

The service author can specify the preferred one using the source
quality factors in the Alternates header:

    {"notpreferred.html" 0.9 {type "text/html;charset=iso-8859-5"}},
    {"preferred.html"    1.0 {type "text/html;charset=koi8-r"}}

or by the order in which the alternates are listed:

    {"preferred.html"    1.0 {type "text/html;charset=koi8-r"}},
    {"notpreferred.html" 1.0 {type "text/html;charset=iso-8859-5"}}

So it is up to the service author to decide for you which charset of
the ones you accept would give you the best results.  The decision
made is reflected in the Alternates header.
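A simplified sketch of how a client could apply this choice, looking only at charsets (the draft's full selection also weighs type, language and other dimensions, which are ignored here).  Assumptions, all labeled in the code: alternates are (uri, source quality, type value) tuples in header order, US-ASCII is always treated as acceptable per the consensus mentioned earlier, and an alternate without a charset parameter is treated as acceptable.  The function names are hypothetical.

```python
from typing import AbstractSet, List, Optional, Tuple

# Hypothetical representation: (uri, source quality, type value),
# in the order the alternates appear in the Alternates header.
Alternate = Tuple[str, float, str]


def charset_of(type_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a type value, or None."""
    for param in type_value.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return None


def pick_alternate(alternates: List[Alternate],
                   accept_charset: AbstractSet[str]) -> Optional[str]:
    # Accept-Charset has no quality factors: a charset is acceptable or not.
    # Assumption: US-ASCII is acceptable to every user agent.
    acceptable = {c.lower() for c in accept_charset} | {"us-ascii"}
    best_uri, best_q = None, -1.0
    for uri, q, type_value in alternates:
        charset = charset_of(type_value)
        # Assumption: an alternate without a charset parameter is acceptable.
        if charset is not None and charset not in acceptable:
            continue
        # Strictly greater: on equal source quality the first listed wins.
        if q > best_q:
            best_uri, best_q = uri, q
    return best_uri


by_quality = [("notpreferred.html", 0.9, "text/html;charset=iso-8859-5"),
              ("preferred.html", 1.0, "text/html;charset=koi8-r")]
by_order = [("preferred.html", 1.0, "text/html;charset=koi8-r"),
            ("notpreferred.html", 1.0, "text/html;charset=iso-8859-5")]
print(pick_alternate(by_quality, {"iso-8859-5", "koi8-r"}))
print(pick_alternate(by_order, {"iso-8859-5", "koi8-r"}))
```

Both calls print preferred.html: in the first list the source quality decides, in the second the listing order breaks the tie.  The user agent's Accept-Charset only filters the candidates; it never ranks them.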

You, as a user agent user, cannot express a preference for one
charset over another; you can only say which ones you can handle.
There are no quality factors in the Accept-Charset header.

This means that the HTTP/1.1 draft spec assumes that if a user agent
puts a charset in its Accept-Charset header, it can handle this
charset perfectly, not just through some lossy on-the-fly filter.  If
anything lossy happens, it must be done at the server side, and be
reflected in the Alternates header.

I don't know whether this assumption, that a user agent can perfectly
handle every charset listed in its Accept-Charset header, holds for all
current browsers.  If it does not, we would have to decide whether a) the
current browsers need to be improved, or b) the draft spec needs to be
extended.  I would go for a), though I realize that this puts browsers
that don't use a bitmapped screen, like Lynx, in a difficult position.

Koen.
Received on Saturday, 2 March 1996 03:54:21 UTC