Re: Accept-Charset support from Martin J. Duerst on 1996-12-17 (ietf-http-wg@w3.org from October to December 1996)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Tue, 17 Dec 1996 11:34:29 +0100 (MET)
To: Klaus Weide <kweide@tezcat.com>
Cc: Koen Holtman <koen@win.tue.nl>, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com, www-international@w3.org, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <Pine.SUN.3.95.961217111218.245F-100000@enoshima>

On Mon, 16 Dec 1996, Klaus Weide wrote:

> On Sat, 14 Dec 1996, Koen Holtman wrote:
> > Klaus Weide:
> > >
> > >Let's hope so :).  However, with overloading I meant treating 
> > >{Content,Accept}-Language headers (and related HTML tags or attributes)
> > >as carrying character repertoire meaning - which is happening now. 
> >                                             ^^^^^^^^^^^^^^^^^^^^^^
> > 
> > Interesting.  That seems like a very strange thing to do.
> > 
> > Who is doing this and why?  Could you give a pointer?
> 
> Examples where "Language" is treated as carrying charset meaning
> (not just repertoire, but "charset" including encoding):
> 
> Pages that do the poor-man's negotiation of letting the user select
> a "language" manually, then return a page whose charset may vary 
> depending on the language choice.
>    <URL: http://www.alis.com/internet_products/language.en.html>
>    <URL: http://www.accentsoft.com/>
>    <URL: http://www.dkuug.dk/maits/>   
> 
> Another example, which does "real" (automatic) negotiation:
>    <URL: http://www.dkuug.dk:81/maits/summary>
> (For example, with "accept-language: el, en" you get Greek in iso-8859-7
> - even when also sending an "accept-charset" which excludes iso-8859-7.)

What you perceive as overloading and automatic negotiation is just
a side effect of your exaggerated assumptions about the server.
You think the server is somehow being intelligent and saying
"well, this guy wants Greek, so I am assuming he will be able to
display iso-8859-7, even if he doesn't say so".
What's probably happening is that the program starts checking languages,
finds that it can meet the request for Greek, then checks what
encodings it has for this document, finds that it only has iso-8859-7,
and sends it out.
So the only thing the server does is give priority to languages
over encodings. This results from the fact that there is no
specification about relative priority of Accept headers, or
how to combine them. Another server could start with "charset"s,
find that it has some document available in one of the
charsets you specified, and serve it even if it is not in a language
you specified. This does not work with your example, but suppose
you specify Polish and Czech and ISO-8859-2, and the server
has the document only in Russian and Hungarian: you might get the
document in Hungarian (because it is in ISO-8859-2) even though you
would understand more of the Russian.
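
To make the two orderings concrete, here is a minimal sketch in Python.
It is only an illustration of the behaviour I am guessing at, not the
code of any actual server. The names Variant, pick_language_first and
pick_charset_first are hypothetical, q-values are ignored, and header
values are assumed to be already parsed into lowercase lists.

```python
from typing import NamedTuple, Optional, Sequence


class Variant(NamedTuple):
    """One stored version of a document (hypothetical structure)."""
    language: str
    charset: str


def pick_language_first(variants: Sequence[Variant],
                        accept_language: Sequence[str],
                        accept_charset: Sequence[str]) -> Optional[Variant]:
    """Languages take priority; charset only breaks ties."""
    for lang in accept_language:
        candidates = [v for v in variants if v.language == lang]
        if not candidates:
            continue
        for v in candidates:
            if v.charset in accept_charset:
                return v
        # Language matched, no acceptable charset: send what exists anyway.
        return candidates[0]
    # Nothing matched at all: serve something else.
    return variants[0] if variants else None


def pick_charset_first(variants: Sequence[Variant],
                       accept_language: Sequence[str],
                       accept_charset: Sequence[str]) -> Optional[Variant]:
    """Charsets take priority; language only breaks ties."""
    for cs in accept_charset:
        candidates = [v for v in variants if v.charset == cs]
        if not candidates:
            continue
        for v in candidates:
            if v.language in accept_language:
                return v
        # Charset matched, no acceptable language: send what exists anyway.
        return candidates[0]
    return variants[0] if variants else None


# The Greek case: "el, en" requested, iso-8859-7 not accepted.
greek_docs = [Variant("en", "iso-8859-1"), Variant("el", "iso-8859-7")]
print(pick_language_first(greek_docs, ["el", "en"], ["iso-8859-1"]))

# The Polish/Czech case: only Russian and Hungarian are stored.
slavic_docs = [Variant("ru", "iso-8859-5"), Variant("hu", "iso-8859-2")]
print(pick_charset_first(slavic_docs, ["pl", "cs"], ["iso-8859-2"]))
```

With these inputs the first call picks the Greek variant in iso-8859-7,
and the second picks the Hungarian variant. Neither function reasons
about what the user can display or read; the result follows purely from
which header is examined first.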

The problem here is not overloading or "automatic" negotiations,
the "problem" is that the Accept headers were designed for
the general case (i.e. you know Greek and have reasonable equipment
to view it), whereas you need transparent negotiation for
the special cases you are considering.

> As for cases where *-Language (or <LANG> etc.) would be used to 
> distinguish between sub-repertoires of Unicode - well I tried to find
> some examples, but couldn't.  Possible reasons are (1) my search was
> not extensive (or systematic) enough, (2) they don't exist [yet],
> (3) there aren't many UTF (or 10646) pages now, (4) there aren't many
> truly multilingual pages now (with more than one language requiring
> more-than-USASCII).  Also the UTF and multilingual pages I found are
> experimental or for demonstration purposes, so they don't really bother
> right now about supporting browsers which might be less endowed - the
> intention rather seems to be to demonstrate "You need _this_ browser
> [from us] to see this!".  

> My impression that informal overloading of *-Language with charset
> meaning is (for some) regarded as an acceptable practice derives from
> recent messages to the www-international list, where it was argued that
> this is OK because it covers the "common" and "regular" case - see e.g.
> 
>    <http://lists.w3.org/Archives/Public/www-international/msg00405.html>,
>    <http://lists.w3.org/Archives/Public/www-international/msg00412.html>,
> and more generally
>    <http://lists.w3.org/Archives/Public/www-international/threads.html>.

It's informal, and in many cases just a side-effect.
It is always possible for the server to make some guesses.
No server is required to honor your requests re. language or
"charset". For both, the model is: Check if you have it, if
not, serve something else. For "charset", some servers do
straightforward transcoding, but servers that do language
conversion (translation) or transliteration are very much the
exception.
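
As a rough sketch of the "check if you have it, if not, serve something
else" model with straightforward transcoding added, consider the Python
below. The function name serve_with_transcoding is hypothetical, it
assumes the stored charset and the requested charsets are names that
Python's codecs recognize, and it does not describe any particular
server.

```python
from typing import Sequence, Tuple


def serve_with_transcoding(body: bytes,
                           stored_charset: str,
                           accept_charset: Sequence[str]) -> Tuple[bytes, str]:
    """Return (bytes, charset) for a stored document (hypothetical helper)."""
    # Check if you have it.
    if stored_charset in accept_charset:
        return body, stored_charset

    # Otherwise try to transcode into one of the requested charsets.
    text = body.decode(stored_charset)
    for cs in accept_charset:
        try:
            return text.encode(cs), cs
        except (UnicodeEncodeError, LookupError):
            # Charset cannot represent the text, or is unknown here.
            continue

    # If not, serve something else: the document as stored.
    return body, stored_charset


# "Ellinika" in Greek letters, stored in iso-8859-7.
greek_text = "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac"
stored = greek_text.encode("iso-8859-7")

print(serve_with_transcoding(stored, "iso-8859-7", ["iso-8859-1", "utf-8"])[1])
print(serve_with_transcoding(stored, "iso-8859-7", ["iso-8859-1"])[1])
```

In the first call iso-8859-1 cannot represent the Greek text, so the
sketch falls through to utf-8. In the second call nothing requested is
workable and the stored iso-8859-7 bytes go out unchanged, which is
allowed because the request is only a preference.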

Regards,	Martin.

Received on Tuesday, 17 December 1996 02:37:34 UTC