
Re: Accept-Charset support



On Sat, 7 Dec 1996, Jonathan Rosenne wrote:

> At 03:24 07/12/96 -0600, Klaus Weide wrote:
> >On Thu, 5 Dec 1996, Larry Masinter wrote:
> >> I think the simple thing to do is to send:
> >> 
> >> 	accept-charset: utf-8,iso-8859-5
> >> 
> >> if you're a browser and can display utf-8 and 8859-5 as well as
> >> 8859-1.  
> >
> >It seems more appropriate to say "...if you can decode utf-8 and display
> >8859-5".  The problem is that "utf-8" doesn't carry any useful information
> >about available character repertoire (whereas iso-8859-5 does) unless
> >we assume that it will be normal for a browser (or other web client)
> >to have _all_ of the 10646 characters available (in which case all 
> >discussion about Accept-Charset would be rather pointless).
> 
> According to the 10646 and Unicode specifications, the user agent is not
> obliged to be able to display all the characters. 

Of course not, but ("if I am a browser") I may wish to express better
what I am capable of.
  
> >If there is a need for a client to express "I can understand UTF-8,
> >but can only display some of the 10646 characters: ..." - and I 
> >definitely think there is such a need - I do not see a way to implement
> >this cleanly.  This is a limitation of the MIME charset model which
> >mixes character encoding and repertoire aspects ("charset considered
> >harmful" etc...).  Or rather it is a limitation following from the fact
> >that no more than a handful of "10646 sub-repertoire charsets" have
> >been registered, for which the IANA registry file has reserved a range:
> >
> > "The second region (1000-1999) is for the Unicode and
> >ISO/IEC 10646 coded character sets together with a specification of a
> >(set of) sub-repertoires that may occur."
> 
> 10646 does define several subsets. They appear not to have been registered
> by IANA. They are language related, rather than vendor related.

Well somebody has to register them (IANA won't do that by themselves).
And judging from what is in the registry, only one vendor has taken
such charsets (sub-repertoires) seriously enough to register some of them.

> The best solution to the problem raised is via "accept-language". It can be
> reasonably assumed that if my preferred languages include French I can
> display the French characters. 

Well that is what I don't like - "it can be reasonably assumed" means
guessing is involved.  Ideally Accept-Language should express what
human language I want, no more and no less.  There is no good reason
why I shouldn't be able to express "I want that text in Russian, but
have only Latin2 characters".  There may be servers that provide 
transcriptions, and maybe that will be done automatically in the future.
Or consider languages that can be written in more than one script.
If "I understand language xx" (or just "I want to see *this* document in
the xx language version") is taken to mean "Oh that guy has the yyy
charset" (maybe based on unwarranted assumptions), it follows that I 
have no way to get across "I want this in language xx, but please *don't*
give me charset yyy."

Overloading the meaning of accept-language doesn't seem to be a good 
thing.  It may look like a step forward now, because it's a way to
express capabilities that works in most cases.  But if what I really
want to convey is about charsets, I shouldn't be forced to say it in terms
of natural languages.  
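
To make the distinction concrete, here is a minimal sketch of a server
that treats the two headers as independent constraints.  Everything in
it is an assumption made for illustration: the function names, the list
of available variants, and the simplified parsing (it handles only
"token;q=value" items and ignores "*" wildcards and any implicit
default charset).  It is not how any particular server does it.

```python
# Hypothetical sketch: names and variant data are illustrative only.

def parse_accept(header):
    """Parse a simple comma-separated Accept-* value into {token: q}."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        token, _, params = item.partition(";")
        q = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q value as "not acceptable"
        prefs[token.strip().lower()] = q
    return prefs


def choose_variant(variants, accept_language, accept_charset):
    """Pick the (language, charset) pair acceptable on BOTH axes.

    The language preference is never used to guess a charset; a variant
    scores above zero only if the client listed both its language and
    its charset.  Returns None if nothing matches.
    """
    langs = parse_accept(accept_language)
    charsets = parse_accept(accept_charset)
    best, best_score = None, 0.0
    for lang, charset in variants:
        score = langs.get(lang.lower(), 0.0) * charsets.get(charset.lower(), 0.0)
        if score > best_score:
            best, best_score = (lang, charset), score
    return best


# Assumed server-side variants: Russian in Cyrillic, Russian transcribed
# into Latin2, and an English version.
VARIANTS = [("ru", "iso-8859-5"), ("ru", "iso-8859-2"), ("en", "iso-8859-1")]

# "I want that text in Russian, but have only Latin2 characters."
print(choose_variant(VARIANTS, "ru", "iso-8859-2"))
```

With the two headers kept separate, the request for Russian in Latin2
selects the transcribed variant rather than the Cyrillic one, which is
exactly what cannot be expressed if the charset is inferred from the
language.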

> If the server only has it in Japanese, it will be sent in Japanese, my
> screen will be illegible just as it is today, so the situation in this case
> will not improve but will not be worse. If the server does have alternative
> languages, the situation will improve. In total, the two accept- features
> represent a great improvement because they allow a much better situation
> than currently available if the parties support them and don't make it worse
> for those who do not.

Of course those accept- features have been around in drafts for years
yet not many have taken them seriously... 

