Re: Accept-Charset support

On Sun, 8 Dec 1996, Jonathan Rosenne wrote:

> At 20:17 07/12/96 +0100, Drazen Kacar wrote:
> >Klaus Weide wrote:
> >> On Sat, 7 Dec 1996, Jonathan Rosenne wrote:

[ Example with hr snipped ] 
> When you set the accept language to hr, you could also set the fonts to
> whatever is needed to show Serbo-Croat in the Latin script.
> 
> >> Well that is what I don't like - "it can be reasonably assumed" means
> >> guessing is involved.  Ideally Accept-Language should express what
> >> human language I want, no more and no less.  There is no good reason
> >> why I shouldn't be able to express "I want that text in Russian, but
> >> have only Latin2 characters".
> 
> Yes there is - it is not a common or regular way to see Russian, and the
> standards need to cater first and foremost for the common and regular and
> only secondly to special needs such as those. And I am not sure that this
> special need is appropriate for standardization at all.

I don't think the standards ought to cater only to the current needs of
the majority.  There can be a concept or a vision behind them (the whole
Web thing started out that way), which I would hope to be more stable than
the questions "What's easiest to implement this year" or "What do most
people want right now".

The specific example given above may seem too unusual for you to consider,
and nobody is asking for specific standardization of this case.  But I do
mind if useful capabilities in the protocol become unusable for their
intended purpose because they are (mis)used for some other purpose.
If there are insufficient means in the architecture to do character set
labelling and negotiation, then that should be fixed right *there* (by
registering needed charsets or revising the MIME charset syntax or
whatever).

HTTP 1.1 says "A language tag identifies a natural language spoken,
written, or otherwise conveyed by human beings for communication of
information to other human beings."  Nearly identical words are in
draft-ietf-html-i18n-05.txt.  A language tag[*] is not bound to a
specific charset or script in either of those drafts.  I argue to keep
it that way.

[* BTW not a tag in the HTML sense]
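
To make the separation concrete, here is a minimal sketch of a server
that treats the two headers as independent axes.  The function names
and the list of variants are hypothetical, the input is assumed to be
well-formed, and the sketch ignores whatever defaults HTTP defines for
values a header does not list.

```python
def parse_q_list(header):
    """Parse a header value such as 'ru, en;q=0.5' into {token: q}."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        token, _, params = item.partition(";")
        q = 1.0  # a token without an explicit q parameter counts as 1.0
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs[token.strip().lower()] = q
    return prefs


def choose_variant(variants, accept_language, accept_charset):
    """Pick the (language, charset) pair with the highest combined q.

    Language and charset are scored separately and multiplied, so
    neither header is used to guess the other.  Returns None when no
    variant is acceptable on both axes.
    """
    langs = parse_q_list(accept_language)
    charsets = parse_q_list(accept_charset)
    best, best_score = None, 0.0
    for lang, charset in variants:
        lang_q = langs.get(lang, langs.get("*", 0.0))
        charset_q = charsets.get(charset, charsets.get("*", 0.0))
        score = lang_q * charset_q
        if score > best_score:
            best, best_score = (lang, charset), score
    return best


# Hypothetical variants a server might hold for one resource.
variants = [("ru", "koi8-r"), ("ru", "iso-8859-2"), ("en", "iso-8859-1")]

# "I want that text in Russian, but have only Latin2 characters."
print(choose_variant(variants, "ru, en;q=0.5", "iso-8859-2"))
```

With the headers kept apart, the unusual request is expressible without
any special-casing: the language axis says "ru" and the charset axis
says "iso-8859-2".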

> It is hardly a wild guess that if the language is French the characters to
> be used include the French characters. 

It will work in most cases (or: for a majority of current Web users),
and I don't mind such guesses if no better information is available.
The fact remains that it is a guess, and better information should be
available.

> >Even more than that. I can read Cyrillic when I have to, but it's hard
> >going. I'm not sure I can read handwriting at all. But, my understanding
> >of Serbian is q=1.0 if written in Latin alphabet. The official alphabet
> >there is Cyrillic and it's reasonable to expect the pages will use it.
> >I *want* it converted to Latin by my browser, even if I have fonts around.
> >Why should I have those fonts? To write a word or two and put it in
> >headings on some pages. You know, tho c00l stuff. :)
> 
> The case of Serbo-Croat is unique. It can be handled by the browser through

I don't think it is the only language which is habitually written in
more than one script.

> a special mapping table or applet that maps Cyrillic UCS codes to Latin.
> Someone in Greece may desire a similar mapping to Greek. This is an
> implementation issue and not a matter for standards (except to standardize
> the mapping). In any case, it does not require any involvement of the
> server, which should still provide Cyrillic text.

It could be done by the server, or on the client side, or by some
intermediate agent (like a translation proxy/gateway).  The standards
should not prevent the emergence of new services that would otherwise
fit into the framework.
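
The mapping itself is the same wherever it runs.  As a rough sketch
(the names below are made up for illustration), a Serbian
Cyrillic-to-Latin table could be applied by a browser, a server, or a
proxy alike:

```python
# Serbian Cyrillic letters and their usual Latin counterparts.
SERBIAN_CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def transliterate(text):
    """Replace Serbian Cyrillic letters with Latin ones.

    Characters outside the table pass through unchanged.  An uppercase
    Cyrillic letter yields a capitalized Latin form, so a digraph such
    as 'Lj' is not fully uppercased inside an all-caps word; that is a
    known limitation of this sketch.
    """
    out = []
    for ch in text:
        lower = ch.lower()
        latin = SERBIAN_CYRILLIC_TO_LATIN.get(lower)
        if latin is None:
            out.append(ch)
        elif ch != lower:
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Београд"))
```

Nothing in the function depends on where in the chain it is called,
which is why the choice of location can be left open.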

   Klaus

Received on Sunday, 8 December 1996 01:28:47 UTC