Re: LANG= for character-mapping from Gary Adams - Sun Microsystems Labs BOS on 1996-07-24 (www-international@w3.org from July to September 1996)

From: Gary Adams - Sun Microsystems Labs BOS <gra@zeppo.East.Sun.COM>
Date: Wed, 24 Jul 1996 07:07:24 -0400
To: MOURIK@rullet.LeidenUniv.nl, carrasco@innet.lu
Cc: www-international@w3.org
Message-Id: <199607241107.HAA03976@zeppo.East.Sun.COM>
> Date: Wed, 24 Jul 1996 11:00:31 +0200 (MET DST)
> From: "M.T. Carrasco Benitez" <carrasco@innet.lu>
> Subject: Re: LANG= for character-mapping
> 
> 1) This is what I assume from the current proposals:
> 
> - Only one charset is allowed per document.

Specifically, the HTML portion of a document (which is an SGML
application) is restricted to a single document character set.
Documents have a variety of components embedded within them.
Images, sound, executable content, etc. may have other
internationalization considerations in addition to those needed for
HTML rendering.

> 
> - The "document character set" should be Unicode; others are allowed.
> 
> - The charset for transmission should be Unicode; others are allowed.
> 
> - The server should inform the client; charset = "UNICODE-1-1" (no sniffing).

These three "just use Unicode" guidelines should be directed to the
web authoring tool vendors more than any other specific community.
The volume of Unicode encoded documents will be the main thing
driving browser vendors to I18N compliance.

> 
> - Transmission transformations are for compressing, encrypting
>   (content-encoding) or "safe transport" (transfer-coding); but virtually
>   what is sent is the charset.
> 

> - LANG is for higher functions, such as short quotations.
> 
> - The server should inform the client with Content-Language.
> 
> - LANGs in the document overrides the Content-Language.

I'd also add that the "end user" is an important part of the 
equation for culturally correct handling of i18n documents.
The user is running on a localized platform of some sort
and has selected language preferences. The user has local
display and printing capabilities. (I've recently run into 
a problem locally that not all displayed documents are 
acceptable to my local printer. This means I might have to
read jp or zh online, but print fr or en_US for hardcopy.)

> 
> - There is no association between LANG and charset.
> 
> 
> 2) HTTP needs some changes/clarifications
> - Accept-Language
> This should be the ordered list of "preferred languages".
> 
> The meaning of the quality factor "q" should be changed
>  from  "...estimate of the user's comprehension of that language..."
>  to    "minimum acceptable quality of the translation"

The "user's comprehension" may be a fine way for a browser vendor
to specify global preferences of an individual user, but when a document
is requested over the wire the dialog should be about the "document"
not the user. Even beyond language and dialect issues, I might have
different needs within specific corpus domains, e.g. I might as well
use Latin for the "medical terminology" when indexing home remedy sites.
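
To make the header under discussion concrete, here is a minimal Python
sketch of how a server might turn an Accept-Language value into an
ordered preference list. The function name parse_accept_language is
hypothetical, and the parsing is deliberately simplified (no wildcard
or language-range matching); it says nothing about what "q" should mean.

```python
def parse_accept_language(header):
    """Return (language, q) pairs sorted by descending q.

    Entries with equal q keep the order they had in the header.
    """
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        lang = parts[0].lower()
        q = 1.0  # q defaults to 1 when it is omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # assumption: a malformed q means "not acceptable"
        prefs.append((lang, q))
    # sorted() is stable, so ties stay in header order
    return sorted(prefs, key=lambda pair: -pair[1])
```

For example, parse_accept_language("da, en;q=0.8, de;q=0.5") puts
Danish first, then English, then German.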

> 
> - Content-Language
> This should be an ordered list; the first language should be the language 
> of the document transmitted; the rest, the languages available.

It would be great if the returned languages available were labeled with
"quality factor" information and if the accept-language request 
could threshold what was returned, e.g. allow up to 'n' additional supported
languages or show all additional languages with q=.5 or higher.
Client-side and server-side translation assistance is not that far away.
Also, applications building indexes of the complete web may want to index
the higher quality documents, but capture a complete set of document
summaries for a wider coverage of search interface users.
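
The thresholding idea could look roughly like the Python sketch below.
It assumes the server knows a quality factor for each available
language; the function name additional_languages, the min_q and limit
parameters, and the sample values in the note that follows are all
hypothetical and not part of any HTTP proposal.

```python
def additional_languages(available, served, min_q=0.0, limit=None):
    """List the other available languages, filtered by quality.

    available: dict mapping language tag -> quality factor (0.0 to 1.0)
    served:    the language of the document actually transmitted
    min_q:     drop languages whose quality is below this threshold
    limit:     return at most this many languages (None means no cap)
    """
    others = [(lang, q) for lang, q in available.items()
              if lang != served and q >= min_q]
    # Best quality first; break ties alphabetically for a stable result
    others.sort(key=lambda pair: (-pair[1], pair[0]))
    return others if limit is None else others[:limit]
```

With available = {"de": 1.0, "it": 0.9, "es": 0.5, "fr": 0.1} and the
German document served, min_q=0.5 would report Italian and Spanish,
while limit=1 would report only Italian.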

> 
> 
> 3) HTTP should allow two types of conversations:
> 
> - Request the "best" language
>   Client: Send MyDoc with order of preference Danish, English, German.
>   Server: Take MyDoc in German; it is available in German, Italian, Spanish.
> 
> - Request one specific language
>   Client: Send MyDoc only in Spanish
>   Server: Take MyDoc in Spanish; it is available in German, Italian, Spanish.
> 

Has anyone defined the "quality factor" for Content-Language
yet? q=1.0 original doc, q=.9 professional translation ... q=.5
machine translated sentences, ... q=.1 glossary assistance ...?
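
The two conversations quoted above can be sketched in Python as
follows. This is only an illustration: the function name negotiate and
the "only" flag are hypothetical, and the fallback to the server's
first listed language when nothing matches is an assumption, since the
proposal does not say what should happen in that case.

```python
def negotiate(preferred, available, only=False):
    """Pick a language for the response.

    preferred: client's language tags in order of preference
    available: server's language tags for this document
    only:      True for "send it only in these languages"
    Returns (chosen, available); chosen is None if nothing can be sent.
    """
    for lang in preferred:
        if lang in available:
            return lang, list(available)
    if only or not available:
        return None, list(available)
    # Assumption: fall back to the server's first listed language
    return available[0], list(available)
```

negotiate(["da", "en", "de"], ["de", "it", "es"]) selects German, as in
the first conversation, and negotiate(["es"], ["de", "it", "es"],
only=True) selects Spanish, as in the second.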

> 
> Regards
> Tomas
>
Received on Wednesday, 24 July 1996 07:07:47 UTC