- From: Gary Adams - Sun Microsystems Labs BOS <gra@zeppo.East.Sun.COM>
- Date: Wed, 24 Jul 1996 07:07:24 -0400
- To: MOURIK@rullet.LeidenUniv.nl, carrasco@innet.lu
- Cc: www-international@w3.org
> Date: Wed, 24 Jul 1996 11:00:31 +0200 (MET DST)
> From: "M.T. Carrasco Benitez" <carrasco@innet.lu>
> Subject: Re: LANG= for character-mapping
>
> 1) This is what I assume from the current proposals:
>
> - Only one charset is allowed per document.

Specifically, the HTML portion of a document (which is an SGML application) is restricted to a single document character set. Documents have a variety of components embedded within them. Images, sound, executable content, etc. may have other internationalization considerations in addition to those needed for HTML rendering.

>
> - The "document character set" should be Unicode; others are allowed.
>
> - The charset for transmission should be Unicode; others are allowed.
>
> - The server should inform the client; charset = "UNICODE-1-1" (no sniffing).

These three "just use Unicode" guidelines should be directed to the web authoring tool vendors more than to any other specific community. The volume of Unicode-encoded documents will be the main thing driving browser vendors to I18N compliance.

>
> - Transmission transformations are for compressing, encrypting
> (content-encoding) or "safe transport" (transfer-coding); but virtually,
> what is sent is the charset.
>
> - LANG is for higher functions, such as short quotations.
>
> - The server should inform the client with Content-Language.
>
> - LANGs in the document override the Content-Language.

I'd also add that the "end user" is an important part of the equation for culturally correct handling of i18n documents. The user is running on a localized platform of some sort and has selected language preferences. The user has local display and printing capabilities. (I've recently run into a problem locally: not all displayed documents are acceptable to my local printer. This means I might have to read jp or zh online, but print fr or en_US for hardcopy.)

>
> - There is no association between LANG and charset.
>
>
> 2) HTTP needs some changes/clarifications
> - Accept-Language
> This should be the ordered list of "preferred languages".
>
> The meaning of the quality factor "q" should be changed
> from "...estimate of the user's comprehension of that language..."
> to "minimum acceptable quality of the translation"

The "user's comprehension" may be a fine way for a browser vendor to specify global preferences of an individual user, but when a document is requested over the wire the dialog should be about the "document", not the user. Even beyond language and dialect issues, I might have different needs within specific corpus domains, e.g. I might as well use Latin for the "medical terminology" when indexing home remedy sites.

>
> - Content-Language
> This should be an ordered list; the first language should be the language
> of the document transmitted; the rest, the languages available.

It would be great if the returned list of available languages were labeled with "quality factor" information, and if the Accept-Language request could threshold what was returned, e.g. allow up to 'n' additional supported languages, or show all additional languages with q=.5 or higher. Client-side and server-side translation assistance is not that far away. Also, applications building indexes of the complete web may want to index the higher quality documents, but capture a complete set of document summaries for wider coverage of search interface users.

>
>
> 3) HTTP should allow two types of conversations:
>
> - Request the "best" language
> Client: Send MyDoc with order of preference Danish, English, German.
> Server: Take MyDoc in German; it is available in German, Italian, Spanish.
>
> - Request one specific language
> Client: Send MyDoc only in Spanish
> Server: Take MyDoc in Spanish; it is available in German, Italian, Spanish.
>

Has anyone defined the "quality factor" for Content-Language yet?

    q=1.0  original doc
    q=.9   professional translation
    ...
    q=.5   machine translated sentences, ...
    q=.1   glossary assistance
    ...?

>
> Regards
> Tomas
>
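
The two conversations in point 3, combined with the "minimum acceptable quality" reading of q from point 2, could be sketched roughly as follows. This is only an illustration, not anything defined by HTTP: the function names `parse_accept_language` and `negotiate`, the `min_listed_q` threshold, and the choice to treat a missing q as "no minimum" are all assumptions made for the example, and the quality values assigned to German, Italian and Spanish are invented to match the scale suggested above.

```python
from typing import Dict, List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse e.g. 'da, en;q=0.8, de' into [(tag, q), ...], keeping header order.

    Assumption for this sketch: a language listed without q gets 0.0,
    i.e. "any translation quality is acceptable".
    """
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 0.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs.append((tag.strip().lower(), q))
    return prefs


def negotiate(
    accept_language: str,
    available: Dict[str, float],
    min_listed_q: float = 0.0,
) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Return (language sent, labeled list of available languages).

    `available` maps a language tag to the quality of that version
    (1.0 original, 0.9 professional translation, 0.5 machine translation).
    The first requested language whose version meets the client's minimum
    quality wins; None means no requested language qualified.
    """
    chosen = None
    for tag, min_q in parse_accept_language(accept_language):
        if available.get(tag, -1.0) >= min_q:
            chosen = tag
            break
    # Best-quality versions first; drop versions below the listing threshold.
    ranked = sorted(available.items(), key=lambda kv: -kv[1])
    listed = [(tag, q) for tag, q in ranked if q >= min_listed_q]
    return chosen, listed


# Hypothetical server state for MyDoc.
MYDOC = {"de": 1.0, "it": 0.9, "es": 0.5}

best = negotiate("da, en, de", MYDOC)          # "best" language request
only_es = negotiate("es", MYDOC)               # one specific language
strict_es = negotiate("es;q=0.9", MYDOC)       # Spanish, but not machine output
```

Here `best[0]` is `"de"` and `only_es[0]` is `"es"`, mirroring the two exchanges quoted above, while `strict_es[0]` is `None` because the only Spanish version is below the requested quality. Passing `min_listed_q=0.5` or higher trims the returned list in the way suggested for thresholding.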
Received on Wednesday, 24 July 1996 07:07:47 UTC