Re: Translations from Martin J. Duerst on 1997-01-16 (www-international@w3.org from January to March 1997)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Thu, 16 Jan 1997 19:13:00 +0100 (MET)
To: "M.T. Carrasco Benitez" <carrasco@innet.lu>
cc: www-international@www10.w3.org
Message-ID: <Pine.SUN.3.95.970116185254.245D-100000@enoshima>

On Tue, 14 Jan 1997, M.T. Carrasco Benitez wrote:

> 1) Defining a nomenclature that allows for translation costs little to
> HTTP and could be very useful in translation.  Example:
> 
>  it-ht   (Italian, human translation)
>  it-mt   (Italian, machine translation)

This may be nice in some cases. But it should not be mandatory.

> 2) The response to a translation request by machine or human would not be 
> instantaneous.  Further work would be needed for longer transactions, 
> probably applicable to other fields.

Does HTTP have a way to signal a delay? Can a server/proxy send back
"not available, but possibly available in 2 weeks"?

> 3) "q" should be the "quality of the linguistic version" and not the 
> "user's preference for the language" (HTTP/1.1).  Example
> 
>   q=1    Translated by a human   Master Translator
>   q=0.5  Translated by a human   Novice Translator
>   q=0.49 Translated by a machine Master Translator

The q on a document is the quality of that document. The q on
Accept-Language is the preference of the user.

> 4) A standard nomenclature for "q" is needed.  For example, less than 0.5
> means machine translation.

This would be highly counterproductive. The only restrictions we
need are that q values are combined by multiplication, so 0 means
absolutely zero (while there may be some need to specify a
preference with q=0.0, a document with q=0 does not make any sense),
and that a server has to scale the q values used in the calculation
for a single query so that they fit into the 0..1 range. (Note that
scaling across the whole server is not needed; q values are
relative to queries.)
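A minimal Python sketch of the combination rule described above, under stated assumptions: the function name `select_variant`, the dict-based inputs, dividing by the largest rating as the way to scale into 0..1, and treating an unlisted language as q=0 are all illustrative choices, not taken from any HTTP specification. Language-range prefix matching (e.g. "en" matching "en-US") is left out. The server's raw ratings are scaled only against the variants competing in this one query, then multiplied by the user's q.

```python
def select_variant(accept_language, variants):
    """Pick the best variant for a single request (illustrative sketch).

    accept_language: dict of language tag -> user preference q (0..1)
    variants: dict of variant name -> (language tag, raw quality rating)
    Raw ratings may use any non-negative scale; they are scaled into
    0..1 relative to the variants competing in this query only.
    Returns (best variant name, combined score), or (None, 0.0) if
    every combination scores zero.
    """
    # Per-query scaling: only the candidates for this request matter.
    top = max((quality for _, quality in variants.values()), default=0.0)
    best_name, best_score = None, 0.0
    for name, (lang, quality) in variants.items():
        doc_q = quality / top if top > 0 else 0.0
        # Assumption: a language the user did not list counts as q=0.
        user_q = accept_language.get(lang, 0.0)
        # Multiplication: a zero on either side rules the variant out.
        score = user_q * doc_q
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

For example, with ratings on a 0..10 scale, `select_variant({"de": 1.0, "en": 0.8}, {"report.en": ("en", 10), "report.de": ("de", 9)})` scales the ratings to 1.0 and 0.9 and picks the German report (0.9 beats 0.8).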

As an example (using server-wide scaled q values), consider a
server that serves only weather reports. Original weather reports
will have q=1.0, translated ones maybe q=0.9, because machine
translation of weather reports is quite reliable.
On the other hand, consider a server for literary works. It will
rate the literary works themselves (novels, poems, ...) as 1.0
(maybe not all of them :-), the general pages, which are lower in
quality, maybe as 0.7, translations of general pages as 0.3, and
machine translations of literary works maybe even as 0.1.
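Plugging the q values from these two examples into a simple multiply-and-take-the-maximum selection shows how the same reader can get a translation from one server and an original from the other. The reader's preferences (German 1.0, English 0.8), the assumption that originals are English and translations German, and the helper name `best` are hypothetical.

```python
# Hypothetical reader: prefers German, but also reads English well.
ACCEPT = {"de": 1.0, "en": 0.8}

# Server-wide q values from the examples above; the languages of the
# originals and translations are assumed for illustration.
WEATHER = {"original (en)": ("en", 1.0), "translation (de)": ("de", 0.9)}
LITERARY_WORK = {"original (en)": ("en", 1.0), "machine translation (de)": ("de", 0.1)}
GENERAL_PAGE = {"original (en)": ("en", 0.7), "translation (de)": ("de", 0.3)}


def best(accept, variants):
    """Return the variant with the highest user-q times document-q product."""
    return max(variants,
               key=lambda name: accept.get(variants[name][0], 0.0) * variants[name][1])


for label, variants in [("weather", WEATHER),
                        ("literary work", LITERARY_WORK),
                        ("general page", GENERAL_PAGE)]:
    print(label, "->", best(ACCEPT, variants))
# weather -> translation (de)
# literary work -> original (en)
# general page -> original (en)
```

The reliable weather translation (1.0 x 0.9 = 0.9) beats the English original (0.8 x 1.0 = 0.8), while the 0.1 machine translation of a novel loses clearly to the original.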

> 5) The Accept-Language should be an ordered "preference list".  There is no
> need to quantify the preference of the user.

Quite the contrary. If, for example, you know English, German, and
Japanese, how do you express that you know Japanese almost as well
as the others, or just a little? Depending on that, the documents
you would prefer to receive can differ greatly.
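A small sketch of this point, assuming the same multiply-and-take-the-maximum selection: two hypothetical users share the ordering English > German > Japanese, so an ordered list alone could not tell them apart, yet with q values they get different variants of a Japanese document. The variant q values (original 1.0, machine translation into English 0.5) and the helper names are invented for illustration.

```python
# Hypothetical variants of a Japanese document.
VARIANTS = {
    "original (ja)": ("ja", 1.0),
    "machine translation (en)": ("en", 0.5),
}

# Two hypothetical users with the same ordering en > de > ja.
fluent_in_japanese = {"en": 1.0, "de": 0.95, "ja": 0.9}
beginner_in_japanese = {"en": 1.0, "de": 0.9, "ja": 0.2}


def ordering(accept):
    """The pure preference list: language tags sorted by descending q."""
    return sorted(accept, key=accept.get, reverse=True)


def best(accept, variants):
    """Return the variant with the highest user-q times document-q product."""
    return max(variants,
               key=lambda name: accept.get(variants[name][0], 0.0) * variants[name][1])


# Same ordered list for both users...
print(ordering(fluent_in_japanese) == ordering(beginner_in_japanese))  # True
# ...but different documents once the q values are taken into account.
print(best(fluent_in_japanese, VARIANTS))    # original (ja)
print(best(beginner_in_japanese, VARIANTS))  # machine translation (en)
```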

The problem with q on Accept-Language is privacy. One part of this
problem is identification with a language minority, which can
happen independently of q factors. The other is click tracing.
For this, in certain cases even the set of languages alone provides
enough information. To alleviate the click-tracing and privacy
problems, in addition to the provisions in the HTTP specs, it might
be a good idea to agree to restrict the q values set by browsers
to a limited set (e.g. 1.0, 0.8, 0.6, 0.4, 0.2, 0.0). This would
still allow a wide range of relative preferences to be expressed,
while avoiding click tracing on something like "the guy who has
Japanese at 0.4586794".
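One way a browser could apply the proposed restriction is to snap each configured preference to the nearest level in that list before building the header. This is a sketch under assumptions: the function names are hypothetical, and it deliberately never rounds a positive preference down to 0.0, since q=0 on Accept-Language means "not acceptable" rather than "a little". It also omits q for level 1.0, which is the default.

```python
ALLOWED_Q = (1.0, 0.8, 0.6, 0.4, 0.2)  # 0.0 is kept only for "not acceptable"


def quantize_q(q):
    """Snap a preference to the nearest allowed level.

    Zero stays zero; any positive value goes to the nearest positive
    level, so a small but real preference never becomes a refusal.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in 0..1")
    if q == 0.0:
        return 0.0
    return min(ALLOWED_Q, key=lambda level: abs(level - q))


def accept_language_header(prefs):
    """Format a dict of language tag -> q as a coarse Accept-Language value."""
    parts = []
    for tag, q in sorted(prefs.items(), key=lambda item: item[1], reverse=True):
        level = quantize_q(q)
        # q=1.0 is the default, so it is left out of the header.
        parts.append(tag if level == 1.0 else f"{tag};q={level:.1f}")
    return ", ".join(parts)


print(accept_language_header({"en": 1.0, "de": 0.85, "ja": 0.4586794}))
# en, de;q=0.8, ja;q=0.4
```

The fine-grained 0.4586794 from the example above is sent as 0.4, which many users would share.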

Regards,	Martin.

Received on Thursday, 16 January 1997 13:13:23 UTC