
Re: Translation control in HTML5

From: Leif Halvard Silli <lhs@malform.no>
Date: Fri, 01 Aug 2008 04:17:10 +0200
Message-ID: <48927226.9010202@malform.no>
To: Karl Dubost <karl@w3.org>
CC: Ian Hickson <ian@hixie.ch>, public-html WG <public-html@w3.org>, Chris Wilson <Chris.Wilson@microsoft.com>, simonp@opera.com, Martin Duerst <duerst@it.aoyama.ac.jp>

Karl Dubost 2008-08-01 02.42:

> On 1 August 2008 at 07:05, Ian Hickson wrote:
>> We define lang, we can easily define it as being something akin to:
>>  [only] <bcp-47-code>
> If we were doing that, we would have to be sure to not break existing 
> applications. I wonder if I18N activity has a list of apps implementing 
> lang.
> Though lang="" is not very good for this purpose, for brand names, 
> people name, trademarks, etc. Let's take Word, the name of the program. 
> You don't want it to be translated or an English person called Schwartz. 
> or "小林" (kobayashi) = little wood which is a common Japanese name.
> My natural inclination would be to reuse the vocabulary of ITS to not 
> reinvent the wheel.

Very interesting input from Chris! I side with Ian in that LANG 
could be worth reusing. However, I also side with Karl in that 
reuse in the form of *messing* with LANG is bad. I also liked 
Simon Pieters' proposal to use META to "cascade" the notranslate 
values through the document:

   <meta name=notranslate content="code, #logo, .term, :lang(de)">

FIRST, the good news: BCP 47 may offer a way to reuse LANG 
without messing with it. BCP 47 allows language tag extensions to 
be registered with IANA. An extension is introduced by a 
single-character "singleton" subtag and is placed after the "real" 
language subtags. Thus, simply put, if we registered, e.g., a -q- 
singleton (q for quality), one could tag something like this (the 
exact values would also have to be registered with IANA):

    <span lang="en-q-notTranslate">Word</span>
    <span lang="en-q-original">Word</span>
    <span lang="en-q-name">Word</span>
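To make the idea concrete, here is a minimal Python sketch that splits such a tag into its language part and its extension subtags. It assumes a simplified reading of the RFC 4646 syntax: a singleton (any single letter or digit except "x", which is reserved for private use) starts an extension, and each subtag after it must be 2 to 8 alphanumeric characters. The helper name split_extensions and the -q- singleton itself are hypothetical; nothing here is registered. Note that under that length rule a 12-character subtag such as "notTranslate" would be rejected, so a registered value would need a shorter form.

```python
import re

# Simplified reading of BCP 47 (RFC 4646) extension syntax; not a full validator.
# Subtags of the language part itself are not checked here.
SINGLETON = re.compile(r"^[0-9a-wyz]$")      # one alphanumeric, anything but "x"
EXT_SUBTAG = re.compile(r"^[0-9a-z]{2,8}$")  # extension subtags: 2-8 alphanumerics


def split_extensions(tag: str) -> tuple[str, dict[str, list[str]]]:
    """Split a tag into (language part, {singleton: [subtags]}).

    Hypothetical helper for illustration. Language tags are case-insensitive,
    so the tag is lowercased first. A private-use section ("-x-...") ends parsing.
    """
    language: list[str] = []
    extensions: dict[str, list[str]] = {}
    current = None  # singleton whose subtags are currently being collected
    for subtag in tag.lower().split("-"):
        if subtag == "x":
            break  # private use: out of scope for this sketch
        if SINGLETON.match(subtag):
            if current is not None and not extensions[current]:
                raise ValueError(f"extension -{current}- has no subtags")
            if subtag in extensions:
                raise ValueError(f"duplicate extension -{subtag}-")
            current = subtag
            extensions[current] = []
        elif current is None:
            language.append(subtag)  # still in the "real" language subtags
        elif EXT_SUBTAG.match(subtag):
            extensions[current].append(subtag)
        else:
            raise ValueError(f"invalid extension subtag {subtag!r}")
    if current is not None and not extensions[current]:
        raise ValueError(f"extension -{current}- has no subtags")
    return "-".join(language), extensions


print(split_extensions("en-q-original"))  # ('en', {'q': ['original']})
print(split_extensions("en-q-name"))      # ('en', {'q': ['name']})
```

A consumer that only cares about the language can keep the first element and ignore the rest, which is the sense in which such an extension leaves the language information in LANG untouched.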

The current draft revision of RFC 4646 (BCP 47) says:

"Extensions [...] are intended to identify information which is 
commonly used in association with languages or language tags, but 
which is not part of language identification." [1]

SECONDLY, and back to Simon Pieters: going the BCP 47 route has 
the advantage that we get "something" which is useful inside the 
META element (as Simon proposed), in LANG attributes, and even on 
links. Just think about rel="alternate". According to HTML 4 (and, 
I hope, HTML 5), with

   <link hreflang=fr rel=alternate href=text.french.htm>

we are pointing to a French alternate (a translation) of the 
current document. However, nothing tells you whether the document 
you are reading or the linked document is the original.

For this, I imagine one could also register -q-original, so that 
one could write:

   <link hreflang=fr-q-original rel=alternate href=text.french.htm>
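With such a value, a tool could pick out the original among a set of alternate links. The Python sketch below is illustrative only: the Link record (holding a link's language tag and href), q_subtags and find_original are hypothetical names, and the -q-original value is the unregistered proposal above. When no link is marked, the sketch simply returns None; it does not guess whether the current document is the original.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """A rel=alternate link reduced to its language tag and target (hypothetical model)."""
    tag: str
    href: str


def q_subtags(tag: str) -> list[str]:
    """Return the subtags of the hypothetical -q- extension (empty if absent)."""
    collected: list[str] = []
    in_q = False
    for subtag in tag.lower().split("-"):
        if len(subtag) == 1:       # a singleton starts a new section
            if subtag == "x":
                break              # private-use section: stop looking
            in_q = subtag == "q"
        elif in_q:
            collected.append(subtag)
    return collected


def find_original(links: list[Link]) -> Optional[Link]:
    """Return the first alternate whose tag is marked -q-original, if any."""
    for link in links:
        if "original" in q_subtags(link.tag):
            return link
    return None


alternates = [
    Link("de", "text.german.htm"),
    Link("fr-q-original", "text.french.htm"),
]
print(find_original(alternates).href)  # text.french.htm
```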

This would also solve the problem Chris raised with Simon: suppose 
you want only some designated parts of the German text to be 
translated. Tag the German parts that must stay as they are with 
de-q-original, and exclude them:

   <meta name=notranslate content=":lang(de-q-original)">
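How such a META rule interacts with ordinary tags depends on how :lang() matches. In CSS 2.1, :lang(C) matches when the element's language is exactly C or begins with C immediately followed by "-", compared case-insensitively. The Python sketch below implements only that prefix rule over a hypothetical list of (text, lang) segments; lang_matches and translatable are illustrative names, and a real tool would use a full selector engine. It also shows a consequence worth keeping in mind: :lang(de), as in Simon's example, also matches de-q-original text, while :lang(de-q-original) leaves plain de text translatable.

```python
def lang_matches(element_lang: str, selector_lang: str) -> bool:
    """CSS 2.1-style :lang() test: exact match, or prefix followed by '-', case-insensitive."""
    e, s = element_lang.lower(), selector_lang.lower()
    return e == s or e.startswith(s + "-")


def translatable(segments: list[tuple[str, str]], notranslate_langs: list[str]) -> list[str]:
    """Return the text of segments that no :lang() entry in the notranslate list excludes."""
    return [
        text
        for text, lang in segments
        if not any(lang_matches(lang, excluded) for excluded in notranslate_langs)
    ]


segments = [
    ("Guten Tag", "de"),
    ("Schwarz", "de-q-original"),
    ("Hello", "en"),
]
print(translatable(segments, ["de-q-original"]))  # ['Guten Tag', 'Hello']
print(translatable(segments, ["de"]))             # ['Hello']
```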

The translate="yes|no" attribute seems to me better suited to 
translating from one language into just one other language. It 
does not seem fitting for machine translation into hundreds of 
languages, unless your main purpose is to take care of registered 
trademarks and the like.

FINALLY, Karl, it seems to me (and this underlies everything I 
said above) that you have found a use case for the <NAME> element! 
For instance, perhaps the right thing would be to transLITERATE 
Word in some languages, in some situations? Would translate="no" 
permit that? It seems more crucial to me to give the needed 
information, namely that it is a name, so that one can judge, per 
language and per translation, whether translation or 
transliteration is needed.

leif halvard silli
Received on Friday, 1 August 2008 02:18:22 UTC