Re: Original vs. translation content negotiation

From: Leif Halvard Silli <lhs@malform.no>
Date: Sun, 03 Aug 2008 00:37:09 +0200
Message-ID: <4894E195.2060305@malform.no>
To: "Phillips, Addison" <addison@amazon.com>
CC: "www-international@w3.org" <www-international@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>

Phillips, Addison 2008-08-01 06.51:

> (personal response)
> 
> First: I really wish that the HTML WG would ask the I18N WG
> stuff directly rather than *wondering* about what the WG
> "thinks". I'm glad to see this note, but wish there weren't a
> dozen messages in the archive wondering what the I18N people
> were up to :-).


Right, there were some such comments in other threads in the 
HTMLwg, I think.

 
> I tend to agree with the idea that additional markup is wanted
> for this purpose. I'm not sure that this markup belongs in
> HTML5, though. I especially disagree with Hixie's comment in
> [1]. Changing the syntax of "lang", especially in a way
> incompatible with existing usage (just BCP 47 language tags and
> nothing else) would be deeply harmful and incompatible. For
> example, how would this play with the nascent ability to use
> the CSS 2.1 :lang pseudo-attribute?!? 


We agree that the lang attribute should not be "messed up".

> Language tags should,
> IMHO, do their job as language tags and not do double or triple
> duty as translation identifiers, etc.


However, regarding the content of the lang attribute vs.
"translation identifiers", I must mention the current draft
of RFC 4646, which I also quoted in my reply to the HTMLwg:

"Extensions [...] are intended to identify information which is
commonly used in association with languages or language tags, but
which is not part of language identification."

I do find that info about the translatability of a certain bit
of text should qualify as "information which is commonly used in
association with languages or language tags".

Secondly, as is more or less evident from my reply to the
HTMLwg, I am open to a dual solution: I think adding info by
extending the language tag can have a role, and I think
translate="yes/no" can play a (different) role.

Identification is just one of the roles that the lang
attribute has ... Whether the global ITS selector, or the possible
META element "selector" in HTML [see below], is limited to using
:lang(de) or can also use :lang(de-q-notrans) is not a
difference in principle.
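
For illustration, here is how a tool might read such an extension
out of a language tag. This is only a minimal Python sketch: the 'q'
singleton and the 'notrans' subtag are the hypothetical ones from
this discussion, not a registered extension, and the function names
are made up for the example. It assumes a well-formed tag and that
the singleton asked for is not the private-use 'x'.

```python
def extension_subtags(tag, singleton):
    """Return the subtags that follow `singleton` in `tag`.

    Collection stops at the next singleton. Everything after the
    private-use singleton 'x' is ignored, since it is not an extension.
    """
    subtags = tag.lower().split("-")
    result = []
    collecting = False
    for sub in subtags[1:]:  # subtags[0] is the primary language subtag
        if len(sub) == 1:
            if sub == "x":
                break  # private-use section starts here
            collecting = (sub == singleton)
            continue
        if collecting:
            result.append(sub)
    return result


def is_marked_notranslate(tag):
    """True if the tag carries the hypothetical 'q-notrans' extension."""
    return "notrans" in extension_subtags(tag, "q")
```

With this, is_marked_notranslate("de-q-notrans") is true while
is_marked_notranslate("de") is false, so a tool could leave plain
German text translatable and skip only the marked parts.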

That said, it might be that direct "do-not-translate" info should
not go into the language tags, and that the focus should rather
be on using the lang attribute in selectors. One inspiration for
the language tag extension was Simon Pieters' proposal to use the
META element to "cascade" info on which bits of the document are
translatable or not [1]:

<meta name=notranslate content="code, #logo, .term, :lang(de)">

The above example, however, excludes all German text from
being translated. It might be necessary, and better, to be able to
single out only German text that is marked with
"de-q-notrans" or similar.

As I pointed out to the HTMLwg, registering an extension
opens up wider uses, for example content negotiation.

One thing I had in mind in that regard was Martin's mention on
this list (www-international) in April/May of a method for saying,
in the Web browser, that you prefer originals over translations.

> As you note, if you want to add some special gunk to a language
> tag, you should use private-use or you should register an
> extension (see RFC 4646 for details).


I guess HTML should not resort to private use, though, but
eventually register an extension.

> However, I don't think that an extension makes a lot of sense
> here. This "single-note" extension strikes me as difficult to
> work with and adds little value--while complicating matching of
> language tags, language negotiation, etc.

What do you mean by "difficult to work with"? Do you mean that
*extensions* in general are difficult to work/author with?

Using extensions does not "complicate matching of language tags",
at least not within HTML/CSS. The selector :lang(de) will match
both lang=de and lang=de-q-original.
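
The matching rule behind that claim can be sketched in a few lines
of Python. This assumes the CSS 2.1 behaviour for :lang(C): the
element's language matches if it equals C or begins with C followed
by "-", compared case-insensitively. The function name is made up
for the example, and 'q-original' is still only the hypothetical
extension from this thread.

```python
def lang_selector_matches(selector_lang, element_lang):
    """Approximate CSS 2.1 :lang() matching (exact or hyphen-bounded prefix)."""
    wanted = selector_lang.lower()
    actual = element_lang.lower()
    return actual == wanted or actual.startswith(wanted + "-")
```

So lang_selector_matches("de", "de-q-original") is true, while the
narrower selector value "de-q-original" does not match a plain
lang=de element.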

Perhaps it complicates language negotiation. But I wonder how? If
a document is served as 'en-q-original', then UAs asking for
either 'en' or 'en-q-original' would get it.
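
As a rough illustration of the negotiation case, here is a small
Python sketch. It assumes the server filters its available variants
with RFC 4647 "basic filtering" (a range matches a tag if it is
equal to it, or is a prefix of it followed by "-") and that the UA's
ranges arrive in order of preference. The negotiate() helper and
the 'q-original' extension are hypothetical, and real servers may
well negotiate differently.

```python
def basic_filter(language_range, tag):
    """RFC 4647 basic filtering for one range against one tag."""
    rng = language_range.lower()
    candidate = tag.lower()
    if rng == "*":
        return True  # the wildcard range matches any tag
    return candidate == rng or candidate.startswith(rng + "-")


def negotiate(preferred_ranges, available_tags):
    """Return the first available tag matched by the earliest range, else None."""
    for rng in preferred_ranges:
        for tag in available_tags:
            if basic_filter(rng, tag):
                return tag
    return None
```

A UA sending ['en-q-original', 'en'] would express "originals
first, otherwise any English": given both 'en' and 'en-q-original'
variants it gets the original, and a UA asking only for 'en' is
still served a document tagged 'en-q-original' if that is all there
is.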

> I see the thread has considered the ITS tag set [2]. It already
> defines elements/attributes for indicating translatability.
> There is no reason I can think of to invent a new syntax for
> this. Admittedly, the syntax you propose matches ITS to some
> extent, but you should reference ITS for this rather than have
> something "similar". That's why it exists. I find other's
> desire not to allow namespaces mystifying.


I think I must leave the namespaces issue uncommented.

However, I wonder: with the right keywords for such a possible
language tag extension, would it not also increase the usefulness
of the global selector in ITS?

I think you should view my proposal for an extension not as a
replacement for translate="" and <meta name=notranslate ... > (or a
similar global selection method, such as the one in ITS)
but as a supplement: something which would allow a more
precise selection of elements not to be translated. (You could say
that my thinking has developed a little here.)

 
> I hope that helps as a starter. I'm sure the WG will have
> something or other to say :-).


Yes, many thanks for your reply!

But for the record, it turns out that I was only the second poster
to propose using language tags for giving translation info. The
first was Toby Inkster, though I would not say his proposal is
identical with mine. [2]

Btw, he talks about the "region code of 'XX' (ISO 3166 private use 
code)". However, I am unable to find the region code 'XX' inside 
the BCP 47 recommendation/drafts.

Leif Halvard Silli

[1] http://lists.w3.org/Archives/Public/public-html/2008Jul/0442.html
[2] http://lists.w3.org/Archives/Public/public-html/2008Aug/0011.html
Received on Saturday, 2 August 2008 22:37:56 GMT
