Re: Translation control in HTML5

On Sat, Aug 2, 2008 at 1:29 AM, Ian Hickson <ian@hixie.ch> wrote:
> On Fri, 1 Aug 2008, Simon Pieters wrote:
>> Wouldn't :lang(en-US) match <span lang="only en-US">?
>
> Yeah, but it would break use of the |= attribute selector.

Hmm, sounds like a serious problem if you ask me.  Serious enough to
reject the "[only] <bcp-47-code>" idea?  Anyhow, I think it is time to
sum up all the proposals made and discuss their respective advantages
and disadvantages.  Let me try and give a concise overview, including
some subjective comments:

– Chris Wendt and Chris Wilson suggested [1] a global @translate which
is compatible with ITS [2].  In order to avoid redundancies, if we
introduce @translate, we should only allow it in the HTML5
serialisation – just like @lang – and encourage XHTML5 authors to use
@its:translate.  Personally, I like this idea since it offers authors
a very quick and simple way to mark up content that should not be
translated – and we all know that simplicity is one of the key
prerequisites for the success of a solution.

– Ian suggested "a new keyword for 'lang', instead, which means 'not
translatable' or some such" [3].  Korel taught us [4] that ISO 639-2
already defines the "zxx" and "und" values for similar purposes [5].
However, they are far from covering all our requirements.

– Later, Ian suggested [6] a notation like lang="[only]
<bcp-47-code>".  It would work with existing translation tools that
already implement @lang correctly, although the popular Web translators
do not seem to fall into this category.  Cons:  As mentioned above,
this notation would break the use of the |= attribute selector.
Furthermore, we could indeed redefine @lang, but it would then be
incompatible with @xml:lang.  These difficulties have finally
made me dislike this idea.

– Leif came up with the idea to register language tag extensions with
IANA instead [7].  They could look like "en-q-notTranslate" or
"en-q-name".  An additional benefit of this solution would be that
language tag extensions could be reused for other purposes:  <link
rel="alternate" lang="fr-q-original" href="text.html.fr"> tells the
user that the French version of the text is the original one.  I like
this idea since, for example, <span lang="de-q-name">Daniel
Schwarz</span> shows not only _that_ this span must not be
translated to "Daniel Black" in English, but also _why_ it must not.
Toby elaborated on this issue. [8]

– Also, several people raised the question of whether there are some
elements such as <code> and <kbd> that should not be translated by
default. [9]  Whether such a convention makes sense depends on how
often these elements are misused on the Web, as Simon pointed out.
[10]  Perhaps some statistical data could help us out?  If it does not
break too much legacy content, I would strongly suggest defining clear
rules here.  This would save authors of technical documents from
always having to specify <meta name="notranslate" content="code, kbd,
…"> (see below) or even <code translate="no">.

– For page-wide translation rules, Simon suggested <meta
name="notranslate"> [11].  In its @content, CSS selectors specify
which elements are not to be translated.  Not universal enough to
solve all our problems elegantly, I guess, but quite a useful idea
anyway.  (I do not consider specifying <meta name="notranslate"
content=".notranslate"> on every page elegant.)

– Dave is of the opinion that "auto-translate systems should be more
careful, and only translate text that's in the overall language of the
page, into the target, and not the 'call-outs' that are in a different
language." [12]  I am still not sure whether this is workable and would
like to wait for feedback from other people here that I can take into
consideration before finally taking a stand on this idea myself :-)
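
To make the |= concern about the lang="[only] <bcp-47-code>" notation
concrete, here is a minimal sketch of how the |= attribute selector
matches.  The function name dash_match is my own, and case handling is
ignored for simplicity; this is only an illustration, not how any
particular browser implements it.

```python
def dash_match(attr_value: str, prefix: str) -> bool:
    """Sketch of CSS [att|=val]: the attribute value is exactly val,
    or it starts with val immediately followed by a hyphen."""
    return attr_value == prefix or attr_value.startswith(prefix + "-")


# Ordinary language tags match as authors expect.
print(dash_match("en", "en"))          # True
print(dash_match("en-US", "en"))       # True

# With a leading keyword the value no longer starts with "en",
# so an existing [lang|=en] rule would stop applying.
print(dash_match("only en-US", "en"))  # False
```

So any style sheet relying on [lang|=en] would silently stop matching
content marked up with the new notation.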

To conclude, I think that @translate, language tag extensions, <meta
name="notranslate"> and defaulting <code>, <kbd> etc. to
translate="no" could all make sense in parallel.  Do you think it is
possible to reconcile all these solutions in HTML 5?  Probably too
many redundancies, hm?  Which solutions do you think we should
finally pick?  (You see, I am of the opinion that careful authors
should definitely have the possibility to mark up content that should
not be translated.)
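
As a rough illustration of how a translation tool might combine
@translate with defaulting <code> and <kbd> to translate="no", here is
a small sketch using Python's standard html.parser.  Everything in it
is an assumption on my part: the inheritance rule (a child inherits
its parent's state unless it sets @translate itself), the set of
default elements, and the names TranslatableText and
translatable_text.  It also assumes well-formed nesting.

```python
from html.parser import HTMLParser

# Assumption: elements treated as translate="no" unless overridden.
DEFAULT_NO_TRANSLATE = {"code", "kbd"}
# Void elements never get an end tag, so they must not touch the stack.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class TranslatableText(HTMLParser):
    """Collects the text chunks a translator would be allowed to touch."""

    def __init__(self):
        super().__init__()
        self.stack = [True]  # translate state per open element; root is "yes"
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        value = (dict(attrs).get("translate") or "").lower()
        if value == "no":
            state = False
        elif value == "yes":
            state = True
        elif tag in DEFAULT_NO_TRANSLATE:
            state = False
        else:
            state = self.stack[-1]  # inherit from the parent element
        self.stack.append(state)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self.stack[-1] and text:
            self.chunks.append(text)


def translatable_text(html):
    parser = TranslatableText()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = ('<p>Run <code>ls -l</code> as '
          '<span translate="no">Daniel Schwarz</span> said.</p>')
print(translatable_text(sample))  # ['Run', 'as', 'said.']
```

An explicit translate="yes" on a <code> element overrides the default
in this sketch, which is one way authors who misuse <code> could opt
back in.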

-david

[1] <http://lists.w3.org/Archives/Public/public-html/2008Jul/0427.html>
[2] <http://www.w3.org/TR/its/>
[3] <http://lists.w3.org/Archives/Public/public-html/2008Jul/0428.html>
[4] <http://lists.w3.org/Archives/Public/public-html/2008Jul/0432.html>
[5] <http://www.w3.org/International/questions/qa-no-language#nonlinguistic>
[6] <http://lists.w3.org/Archives/Public/public-html/2008Jul/0431.html>
[7] <http://lists.w3.org/Archives/Public/public-html/2008Aug/0005.html>
[8] <http://lists.w3.org/Archives/Public/public-html/2008Aug/0011.html>
[9] <http://lists.w3.org/Archives/Public/public-html/2008Jul/0446.html>
[10] <http://lists.w3.org/Archives/Public/public-html/2008Jul/0433.html>
[11] <http://lists.w3.org/Archives/Public/public-html/2008Jul/0443.html>
[12] <http://lists.w3.org/Archives/Public/public-html/2008Aug/0003.html>

Received on Saturday, 2 August 2008 14:58:18 UTC