Re: Translation control in HTML5

Ian Hickson <ian@hixie.ch> wrote:

> How about a new keyword for "lang", instead, which means "not
> translatable" or some such? lang="computer-code" or something.

lang="zxx", the existing ISO 639-2 code for "no linguistic content".

Jirka Kosek <jirka@kosek.cz> wrote:

> <p lang="en">In Germany it is quite common to clink with glasses
> before drinking and to say <em lang="de" translate="no">Prost!</em>
> as a toast.</p>
>
> If you will translate this sentence to French, you of course do not
> want to translate "Prost"

<p lang="en">In Germany it is quite common to clink with glasses before
drinking and to say <q lang="de">Prost!</q> as a toast.</p>

This would be combined with a general recommendation that certain
elements (e.g. <q>, <code>, etc.) not be translated unless the user of
the translation service has explicitly chosen an option to translate
them.
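A translation service could apply that recommendation roughly as follows. This is a minimal sketch: the element set is illustrative (the list above ends in "etc."), and the translate_elements parameter is a hypothetical stand-in for whatever opt-in option a real service would offer its users.

```python
# Element names left untranslated by default. The list above ends in "etc.",
# so this set is illustrative and a real service would extend it.
SKIP_BY_DEFAULT = frozenset({"q", "code"})


def should_translate(tag_name, translate_elements=frozenset()):
    """Return True if the text of an element should be sent to the translator.

    translate_elements is a hypothetical per-user opt-in: the set of element
    names the user has explicitly asked to have translated anyway.
    """
    tag = tag_name.lower()
    if tag in SKIP_BY_DEFAULT:
        # Skipped unless the user explicitly opted in for this element type.
        return tag in translate_elements
    return True


print(should_translate("p"))                            # True
print(should_translate("q"))                            # False
print(should_translate("q", translate_elements={"q"}))  # True
```

The point of the opt-in set is that the author's markup only sets a default; the reader keeps the final say.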

Simon Pieters <simonp@opera.com> wrote:

> <meta name=notranslate content="code, #logo, .term, :lang(de)">

If this were to go ahead, I'd suggest a default value of:

	code,kbd,:lang(zxx)
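A tool reading such a meta tag needs to fall back to that default when a page has none. A minimal sketch of that step, assuming the page's own value replaces the default rather than adding to it (the function name is hypothetical):

```python
# The default suggested above, used when a page has no notranslate meta tag.
DEFAULT_NOTRANSLATE = "code,kbd,:lang(zxx)"


def notranslate_selectors(meta_content=None):
    """Split a notranslate meta value into a list of selectors.

    Assumes a page's own value replaces the default rather than adding to it,
    and splits naively on commas: fine for simple selectors like these, but
    not for selectors that contain commas themselves, such as :is(a, b).
    """
    content = DEFAULT_NOTRANSLATE if meta_content is None else meta_content
    return [part.strip() for part in content.split(",") if part.strip()]


print(notranslate_selectors())
# ['code', 'kbd', ':lang(zxx)']
print(notranslate_selectors("code, #logo, .term, :lang(de)"))
# ['code', '#logo', '.term', ':lang(de)']
```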

I think there are really two separate categories of markup which we  
want to deal with here though:

1. Stuff that should not be translated because it's not really  
linguistic content. For example, the contents of <code> or <kbd>;  
taxonomic names.
2. Stuff that should not be translated because it is more useful in  
its original language: book titles, certain quotes, etc.

It is best if we mark up what these things *are* rather than what we  
should *do* with them. That is, we should use elements or attributes  
which indicate *why* they shouldn't be translated, rather than an  
attribute that simply says that they should not be translated. This  
is for the same reason that class="warning" is better than
class="big-red-bit".

We should be able to deal with the first category without adding any  
additional elements or attributes. The "don't translate <code>, <kbd>  
or :lang(zxx)" rule seems to mostly cover it.
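A sketch of that rule over a hypothetical minimal element tree. It follows CSS :lang() semantics: an element's language is inherited from the nearest ancestor carrying a lang attribute, and :lang(zxx) matches "zxx" itself or "zxx" followed by a hyphen, case-insensitively. Treating everything inside <code> or <kbd> as skipped is an assumption about how a translator would use the selector.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical minimal DOM node: just a tag, a lang attribute, a parent."""
    tag: str
    lang: Optional[str] = None  # None means the element has no lang attribute
    parent: Optional["Element"] = None


def language_of(el):
    """Return the lang of the nearest ancestor-or-self that declares one."""
    node = el
    while node is not None:
        if node.lang is not None:
            return node.lang
        node = node.parent
    return ""  # no lang anywhere: language unknown


def matches_lang(el, code):
    """CSS :lang() matching: the exact code, or the code followed by '-'."""
    lang = language_of(el).lower()
    code = code.lower()
    return lang == code or lang.startswith(code + "-")


def is_non_linguistic(el):
    """True if el is covered by the 'code, kbd, :lang(zxx)' rule.

    Anything inside <code> or <kbd> is skipped along with that element.
    """
    node = el
    while node is not None:
        if node.tag.lower() in ("code", "kbd"):
            return True
        node = node.parent
    return matches_lang(el, "zxx")


html = Element("html", lang="en")
p = Element("p", parent=html)
var = Element("var", parent=Element("code", parent=p))
span = Element("span", lang="zxx", parent=p)
print(is_non_linguistic(p), is_non_linguistic(var), is_non_linguistic(span))
# False True True
```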

The second category is more difficult, as it needs to be determined
by the page author (though of course the reader should be able to
overrule the author). I think the easiest way of doing this would be
through the lang attribute. I'd suggest that any lang attribute
matching this pattern:

	/(\-xx(\-|$)|\-x\-notrans(\-|$))/i

be taken to mark an element which should preferably not be translated.
That is, in RFC 4646 terms, a region subtag of 'XX' (an ISO 3166
private-use code) or an 'x-notrans' private-use subtag. For example,
elements with the following lang values should not be translated:

	de-XX
	de-DE-x-notrans
	de-XX-x-foobar
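Applying the pattern is a single regular-expression search over the attribute value. A minimal sketch (the function name is hypothetical):

```python
import re

# The pattern suggested above: an 'xx' subtag (the XX region) or an
# 'x-notrans' private-use sequence, each followed by '-' or the end of the tag.
NOTRANS_RE = re.compile(r"(-xx(-|$)|-x-notrans(-|$))", re.IGNORECASE)


def prefers_no_translation(lang_value):
    """True if a lang attribute value asks for content to stay untranslated."""
    return NOTRANS_RE.search(lang_value) is not None


for value in ("de-XX", "de-DE-x-notrans", "de-XX-x-foobar", "de-DE", "en"):
    print(value, prefers_no_translation(value))
# de-XX True
# de-DE-x-notrans True
# de-XX-x-foobar True
# de-DE False
# en False
```

Because this is a substring search rather than a full RFC 4646 parse, it also matches an 'xx' subtag outside the region position, for example de-x-xx.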

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>

Received on Friday, 1 August 2008 08:28:55 UTC