Translation control in HTML5

I wanted to suggest a feature for HTML5 around autotranslation control.  Chris Wendt of Microsoft actually came up with this idea, so I've cc'ed him on this mail.

Situation:
There are a number of online and offline automated translation services, which can be used on any HTML document. These automated services typically translate the text content of all elements and a few selected attributes, such as the title and alt attributes. Examples of online automated translation services: http://translate.google.com, http://babelfish.yahoo.com, http://translator.live.com. All of these services allow a user to enter a URL and translate any accessible web page.

Problem:
The translation services translate all elements, including the ones that need to be left untranslated. The document author has no option to control the behavior of the translation service.

Example:
Trying to translate http://en.wikipedia.org/wiki/LINQ to any language translates all the LINQ language keywords, making it impossible for the reader of the translated document to make sense of the article. This problem can also occur when translating quotations or loanwords (http://www.thefreedictionary.com/Loanwords).

Other content affected in the same way:
- sample code
- proper names
- trademarks, industry standard terms
- addresses

Suggested solution:
Google has a <meta> name/value that their translation service respects, but it acts on a document level only:
<meta name="google" value="notranslate">

We believe that the web needs control over translatability at the element level as well.  Therefore we suggest enabling authors to mark untranslatable elements as such. Automated translation services can then respect the tagging, leaving either the entire page (e.g. if this were set on <body>) or individual elements untranslated.

In HTML 5, this could be done with a new attribute "translate", valid on all elements, with the values "yes" and "no" and a default of "yes".  Attributes are not translatable by default, with alt and title remaining as exceptions. HTML will not introduce new translatable attributes.
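
As a rough illustration of how the proposed attribute might look in a page (a sketch only: the attribute is not specified anywhere yet, the page content and file name are made up, and the idea that child elements pick up the value from their container is an assumption based on the <body> case above):

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <title>LINQ query keywords</title>
</head>
<body>
  <!-- translate defaults to "yes", so this paragraph is translated,
       except for the keywords explicitly marked translate="no" -->
  <p>A query starts with <code translate="no">from</code> and ends with
     <code translate="no">select</code> or <code translate="no">group</code>.</p>

  <!-- translate="no" on a container keeps the whole code sample untouched
       (assumes the value applies to everything inside the element) -->
  <pre translate="no">var q = from c in customers
        where c.City == "London"
        select c;</pre>

  <!-- alt is one of the two attributes that stay translatable;
       hypothetical image file -->
  <img src="diagram.png" alt="Query pipeline diagram">
</body>
</html>
```

A translation service honoring this markup would translate the paragraph text and the alt text, but leave "from", "select", "group" and the code sample as written.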

The precedent for this feature comes from the ITS (Internationalization Tag Set) in http://www.w3.org/TR/its/, which in section 6.2 specifies an its:translate="no" attribute and a rule for determining non-translatable content in an XML document. This solves the problem for XHTML content, but (obviously) not for HTML.
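
For comparison, the local ITS attribute in an XHTML document could look roughly like this (again only a sketch; the page content is made up, and whether a particular XHTML DTD or schema accepts the its: attribute is not covered here):

```html
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:its="http://www.w3.org/2005/11/its"
      xml:lang="en">
  <head><title>LINQ query keywords</title></head>
  <body>
    <!-- its:translate="no" marks the keyword as non-translatable -->
    <p>A query starts with <code its:translate="no">from</code>.</p>
  </body>
</html>
```

This depends on XML namespaces, which is why the same approach does not carry over to documents served as plain HTML.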

-Chris

Received on Thursday, 31 July 2008 20:12:48 UTC