[Bug 12417] HTML5 is missing attribute for specifying translatability of content

Date: Mon, 28 Nov 2011 14:33:38 +0000
--- Comment #51 from Laurent Romary <laurent.romary@inria.fr> 2011-11-28 14:33:32 UTC ---
I am writing as chairperson of ISO committee TC 37/SC 4 (language resource
management), after the discussion on a translate attribute for HTML 5 was
brought to my attention.
With standards such as the ISO 24614 family (Word segmentation of written
texts), ISO 24616 (Multilingual Information framework) or ISO 24617 (semantic
annotation framework), we provide a core portfolio of standardized
representations for language resources. Webpages are often an integral part
of language resources,
hence we closely follow the standardization initiatives of the W3C and
liaise with them in areas of our expertise, which include all fields of
language technology, such as internationalization efforts for resources
and promoting the creation, exchange and archiving of resources for
multiple languages. We have a large number of uses for a translate
attribute, both in publishing material that uses the attribute and in
using the attribute within semantically rich applications.

It is, for instance, extremely common that language resources and
publications about them come in multiple languages (e.g. the examples are
in a different language from the discussion). Many applications also
involve parallel texts. A search engine for interlinear glossed texts, for
example, could benefit from the translate attribute. The <a
href="http://odin.linguistlist.org/">Online Database of Interlinear
Text</a> is an example of such an application. In such a situation,
translating the source text would render the page useless, whereas
translating the surrounding text might be useful.
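
To illustrate how an NLP service might consume such markup, here is a
minimal sketch in Python. It assumes the semantics proposed for the
attribute (translate="yes" or an empty value enables translation,
translate="no" disables it, anything else inherits from the parent) and
well-formed input with explicit end tags. The function name
split_by_translate and the sample sentence are hypothetical and are not
taken from ODIN or from any existing tool.

```python
from html.parser import HTMLParser

# Elements that never have an end tag; they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _TranslateSplitter(HTMLParser):
    """Sort text runs by the translate state inherited from their ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = [True]          # assumption: the document default is "yes"
        self.translatable = []
        self.protected = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attr = dict(attrs)
        state = self.stack[-1]       # inherit unless overridden below
        if "translate" in attr:
            value = (attr["translate"] or "").strip().lower()
            if value in ("", "yes"):
                state = True
            elif value == "no":
                state = False
        self.stack.append(state)

    def handle_endtag(self, tag):
        # Assumes every non-void start tag has a matching end tag.
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            target = self.translatable if self.stack[-1] else self.protected
            target.append(text)


def split_by_translate(html):
    """Return (translatable_runs, protected_runs) for an HTML fragment."""
    parser = _TranslateSplitter()
    parser.feed(html)
    parser.close()
    return parser.translatable, parser.protected


# Hypothetical glossed example: the Latin source line must stay untouched.
SAMPLE = ('<p>The following sentence is glossed: '
          '<span translate="no" lang="la">puella rosam amat</span> '
          '<span>the girl loves the rose</span></p>')

if __name__ == "__main__":
    translatable, protected = split_by_translate(SAMPLE)
    print("translate:", translatable)
    print("keep as is:", protected)
```

A translation service would pass only the first list to its engine and
copy the second list through unchanged, which is the behaviour an
interlinear text page needs.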

Similarly, lexical resources have language-specific parts that cannot
be translated without rendering the resource useless. Though these
resources are not necessarily stored natively as websites, they
are often transformed and published as such. One example of this is
Wordnet, a frequently used resource in the language technologies. See
for example the original (English) Wordnet at
http://wordnet.princeton.edu/ which also has a web interface. This also
applies to other lexical resources using ISO 24613:2008 (Lexical Markup
Framework): these resources are multilingual by design, and while some
parts could sensibly be translated (for example definitions), others
should never be translated. In fact, translation has been applied to some
resources to support manual creation processes for other languages.

At the same time, large websites with high-quality parallel content that
use a translate attribute are being sought for the creation of parallel
resources. However, using the translate attribute on webpages for
creating services, providing parallel content, etc. faces a problem:
without a sufficiently large number of available websites using the
attribute, natural language processing applications cannot rely on it to
analyze and work with the content.

We thus support the introduction of a translate attribute as a standard
attribute in HTML 5, both because our resources would make heavy use of
it when we publish them and because the NLP services that we implement
could benefit from its existence.

