
[Bug 12417] HTML5 is missing attribute for specifying translatability of content

From: <bugzilla@jessica.w3.org>
Date: Fri, 29 Jul 2011 09:50:40 +0000
To: public-i18n-core@w3.org
Message-Id: <E1QmjiK-0006MP-9d@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12417

Jörg Schütz <joerg@bioloom.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |joerg@bioloom.de

--- Comment #32 from Jörg Schütz <joerg@bioloom.de> 2011-07-29 09:50:36 UTC ---
This is a very interesting thread. The request for an additional markup element
or just a new attribute/value pair is an important issue for the global
multilingual web. This issue is related to the possible consumption process of
information encoded in HTML, and here we have to distinguish, for example, the
following three use cases: (1) (human) user wants a translation into her
language, (2) NLP application (searching, trawling, analyzing) wants to provide
multilingual results, and (3) integration into a localization and translation
process chain. As a first approximation, the introduction of a new HTML5
language element sounds feasible and appropriate. However, this might end up
with additional requests for the markup of terminology, sentence boundaries,
semantic constructs, etc. which are all legitimate demands with convincing use
cases, i.e. to effectively guide (machine) translation applications and to
enhance the output quality of these applications. HTML 4 already had both the
"acronym" and "abbr" elements, and in HTML5 only "abbr" has survived. So for me
it is not a good idea to just introduce new syntactic sugar.

Let's analyse the possible use cases a bit more with regard to what HTML5
already has on board as a potential solution, and let's also bear in mind that
HTML5 is about web technologies and accessibility (see "WAI-ARIA"), which to
some extent are included in the above translation scenario requirements.

One solution is with styling, for example: <p class="translatable"
lang="en-US">...<b class="term">semantic styling</b>...</p>. This solution was
already proposed in this thread, and it does not seem optimal for our intended
application scenario because it may have side effects with traditional CSS
styling approaches.
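
To make the class-based idea concrete, here is a minimal sketch of how a
translation tool might pick out the marked text. It is only an illustration
under assumptions: the class name "translatable" is taken from the example
above, the names TranslatableCollector and translatable_text are hypothetical,
and the input is assumed to be well-formed (every non-void element is closed).

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class TranslatableCollector(HTMLParser):
    """Collect text inside elements whose class list contains 'translatable'."""

    def __init__(self):
        super().__init__()
        self.stack = []     # one flag per open element: inside a marked region?
        self.segments = []  # collected text runs, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        inherited = self.stack[-1] if self.stack else False
        self.stack.append(inherited or "translatable" in classes)

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack and self.stack[-1]:
            self.segments.append(text)


def translatable_text(html):
    """Return the text runs found inside class="translatable" elements."""
    parser = TranslatableCollector()
    parser.feed(html)
    parser.close()
    return parser.segments
```

Note that the tool has to share the "class" attribute with ordinary
stylesheets, which is exactly the side effect mentioned above.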

Another possibility is with microdata, for example: constructs with the
"itemscope", "itemprop", "itemref", "itemid" and "itemtype" attributes, and
the use of existing (or new) microdata vocabularies. This approach is pretty
much in line with the discussions of a semantic HTML5, and is backed by the
use cases above.
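
As a rough sketch of what a consumer of such microdata could look like, the
following collects text by its nearest enclosing "itemprop". The property
names used in the usage example ("term", "doNotTranslate") and the itemtype
URL are purely hypothetical, since no vocabulary has been agreed on; the
class and function names are likewise made up, and well-formed input is
assumed.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class ItempropCollector(HTMLParser):
    """Collect (itemprop name, text) pairs from microdata-annotated HTML."""

    def __init__(self):
        super().__init__()
        self.open_props = []  # itemprop value (or None) for each open element
        self.found = []       # (name, text) pairs, in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_props.append(dict(attrs).get("itemprop"))

    def handle_endtag(self, tag):
        if tag not in VOID and self.open_props:
            self.open_props.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the nearest enclosing element with an itemprop.
        for name in reversed(self.open_props):
            if name:
                self.found.append((name, text))
                break


def itemprop_text(html):
    """Return (itemprop, text) pairs found in the given HTML string."""
    parser = ItempropCollector()
    parser.feed(html)
    parser.close()
    return parser.found
```

For instance, with a hypothetical vocabulary:

```python
sample = ('<p itemscope itemtype="http://example.org/TranslationUnit" '
          'lang="en-US"><span itemprop="term">semantic styling</span> is '
          '<span itemprop="doNotTranslate">HTML5</span></p>')
print(itemprop_text(sample))
```

Unlike the class-based variant, nothing here touches attributes that CSS
relies on.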

In summary, it turns out that we need to establish some "best practices" for
specifying the translatability of content, and that web translation
applications should be guided through them. Therefore, I suggest that we also
discuss a possible microdata approach. I am looking forward to your opinions.

Received on Friday, 29 July 2011 09:50:41 GMT
