[whatwg] Considering a lang- attribute prefix for machine translation and intelligibility

There has been some discussion on the W3C/WHATWG mailing lists about how 
far we can mark up content with linguistic tags, such as tags marking 
word and/or sentence boundaries.

When authoring web apps, I often write a short manual into a hidden 
div, so that the vocabulary of my application can be processed by 
translation services such as Google Translate. Having content in the DOM 
seems the most appropriate way to handle translation.

I'd like the group to consider the costs, benefits, and alternatives of 
a "lang-" attribute prefix, for example:
<span lang-role="sentence">This is a sentence.</span>
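
As a rough illustration of how a consumer (say, a translation tool) might 
pick out sentences marked this way, here is a minimal Python sketch using 
the standard library's html.parser. The lang-role attribute is only the 
proposal above, not an existing standard, and the class and function names 
are made up for this example.

```python
from html.parser import HTMLParser


class LangRoleCollector(HTMLParser):
    """Collect the text of elements whose (hypothetical) lang-role matches."""

    def __init__(self, role):
        super().__init__()
        self.role = role
        self.open_tags = []  # (tag name, text buffer or None) per open element
        self.found = []      # text of each completed matching element

    def handle_starttag(self, tag, attrs):
        matched = dict(attrs).get("lang-role") == self.role
        self.open_tags.append((tag, [] if matched else None))

    def handle_endtag(self, tag):
        # Ignore stray end tags that were never opened.
        if not any(name == tag for name, _ in self.open_tags):
            return
        # Pop up to the matching open tag; this also closes void or
        # unclosed elements that were opened inside it.
        while self.open_tags:
            name, buffer = self.open_tags.pop()
            if buffer is not None:
                self.found.append("".join(buffer))
            if name == tag:
                break

    def handle_data(self, data):
        # Text belongs to every enclosing element that matched the role.
        for _, buffer in self.open_tags:
            if buffer is not None:
                buffer.append(data)


def sentences(html):
    """Return the text of every element marked lang-role="sentence"."""
    collector = LangRoleCollector("sentence")
    collector.feed(html)
    collector.close()
    return collector.found
```

Calling sentences() on a fragment returns one string per marked element, 
with any inline markup inside the sentence flattened to plain text.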

The data- and aria- attributes have worked out well. We may want to make 
room for one more.

Such a structure could be used to mark up typical subject/object/verb and 
clause sections; it could also be used to mark up poetic texts as well as 
the defined meanings of content.

http://www.omegawiki.org/Expression:orange
This is an <span lang-meaning="DefinedMeaning:orange_(5821)">orange</span>.
Now this, this is <span lang-meaning="DefinedMeaning:orange_(5822)">orange</span>.

In most cases there's no need to mark sentence boundaries, meanings, or 
anything else. But it'd sure be nice to have the ability to do so in a 
standard manner.

I'd recommend role, meaning and prosody/pronunciation as the primary 
targets. Character markup may be something to consider as it's come up 
in SVG (rotate) and in CSS before. Doing a span for each character is 
not practical, so we'd want a shorthand much as SVG has shorthand for 
rotate.
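
To illustrate the kind of shorthand meant here: SVG's rotate attribute 
takes a list of numbers, one per character, and reuses the last number for 
any remaining characters. Below is a minimal Python sketch of the same 
expansion applied to a per-character value list. The function name and the 
sample values are hypothetical; nothing like this is specified for a lang- 
attribute anywhere.

```python
def expand_per_character(text, values):
    """Pair each character of text with a value from a space-separated list.

    Mirrors the SVG rotate convention: when the list is shorter than the
    text, the last value is reused for the remaining characters.
    """
    items = values.split()
    if not items:
        # No values given: every character is left unannotated.
        return [(ch, None) for ch in text]
    last = len(items) - 1
    return [(ch, items[min(i, last)]) for i, ch in enumerate(text)]
```

With this, a single attribute value such as "0 45" on a three-character 
run covers all three characters without a span per character.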

-Charles

Received on Wednesday, 2 May 2012 09:59:36 UTC