[whatwg] Considering a lang- attribute prefix for machine translation and intelligibility

On Wed, May 2, 2012 at 9:59 AM, Charles Pritchard <chuck at jumis.com> wrote:
> There has been some discussion on the w3c/whatwg mailing lists about how far
> we can mark up content with linguistic tags, such as marking word and/or
> sentence boundaries.
>
> In my authoring of web apps, I often write a short manual into a hidden div,
> so that the vocabulary of my application can be processed by translation
> services such as Google translate. Having content in the DOM seems the most
> appropriate way to handle translation.
>
> I'd like the group to consider the costs/benefits/alternatives to a "lang-"
> attribute.
> Such as <span lang-role="sentence">This is a sentence.</span>
>
> The data- and aria- attributes have worked out well. We may want to make
> room for one more.
>
> Such a structure could be used to mark up typical subject/object/verb and
> clause sections; it could also be used to mark up poetic texts as well as
> defined meanings of content.
>
> http://www.omegawiki.org/Expression:orange
> This is an <span lang-meaning="DefinedMeaning:orange_(5821)">orange</span>.
> Now this, this is <span
> lang-meaning="DefinedMeaning:orange_(5822)">orange</span>.
>
> In most cases there's no need to define sentence boundaries, meanings or
> the like. But it'd sure be nice to have the ability to do so in a standard
> manner.
>
> I'd recommend role, meaning and prosody/pronunciation as the primary
> targets. Character markup may be something to consider as it's come up in
> SVG (rotate) and in CSS before. Wrapping each character in a span is not
> practical, so we'd want a shorthand, much as SVG has a shorthand for rotate.

Do you expect outside services to do anything useful with this
information?  If not, the data-* attributes seem appropriate.

If you do expect that, have you evaluated the existing mechanisms for
embedding custom data in the page and found them wanting? If so, how?
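
For comparison, here is a minimal sketch of the data-* route. It assumes the hypothetical names data-lang-role and data-lang-meaning stand in for the proposed lang-role and lang-meaning, and it uses Python's standard html.parser to represent whatever parser an outside service might run. It shows that the proposed annotations can already be carried and extracted without a new attribute prefix. It also assumes annotated elements contain only text (no nested tags).

```python
from html.parser import HTMLParser

# Hypothetical prefix: data-* stand-ins for the proposed lang-* attributes.
PREFIX = "data-lang-"


class LangAnnotationParser(HTMLParser):
    """Collect (annotation name, value, annotated text) triples for elements
    carrying data-lang-* attributes. Assumes annotated elements hold only text."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._current = None  # (tag, {name: value}) of the open annotated element
        self._text = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep only data-lang-* ones.
        found = {name[len(PREFIX):]: value
                 for name, value in attrs if name.startswith(PREFIX)}
        if found:
            self._current = (tag, found)
            self._text = []

    def handle_data(self, data):
        # Only text inside an annotated element is recorded.
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            text = "".join(self._text)
            for name, value in self._current[1].items():
                self.annotations.append((name, value, text))
            self._current = None


def extract(html):
    parser = LangAnnotationParser()
    parser.feed(html)
    parser.close()
    return parser.annotations


doc = ('This is an <span data-lang-meaning="DefinedMeaning:orange_(5821)">orange</span>. '
       'Now this, this is <span data-lang-meaning="DefinedMeaning:orange_(5822)">orange</span>. '
       '<span data-lang-role="sentence">This is a sentence.</span>')

for name, value, text in extract(doc):
    print(name, value, text)
```

In a browser, the same values are exposed through the element's dataset property (data-lang-meaning becomes dataset.langMeaning), so page scripts get them with no extra parsing.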

~TJ

Received on Wednesday, 2 May 2012 10:50:42 UTC