[whatwg] Considering a lang- attribute prefix for machine translation and intelligibility

From: Charles Pritchard <chuck@jumis.com>
Date: Wed, 02 May 2012 12:01:14 -0700
Message-ID: <4FA1847A.20702@jumis.com>
On 5/2/12 11:46 AM, Benjamin Hawkes-Lewis wrote:
> On Wed, May 2, 2012 at 6:59 PM, Charles Pritchard<chuck at jumis.com>  wrote:
>>> If you do expect that, have you evaluated the existing mechanisms for
>>> embedding custom data in the page and found them wanting? If so, how?
>> 1. New features won't fix Google Translate bugs with existing
>> features, and it's more efficient for Google to fix Translate than for
>> the community to design, specify, and implement new features.

New features do allow services to coalesce around standards. That's what 
the standards are here for.
HTML5 just added a translate attribute.

Span does not in and of itself signify any semantic meaning. Doesn't 
that mean that Google Translate is operating correctly?

>> 2, 3, and 4: Given an appropriate vocabulary, existing mechanisms can
>> encode unambiguous meanings, information about how text should be
>> spoken, and phrase and sentence boundaries. Unicode describes
>> character boundaries.

Boris brought up that the concept of letter could use some attention:
http://lists.w3.org/Archives/Public/www-style/2011Nov/0055.html

Yes, we have existing XML mechanisms for how text should be spoken.

What existing mechanism do we have for disambiguation?

>>
>> 5. Tab isn't talking about "data-" here, but about all the various
>> mechanisms available to provide custom data for services to consume
>> (e.g. microdata, microformats, RDFa).

Tab asked directly why data- does not work.

Yes, we have a lot of microformats, it's true. And RDFa.

They don't seem to be taking flight for these issues, and language 
translation seems like a high level issue appropriate for HTML. Again, 
look at the translate and lang attributes; those are baked into HTML.

I am approaching the "lang-" proposal as language agnostic, much as 
"aria-" is language agnostic.

This seems to be where we are currently:
<img lang="es" translate="no" alt="No" />

With alt having ARIA counterparts.

I'm suggesting a "lang-" with counterparts to translate, language code, 
and a vastly enhanced vocabulary, much as ARIA vastly enhanced the UI 
vocabulary. I think it could help in the long run.
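
As a rough sketch of how a consuming service might read these hints, assuming
hypothetical attribute names such as "lang-translate" and "lang-code" (nothing
here is specified anywhere; the names are placeholders for illustration), a
parser could collect the existing lang and translate attributes alongside
anything carrying the "lang-" prefix:

```python
from html.parser import HTMLParser


class LangAttributeCollector(HTMLParser):
    """Collects language-related attributes from start tags.

    "lang" and "translate" are existing HTML attributes. Anything starting
    with "lang-" is a hypothetical attribute from the proposal; no such
    names are defined by any specification.
    """

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Keep only the attributes a translation service would care about.
        hints = {
            name: value
            for name, value in attrs
            if name in ("lang", "translate") or name.startswith("lang-")
        }
        if hints:
            self.found.append((tag, hints))


def collect_language_hints(markup):
    """Return a list of (tag, {attribute: value}) pairs found in markup."""
    parser = LangAttributeCollector()
    parser.feed(markup)
    parser.close()
    return parser.found


if __name__ == "__main__":
    # Markup as it stands today.
    print(collect_language_hints('<img lang="es" translate="no" alt="No" />'))
    # Hypothetical prefixed counterparts (attribute names are assumptions).
    print(collect_language_hints(
        '<span lang-translate="no" lang-code="es">No</span>'
    ))
```

The point of the sketch is only that a shared prefix gives a service one
namespace to scan, the same way tools look for "aria-" today; the actual
vocabulary would still need to be designed.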

-Charles
Received on Wednesday, 2 May 2012 12:01:14 GMT
