[whatwg] Tag Proposal: spelling from Charles Pritchard on 2010-12-12 (public-whatwg-archive@w3.org from December 2010)

From: Charles Pritchard <chuck@jumis.com>
Date: Sat, 11 Dec 2010 18:08:49 -0800
Message-ID: <4D042EB1.9010105@jumis.com>

On 12/11/2010 5:38 PM, whatwg-request at lists.whatwg.org wrote:
> Date: Sun, 12 Dec 2010 00:09:22 -0000
> From: Kornel Lesiński<kornel at geekhood.net>
> To:whatwg at lists.whatwg.org
> Subject: Re: [whatwg] Tag Proposal: spelling
> Message-ID:<op.vnkqpwz0te2ec8 at aimac.local>
> Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes
>
> On Sat, 11 Dec 2010 22:23:55 -0000, Charles Pritchard<chuck at jumis.com>
> wrote:
>
>> >  For lack of a better solution, perhaps you can provide an extended
>> >  language tag:
>> >
>> >  <div contenteditable lang="en-GB">
>> >  <span aria-invalid="false" lang="en-GB-x-John-Grey">John Grey</span>  saw
>> >  a...
>> >  </div>
>> >
>> >  The aria attribute could let the spelling software know the string is
>> >  not misspelled, and the lang attribute marks it as an English phrase,
>> >  helpful with transliteration.
>> >
>> >  Does that work?
> Transliteration in language code seems like a hack.
>
> Instead of made-up language code, perhaps one of the special ISO 639 codes
> would be more appropriate?
>
> http://en.wikipedia.org/wiki/ISO_639-2#Special_situations

The en-GB prefix implies the script and, for phonetics, says a little
about the region.
The "x-*" private-use subtag is only intended to mark the text as a
special dialect of en-GB.
That "x-*" subtag could signal special spelling, phonetic, or
transliteration rules.
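
As a rough illustration of the split described above, here is a minimal
Python sketch that separates a tag into its ordinary prefix and its "x-*"
private-use part. The function name split_private_use is hypothetical, and
the sketch assumes well-formed tags; it is not a full language-tag parser.

```python
def split_private_use(tag):
    """Split a language tag at its first 'x' singleton subtag.

    Returns (base, private), where base is the ordinary prefix
    (for example "en-GB") and private is the list of private-use
    subtags that follow "x". Hypothetical helper, no validation.
    """
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" in lowered:
        i = lowered.index("x")
        # Everything before "x" is the base; everything after is private use.
        return "-".join(subtags[:i]), subtags[i + 1:]
    # No private-use part at all.
    return tag, []
```

With this sketch, split_private_use("en-GB-x-John-Grey") returns
("en-GB", ["John", "Grey"]), and a plain tag such as "es-US" comes back
unchanged with an empty private-use list.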

Many of the whatwg docs have a language tag of "en-US-x-Hixie", telling 
UAs that the specs are written in an undefined American English dialect.

Consider a multilingual case:

<div contenteditable lang="en-US">
Then he said to me:
<q lang="mul"><span aria-invalid="false" lang="en-GB-x-John-Grey">John Grey</span>  <span lang="es-US">es mi hombre.</span></q>
</div>

That gives enough information to let a translation/transliteration
service work with the user to make firm decisions.
For a user who cannot read Roman script, English, or Spanish, the
service would transliterate the "x-*" portion based on British
phonetics, and would translate the "es" and "en" portions based on its
understanding of US usage of those two languages.

And should the user define entries in the "en-GB-x-John-Grey" space,
the service would translate based on those definitions.
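
To make that decision order concrete, here is a small Python sketch of how
such a service might choose what to do with one marked-up span. Everything
in it is an assumption for illustration: the function name plan_span, the
shape of the user_entries mapping, and the three action labels are all
hypothetical, and no real translation service is implied.

```python
def plan_span(lang, text, user_entries):
    """Decide how a hypothetical service might handle one span.

    lang         -- the span's lang attribute, e.g. "en-GB-x-John-Grey"
    text         -- the span's text content
    user_entries -- assumed mapping: lang tag -> {text: user definition}
    """
    base, sep, _private = lang.partition("-x-")
    if not sep:
        # Ordinary tag such as "es-US": translate using language and region.
        return ("translate", base)
    # Private-use space: prefer definitions supplied by the user.
    entries = user_entries.get(lang, {})
    if text in entries:
        return ("user-defined", entries[text])
    # Otherwise fall back to transliteration based on the base tag.
    return ("transliterate", base)
```

For the example above, the "es-US" span would be translated, while the
"en-GB-x-John-Grey" span would be transliterated from "en-GB" unless the
user has supplied an entry for "John Grey" in that space.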

Received on Saturday, 11 December 2010 18:08:49 UTC