
Re: IndieUI-ISSUE-8 (TTS context dictionary): Need a way to define a context and pronunciation dictionary on a per-resource basis. [IndieUI: User Context 1.0]

From: James Craig <jcraig@apple.com>
Date: Tue, 04 Dec 2012 11:03:29 -0800
Cc: public-indie-ui@w3.org
Message-id: <655A396A-B4B5-4629-BDB2-5DFDE05C98F1@apple.com>
To: Jason White <jason@jasonjgw.net>
On Dec 3, 2012, at 4:04 PM, Jason White <jason@jasonjgw.net> wrote:

> James wrote:
> 
>> It's possible this should use a phonemic alphabet for the pronunciation strings, but I'm not sure about the i18n implications of that decision at the time of this writing.
> 
> If I recall an undergraduate linguistics course correctly, the International
> Phonetic Alphabet is applicable to a great variety of languages.

Interesting. 

> If IPA is
> included in Unicode, then it could perhaps be used to specify pronunciations,
> which could then be converted to the phonemic string used by the particular
> TTS software in use on the client side.
> 
> This approach would also have the advantage that the value need only be
> specified as a Unicode string, and authors could use either phonetics or
> substitute text as appropriate.
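The conversion Jason describes, from IPA to the phoneme string a particular TTS engine expects, could look roughly like this. It is a minimal sketch assuming, purely for illustration, that the engine uses X-SAMPA; the table covers only a handful of symbols, and `IPA_TO_XSAMPA` and `ipaToEngine` are hypothetical names, not part of any proposed API.

```javascript
// Hypothetical partial mapping from IPA symbols to X-SAMPA, standing in for
// whatever phoneme alphabet a given TTS engine actually expects.
const IPA_TO_XSAMPA = new Map([
  ["æ", "{"], ["ɑ", "A"], ["ɔ", "O"], ["ʊ", "U"], ["ʌ", "V"],
  ["ə", "@"], ["ɛ", "E"], ["ɪ", "I"], ["ʃ", "S"], ["ʒ", "Z"],
  ["θ", "T"], ["ð", "D"], ["ŋ", "N"],
  ["ˈ", '"'], ["ˌ", "%"], ["ː", ":"], // primary stress, secondary stress, length
]);

// Converts an IPA string symbol by symbol; symbols missing from the table
// (plain letters such as "f", "i", "t", "k") pass through unchanged.
function ipaToEngine(ipa, table = IPA_TO_XSAMPA) {
  let out = "";
  for (const ch of ipa.normalize("NFC")) { // iterates by code point
    out += table.get(ch) ?? ch;
  }
  return out;
}

console.log(ipaToEngine("ˈθɪŋk")); // "TINk
```

A real engine adapter would need a complete table (and multi-symbol sequences such as affricates), but the shape stays the same: authors write one Unicode IPA string, and each client maps it to its own engine's notation.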


We'd need to differentiate somehow, because letter-only phonetic strings (especially homophones like 'get' or 'git', or homonyms like 'read' and 'read') are not explicit enough to tell whether a value is defined as a word or as a phonetic representation. The ambiguity would be even greater for pronunciations in languages other than English.

From Wikipedia [1]:
>> IPA symbols are composed of one or more elements of two basic types, letters and diacritics. For example, the sound of the English letter ⟨t⟩ may be transcribed in IPA with a single letter, [t], or with a letter plus diacritics, [t̺ʰ], depending on how precise one wishes to be. Often, slashes are used to signal broad or phonemic transcription; thus, /t/ is less specific than, and could refer to, either [t̺ʰ] or [t] depending on the context and language.
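The quoted passage separates narrow transcriptions such as [t̺ʰ] from broad ones such as /t/. A client that only handles broad transcriptions might approximate one by deleting diacritics, as in this minimal sketch. `narrowToBroad` is a hypothetical helper, and the approach is deliberately crude: it assumes diacritics appear only as combining marks or as the few superscript modifier letters listed, and precomposed letters that carry a diacritic (e.g. ã) are left as they are.

```javascript
// Superscript modifier letters used as IPA diacritics: ʰ ʱ ʲ ʳ ʴ ʵ ʶ ʷ ˠ ˡ ˢ ˣ ˤ
const SUPERSCRIPT_MODIFIERS = /[\u02B0-\u02B7\u02E0-\u02E4]/gu;

// Hypothetical helper: approximates a broad transcription from a narrow one.
function narrowToBroad(narrow) {
  return narrow
    .normalize("NFC")               // compose first so letters such as ç stay intact
    .replace(/\p{M}/gu, "")         // drop remaining combining marks, e.g. U+033A
    .replace(SUPERSCRIPT_MODIFIERS, ""); // drop aspiration, labialization, etc.
}

console.log(narrowToBroad("[t\u033A\u02B0]")); // [t]
```

Stress and length marks (ˈ, ː) fall outside the stripped ranges, so they survive the reduction.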

Bare slash delimiters are reserved in JavaScript for regular expression literals, but we could use the standard slashes inside a quoted string ("/foo/").

window.tts.phonetics['Phitt'] = "fit"; // substitute text: read "Phitt" as the word "fit" in the current language
window.tts.phonetics['feat'] = "/fit/"; // IPA: read "feat" as the phonetic representation /fit/

We could also potentially start the string with a reserved control character (e.g. "~fit") but I like the quoted slashes idea better.
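Under the quoted-slashes convention, a client could classify each dictionary value before handing it to a speech engine. This sketch assumes, for illustration only, that the engine accepts SSML, whose `<phoneme alphabet="ipa" ph="...">` and `<sub alias="...">` elements map onto the two cases; `classifyPronunciation`, `toSsml`, and the plain `phonetics` object are hypothetical stand-ins for the proposed `window.tts.phonetics`.

```javascript
// A value of the form "/.../" is treated as IPA; anything else is substitute
// text in the current language.
function classifyPronunciation(value) {
  const m = /^\/(.+)\/$/su.exec(value);
  return m ? { kind: "ipa", text: m[1] } : { kind: "text", text: value };
}

// Escapes characters that are special in XML text and attribute values.
function escapeXml(s) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" };
  return s.replace(/[&<>"']/g, (c) => entities[c]);
}

// Renders one dictionary entry as an SSML fragment.
function toSsml(word, value) {
  const { kind, text } = classifyPronunciation(value);
  return kind === "ipa"
    ? `<phoneme alphabet="ipa" ph="${escapeXml(text)}">${escapeXml(word)}</phoneme>`
    : `<sub alias="${escapeXml(text)}">${escapeXml(word)}</sub>`;
}

const phonetics = { Phitt: "fit", feat: "/fit/" };
for (const [word, value] of Object.entries(phonetics)) {
  console.log(toSsml(word, value));
}
// <sub alias="fit">Phitt</sub>
// <phoneme alphabet="ipa" ph="fit">feat</phoneme>
```

One consequence of this design: substitute text that itself begins and ends with a slash (a path such as "/usr/", say) would be misread as IPA. A reserved leading character like "~" would only change the regular expression in `classifyPronunciation`, and it has the same kind of collision risk.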

Thoughts?
James


1. http://en.wikipedia.org/wiki/International_Phonetic_Alphabet
Received on Tuesday, 4 December 2012 19:03:58 GMT
