Lexical Context, Speech Recognition and Synthesis from Adam Sobieski on 2012-09-07 (public-speech-api@w3.org from September 2012)

From: Adam Sobieski <adamsobieski@hotmail.com>
Date: Fri, 7 Sep 2012 21:19:29 +0000
To: "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <SNT002-W103FAE2419ECDD150FD1396C5AF0@phx.gbl>

Speech API Community Group,

Greetings. I have some ideas for the JavaScript Speech API pertaining to lexical context, speech recognition and synthesis.

One idea is to use <meta> and <link> elements in HTML5 documents to declare metadata and external resources, such as pronunciation lexicons, for speech synthesis and recognition components. At present, lexicons can be referenced through <lexicon> elements in SRGS and SSML.
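
To make the <link> idea concrete, here is a minimal sketch of how a page script or user agent might pick out lexicon references. It assumes a convention borrowed from EPUB3, namely rel="pronunciation" with the PLS media type application/pls+xml; nothing like this is defined for HTML5 pages today. The function name selectLexiconLinks, the helper matchesLanguage and the sample file names are hypothetical.

```javascript
// Assumed convention (borrowed from EPUB3, not defined for HTML5 pages):
//   <link rel="pronunciation" type="application/pls+xml"
//         hreflang="en-US" href="chemistry-terms.pls">
const PLS_MEDIA_TYPE = "application/pls+xml";

// True when the link's language tag equals the wanted language or is a
// regional variant of it ("en" matches "en-US"). A missing value on either
// side is treated as "no restriction".
function matchesLanguage(hreflang, lang) {
  if (!lang || !hreflang) return true;
  const tag = hreflang.toLowerCase();
  const wanted = lang.toLowerCase();
  return tag === wanted || tag.startsWith(wanted + "-");
}

// Hypothetical helper: keeps only the links that point at pronunciation
// lexicons for the requested language. `links` is any array of objects with
// rel, type, hreflang and href properties, which is the shape that
// HTMLLinkElement exposes in a browser.
function selectLexiconLinks(links, lang) {
  return links
    .filter((link) => (link.rel || "").toLowerCase().split(/\s+/).includes("pronunciation"))
    .filter((link) => (link.type || "").toLowerCase() === PLS_MEDIA_TYPE)
    .filter((link) => matchesLanguage(link.hreflang, lang))
    .map((link) => ({ href: link.href, lang: link.hreflang || null }));
}

// Plain objects stand in for <link> elements so the sketch runs outside a browser.
const sampleLinks = [
  { rel: "stylesheet", type: "text/css", href: "site.css" },
  { rel: "pronunciation", type: "application/pls+xml", hreflang: "en-US", href: "chemistry-terms.pls" },
  { rel: "pronunciation", type: "application/pls+xml", hreflang: "fr", href: "termes-chimie.pls" },
];

const lexiconLinks = selectLexiconLinks(sampleLinks, "en");
console.log(lexiconLinks);
```

In a browser, the same function could be fed Array.from(document.querySelectorAll("link")). How the resulting URLs would then reach a recognizer or synthesizer is exactly the open question for the API.
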
An example usage scenario is a multimedia forum where users upload video content along with transcripts, or where recipient computers generate such transcripts. Pronunciations in IPA, as well as in other phonetic alphabets, can be derived from the audio. For interrelated documents, such as the documents in a discussion thread (for example, a scientific discussion with technical terminology), lexical context data can enhance speech recognition and synthesis. In addition to the aforementioned use of <meta> and <link> elements in HTML5, such data can also be indicated in document XML. The EPUB3 format, for example, includes XML attributes for pronunciation. One API topic is therefore a means of passing a DOM element to an interface function that obtains such lexical data from the XML.
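
As a rough illustration of passing an element to a function that gathers lexical data, the sketch below walks an element tree and collects pronunciations from the EPUB3-style attributes ssml:ph and ssml:alphabet. The function name collectPronunciations and its return shape are hypothetical, and fakeElement is only a stand-in so that the example runs without a DOM. It assumes the alphabet declared on an ancestor applies to its descendants.

```javascript
// Hypothetical interface function: given an element, return the
// pronunciations annotated on it and on its descendants.
// Only getAttribute, textContent and children are used, all of which exist
// on DOM Element objects.
function collectPronunciations(element, inheritedAlphabet = null, found = []) {
  // Assumption: an alphabet declared on an ancestor applies to descendants.
  const alphabet = element.getAttribute("ssml:alphabet") || inheritedAlphabet;
  const phonemes = element.getAttribute("ssml:ph");
  if (phonemes) {
    found.push({ text: element.textContent.trim(), phonemes, alphabet });
  }
  for (const child of element.children) {
    collectPronunciations(child, alphabet, found);
  }
  return found;
}

// Stand-in for a DOM element, for illustration only.
function fakeElement(text, attributes = {}, children = []) {
  return {
    textContent: text,
    children,
    getAttribute: (name) => (name in attributes ? attributes[name] : null),
  };
}

// Roughly equivalent markup:
//   <p ssml:alphabet="ipa">I like <span ssml:ph="təˈmeɪtoʊ">tomato</span> soup.</p>
const paragraph = fakeElement("I like tomato soup.", { "ssml:alphabet": "ipa" }, [
  fakeElement("tomato", { "ssml:ph": "təˈmeɪtoʊ" }),
]);

const pronunciations = collectPronunciations(paragraph);
console.log(pronunciations);
```

A real interface would also need to decide how namespaced attributes are resolved in XML documents and how element-level data combines with linked lexicons; the sketch leaves both open.
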
Another API topic is a Lexicon API of some kind, so that lexicon data can be supplied programmatically. While <lexicon> elements can already be indicated in SRGS and SSML, the combination of <meta> and <link> elements and a Lexicon API could enhance contextual speech synthesis, recognition and dictation.
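
One possible shape for such a Lexicon API is sketched below. The Lexicon class and its addEntry, lookup and toPLS methods are hypothetical and not part of any existing or proposed specification. The serialization targets the W3C Pronunciation Lexicon Specification (PLS) 1.0 format, so that the result could in principle be referenced from a <lexicon> element in SRGS or SSML.

```javascript
// Escapes the characters that are significant in XML text and attribute values.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical in-memory lexicon: maps written forms (graphemes) to one or
// more pronunciations in a single phonetic alphabet.
class Lexicon {
  constructor({ lang, alphabet = "ipa" }) {
    this.lang = lang;
    this.alphabet = alphabet;
    this.entries = new Map(); // lowercased grapheme -> { grapheme, phonemes }
  }

  // Adds a pronunciation for a grapheme; repeated calls add alternatives.
  addEntry(grapheme, phoneme) {
    const key = grapheme.toLowerCase();
    if (!this.entries.has(key)) {
      this.entries.set(key, { grapheme, phonemes: [] });
    }
    this.entries.get(key).phonemes.push(phoneme);
    return this;
  }

  // Returns a copy of the pronunciations for a grapheme, in insertion order.
  // Matching is case-insensitive, which is a simplification made by this sketch.
  lookup(grapheme) {
    const entry = this.entries.get(grapheme.toLowerCase());
    return entry ? [...entry.phonemes] : [];
  }

  // Serializes the lexicon as a PLS 1.0 document.
  toPLS() {
    const lexemes = [...this.entries.values()].map(({ grapheme, phonemes }) =>
      [
        "  <lexeme>",
        `    <grapheme>${escapeXml(grapheme)}</grapheme>`,
        ...phonemes.map((p) => `    <phoneme>${escapeXml(p)}</phoneme>`),
        "  </lexeme>",
      ].join("\n")
    );
    return [
      '<?xml version="1.0" encoding="UTF-8"?>',
      '<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"' +
        ` alphabet="${escapeXml(this.alphabet)}" xml:lang="${escapeXml(this.lang)}">`,
      ...lexemes,
      "</lexicon>",
    ].join("\n");
  }
}

const lexicon = new Lexicon({ lang: "en", alphabet: "ipa" })
  .addEntry("tomato", "təˈmeɪtoʊ")
  .addEntry("tomato", "təˈmɑːtəʊ");

console.log(lexicon.lookup("Tomato"));
console.log(lexicon.toPLS());
```

How a script would attach such an object to a recognition or synthesis request is left open here, since that depends on the shape of the Speech API itself.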

Kind regards,

Adam Sobieski

Received on Friday, 7 September 2012 21:19:56 UTC