RE: Lexical Context, Speech Recognition and Synthesis from Adam Sobieski on 2012-09-15 (public-speech-api@w3.org from September 2012)

From: Adam Sobieski <adamsobieski@hotmail.com>
Date: Sat, 15 Sep 2012 22:05:50 +0000
To: Glen Shires <gshires@google.com>
CC: "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <SNT002-W821C57DA6AD16CF627FBDEC5970@phx.gbl>

From: Glen Shires <gshires@google.com>
Date: Fri, 7 Sep 2012 14:33:29 -0700
To: adamsobieski@hotmail.com
CC: public-speech-api@w3.org
Subject: Re: Lexical Context, Speech Recognition and Synthesis

If I'm understanding correctly, this is possible with the current spec. JavaScript code (or, for that matter, server-side code) can extract the metadata and resources and insert them into the SRGS and SSML. Such code would provide a highly flexible, powerful and customizable solution.
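As a rough illustration of that approach, the sketch below turns a list of link descriptors into <lexicon> elements inside an SSML document. It is only a sketch under stated assumptions: the function names (escapeXml, buildSsml), the { rel, href, type } descriptor shape and the rel value "pronunciation" are chosen for the example and are not defined by the Speech API draft.

```javascript
// Escape characters that are unsafe in XML text nodes and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build an SSML 1.0 document that references every pronunciation lexicon
// found in `links`. Each entry is a plain { rel, href, type } object
// (assumed shape); entries with any other rel value are ignored.
function buildSsml(text, links, lang = "en-US") {
  const lexicons = links
    .filter((link) => link.rel === "pronunciation" && link.href)
    .map((link) => {
      const type = link.type || "application/pls+xml";
      return `  <lexicon uri="${escapeXml(link.href)}" type="${escapeXml(type)}"/>`;
    });

  return [
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">`,
    ...lexicons,
    `  ${escapeXml(text)}`,
    "</speak>",
  ].join("\n");
}
```

In a browser, the descriptors could be collected with something like Array.from(document.querySelectorAll("link"), (el) => ({ rel: el.rel, href: el.href, type: el.type })). Whether a given synthesizer accepts the resulting SSML, or honors its <lexicon> elements, depends on the implementation.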

/Glen Shires

On Fri, Sep 7, 2012 at 2:19 PM, Adam Sobieski <adamsobieski@hotmail.com> wrote:

Speech API Community Group,

Greetings. I have some ideas for the JavaScript Speech API pertaining to lexical context, speech recognition and synthesis.

One idea is to use <meta> and <link> elements in HTML5 documents to indicate metadata and external resources, for example pronunciation lexicons, that are of use to speech synthesis and recognition components. Presently, <lexicon> elements can be specified in SRGS and SSML.

An example usage scenario is a multimedia forum where users can upload video content and transcripts, or have recipient computers generate such transcripts. Pronunciations in IPA, as well as in other phonetic alphabets, can be derived from the audio. For interrelated documents, such as the posts in a discussion thread (for example, a scientific discussion with technical terminology), lexical context data can enhance speech recognition and synthesis.

In addition to the aforementioned use of <meta> and <link> elements in HTML5, such data can also be indicated in document XML. The EPUB3 format, for example, includes XML attributes for pronunciation. One API topic is a means of passing a DOM element to an interface function to obtain such lexical data from XML.

Another API topic is some sort of Lexicon API, so that lexicon data can be supplied programmatically. While <lexicon> elements can be specified in SRGS and SSML, the use of <meta> and <link> elements and a Lexicon API could enhance contextual speech synthesis, recognition and dictation.
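To make the Lexicon API idea more concrete, here is one possible shape for it. Everything in this sketch is hypothetical: the Lexicon class, its add and toPls methods and the sample entry are assumptions made for illustration, and nothing like them exists in the current draft. The output follows the W3C Pronunciation Lexicon Specification (PLS) 1.0 format, so that it could be referenced from a <lexicon> element.

```javascript
// Hypothetical programmatic lexicon: maps written forms (graphemes) to
// pronunciations (phonemes) and serializes them as a PLS 1.0 document.
class Lexicon {
  constructor(lang = "en-US", alphabet = "ipa") {
    this.lang = lang;
    this.alphabet = alphabet;
    this.entries = new Map(); // grapheme -> phoneme string
  }

  // Add or overwrite one entry; returns the lexicon so calls can be chained.
  add(grapheme, phoneme) {
    this.entries.set(grapheme, phoneme);
    return this;
  }

  // Serialize all entries as PLS XML.
  toPls() {
    const esc = (text) =>
      String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");

    const lexemes = Array.from(this.entries, ([grapheme, phoneme]) =>
      [
        "  <lexeme>",
        `    <grapheme>${esc(grapheme)}</grapheme>`,
        `    <phoneme>${esc(phoneme)}</phoneme>`,
        "  </lexeme>",
      ].join("\n")
    );

    return [
      '<?xml version="1.0" encoding="UTF-8"?>',
      `<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="${esc(this.alphabet)}" xml:lang="${esc(this.lang)}">`,
      ...lexemes,
      "</lexicon>",
    ].join("\n");
  }
}

// Illustrative usage with a made-up entry.
const pls = new Lexicon().add("tomato", "təˈmeɪtoʊ").toPls();
```

A page script could build such a lexicon from thread-specific terminology and expose the serialized result through a URL for use in SRGS or SSML.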

Kind regards,

Adam Sobieski

Received on Saturday, 15 September 2012 22:06:19 UTC