RE: Lexical Context, Speech Recognition and Synthesis from Adam Sobieski on 2012-09-16 (public-speech-api@w3.org from September 2012)

From: Adam Sobieski <adamsobieski@hotmail.com>
Date: Sun, 16 Sep 2012 07:08:02 +0000
To: Glen Shires <gshires@google.com>
CC: "public-speech-api@w3.org" <public-speech-api@w3.org>
Message-ID: <SNT002-W195A2C545BADEE837FC21E2C5960@phx.gbl>
In addition to the <input> element [1] with its HTMLInputElement interface [2], there is the <textarea> element [3] with its HTMLTextAreaElement interface [4], along with some other possible forms-related topics [5].

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-input-element.html
[2] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-input-element.html#htmlinputelement
[3] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#the-textarea-element
[4] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#htmltextareaelement
[5] http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html

Kind regards,

Adam

From: adamsobieski@hotmail.com
To: gshires@google.com
CC: public-speech-api@w3.org
Date: Sat, 15 Sep 2012 22:05:50 +0000
Subject: RE: Lexical Context, Speech Recognition and Synthesis

In SAPI 5.4, the engine-level interfaces ( http://msdn.microsoft.com/en-us/library/ee431827(v=vs.85) ) include grammar compiler interfaces, resource interfaces, speech recognition interfaces, speech recognition engine interfaces, and text-to-speech engine interfaces. The application-level interfaces ( http://msdn.microsoft.com/en-us/library/ee431816(v=vs.85) ), however, include audio interfaces, eventing interfaces, grammar compiler interfaces, lexicon interfaces, resource interfaces, speech recognition interfaces, and text-to-speech interfaces.

Android includes android.speech (http://developer.android.com/reference/android/speech/package-summary.html) and android.speech.tts (http://developer.android.com/reference/android/speech/tts/package-summary.html). The Java Speech API includes javax.speech, javax.speech.recognition, and javax.speech.synthesis (http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/index.html).

User and application lexicons could be viewed, by some developers, as part of users' configurations and, when programmatically configurable, could be useful for providing enhanced features, including in dictation scenarios.

Are JavaScript APIs for speech-enhanced <input> elements planned for upcoming discussion? It seems that both grammar-based and dictation-mode <input> elements are possible and that, in addition to windows or documents, speech-enhanced <input> elements could have interfaces for grammatical, lexical, and other settings.

Kind regards,

Adam Sobieski

From: gshires@google.com
Date: Fri, 7 Sep 2012 14:33:29 -0700
To: adamsobieski@hotmail.com
CC: public-speech-api@w3.org
Subject: Re: Lexical Context, Speech Recognition and Synthesis

If I'm understanding correctly, this is possible with the current spec. JavaScript code (or, for that matter, webserver-side code) can extract metadata and resources and insert them into the SRGS and SSML. Such code would provide a highly flexible, powerful and customizable solution.
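
As a rough illustration of that approach, the sketch below builds an SSML 1.0 document whose <lexicon> elements point at lexicon URIs collected elsewhere. The helper names (escapeXml, buildSsml) and the example URI are hypothetical and are not part of the Speech API draft.

```typescript
// Escape characters that are unsafe inside XML text or attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap plain text in an SSML 1.0 <speak> element that
// references each given pronunciation lexicon through a <lexicon> element.
// The <lexicon> elements are emitted before the text, as SSML 1.0 requires.
function buildSsml(text: string, lexiconUris: string[], lang: string = "en-US"): string {
  const lines: string[] = [
    `<?xml version="1.0"?>`,
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">`,
  ];
  for (const uri of lexiconUris) {
    lines.push(`  <lexicon uri="${escapeXml(uri)}"/>`);
  }
  lines.push(`  ${escapeXml(text)}`);
  lines.push(`</speak>`);
  return lines.join("\n");
}

// Example: the URI below is a placeholder, not a real resource.
const ssml = buildSsml("Hello, world.", ["http://example.com/lexicon.pls"]);
```

The resulting string could then be handed to whatever synthesis entry point accepts SSML.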


/Glen Shires

On Fri, Sep 7, 2012 at 2:19 PM, Adam Sobieski <adamsobieski@hotmail.com> wrote:





Speech API Community Group,
 
Greetings. I have some ideas for the JavaScript Speech API pertaining to lexical context, speech recognition and synthesis.
 
One idea pertains to the use of <meta> and <link> elements in HTML5 documents to indicate metadata and external resources of use to speech synthesis and recognition components, for example pronunciation lexicons. Presently, <lexicon> elements can be indicated in SRGS and SSML.
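
A minimal sketch of how a script might pick out such <link> elements follows. It assumes, purely for illustration, that lexicons are marked with rel="pronunciation" and type="application/pls+xml" (a convention borrowed from EPUB 3 rather than anything the Speech API draft defines); LinkInfo and findLexiconLinks are hypothetical names.

```typescript
// Minimal description of a <link> element: only the fields used here.
interface LinkInfo {
  rel: string;
  href: string;
  type?: string;
}

// MIME type registered for Pronunciation Lexicon Specification documents.
const PLS_MIME_TYPE = "application/pls+xml";

// Hypothetical helper: return the href of every link whose rel tokens include
// "pronunciation" and whose type is the PLS MIME type. Links with no type
// attribute are skipped.
function findLexiconLinks(links: LinkInfo[]): string[] {
  return links
    .filter((link) => {
      const relTokens = link.rel.toLowerCase().split(/\s+/);
      const type = (link.type || "").toLowerCase();
      return relTokens.indexOf("pronunciation") !== -1 && type === PLS_MIME_TYPE;
    })
    .map((link) => link.href);
}
```

In a browser, the input could be gathered with Array.from(document.querySelectorAll("link")), since link elements expose rel, href and type as string properties.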


 
An example usage scenario is a multimedia forum where users can upload video content and transcripts or have recipient computers generate such transcripts. IPA pronunciations as well as pronunciations from other alphabets can be processed from audio. For interrelated documents, such as documents in discussion threads, for example scientific discussions with technical terminology, lexical context data can enhance speech recognition and synthesis.


 
In addition to the aforementioned use of <meta> and <link> elements in HTML5, such data can also be indicated in document XML. The EPUB3 format, for example, includes XML attributes for pronunciation. One API topic is a means of passing a DOM Element to an interface function for obtaining such lexical data from XML.
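
To make the idea concrete, here is a sketch of such a function. It walks an element tree and collects the EPUB 3 ssml:ph and ssml:alphabet attributes; ElementLike, LexicalEntry and collectPronunciations are hypothetical names, and ElementLike is a cut-down stand-in intended to be structurally compatible with a DOM Element.

```typescript
// Cut-down view of a DOM Element: only the members this sketch needs.
interface ElementLike {
  getAttribute(name: string): string | null;
  textContent: string | null;
  children: ArrayLike<ElementLike>;
}

// One pronunciation found in the document.
interface LexicalEntry {
  grapheme: string;        // the element's text, trimmed
  phoneme: string;         // value of ssml:ph
  alphabet: string | null; // nearest ssml:alphabet, if any
}

// Hypothetical helper: depth-first walk that records every element carrying
// ssml:ph. The alphabet is inherited from the nearest ancestor that sets
// ssml:alphabet. Descendants of an ssml:ph element are not visited, because
// EPUB 3 does not allow ssml:ph to be nested.
function collectPronunciations(
  root: ElementLike,
  inheritedAlphabet: string | null = null
): LexicalEntry[] {
  const alphabet = root.getAttribute("ssml:alphabet") || inheritedAlphabet;
  const phoneme = root.getAttribute("ssml:ph");
  if (phoneme !== null) {
    return [{ grapheme: (root.textContent || "").trim(), phoneme, alphabet }];
  }
  const entries: LexicalEntry[] = [];
  for (let i = 0; i < root.children.length; i++) {
    const found = collectPronunciations(root.children[i], alphabet);
    for (const entry of found) {
      entries.push(entry);
    }
  }
  return entries;
}
```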


 
Another API topic is some sort of Lexicon API so that lexicon data can be indicated programmatically. While <lexicon> elements can be indicated in SRGS and SSML, the use of <meta> and <link> and a Lexicon API could enhance contextual speech synthesis, recognition and dictation.
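
One possible shape for such a Lexicon API is sketched below, only to make the idea concrete. The Lexicon class and its methods are hypothetical; the serialized form follows the W3C Pronunciation Lexicon Specification (PLS) 1.0 so that the result could be referenced from SRGS or SSML.

```typescript
// Hypothetical in-memory lexicon mapping graphemes to one or more phonemes.
class Lexicon {
  private entries = new Map<string, string[]>();

  constructor(readonly lang: string, readonly alphabet: string = "ipa") {}

  // Register a pronunciation for a written form; duplicates are ignored.
  add(grapheme: string, phoneme: string): void {
    const existing = this.entries.get(grapheme) || [];
    if (existing.indexOf(phoneme) === -1) {
      existing.push(phoneme);
    }
    this.entries.set(grapheme, existing);
  }

  // Return a copy of the pronunciations for a written form (empty if unknown).
  lookup(grapheme: string): string[] {
    return (this.entries.get(grapheme) || []).slice();
  }

  // Serialize to a PLS 1.0 document, one <lexeme> per grapheme.
  toPls(): string {
    const esc = (s: string) =>
      s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
    const lines: string[] = [
      `<?xml version="1.0" encoding="UTF-8"?>`,
      `<lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"` +
        ` alphabet="${esc(this.alphabet)}" xml:lang="${esc(this.lang)}">`,
    ];
    this.entries.forEach((phonemes, grapheme) => {
      lines.push(`  <lexeme>`);
      lines.push(`    <grapheme>${esc(grapheme)}</grapheme>`);
      for (const p of phonemes) {
        lines.push(`    <phoneme>${esc(p)}</phoneme>`);
      }
      lines.push(`  </lexeme>`);
    });
    lines.push(`</lexicon>`);
    return lines.join("\n");
  }
}
```

A page could fill such an object from its own terminology and then publish the PLS text at a URI for use in a <lexicon> element.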



 
 
Kind regards,
 
Adam Sobieski
Received on Sunday, 16 September 2012 07:08:29 UTC