- From: Satish S <satish@google.com>
- Date: Tue, 18 Sep 2012 17:24:49 +0100
- To: Adam Sobieski <adamsobieski@hotmail.com>
- Cc: Glen Shires <gshires@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- Message-ID: <CAHZf7RmXKZX7ZhT9Y_MYGZXjJoY5PcF0ZeO9gFvM8hUf1p_gPg@mail.gmail.com>
The JS API being discussed in this CG is far simpler than those and doesn't
link directly to any HTML elements or markup. Based on web developer
feedback we should improve it for more use cases going forward.

Cheers
Satish
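For a sense of that scale, a minimal recognition sketch against the CG
draft might look as follows; 'names.grxml' is a placeholder URL for an
SRGS grammar, and the method names follow the draft's SpeechGrammarList
interface rather than any shipped implementation:

    // Minimal sketch against the CG draft: recognition with one SRGS
    // grammar. 'names.grxml' is a placeholder; a <lexicon> element
    // inside that grammar is where pronunciation data would ride along.
    var recognition = new SpeechRecognition();
    recognition.grammars = new SpeechGrammarList();
    recognition.grammars.addFromURI('names.grxml', 1.0);
    recognition.onresult = function (event) {
      console.log(event.results[0][0].transcript);
    };
    recognition.start();

Note that lexical data enters only indirectly here, via whatever
<lexicon> element the referenced grammar itself declares.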
On Sun, Sep 16, 2012 at 8:08 AM, Adam Sobieski <adamsobieski@hotmail.com> wrote:

> In addition to the <input> element [1] with interface HTMLInputElement
> [2], there is the <textarea> element [3] with interface
> HTMLTextAreaElement [4], and some other possible forms-related topics [5].
>
> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-input-element.html
> [2] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-input-element.html#htmlinputelement
> [3] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#the-textarea-element
> [4] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#htmltextareaelement
> [5] http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html
>
> Kind regards,
>
> Adam
>
> ------------------------------
> From: adamsobieski@hotmail.com
> To: gshires@google.com
> CC: public-speech-api@w3.org
> Date: Sat, 15 Sep 2012 22:05:50 +0000
> Subject: RE: Lexical Context, Speech Recognition and Synthesis
>
> In SAPI 5.4, the engine-level interfaces
> ( http://msdn.microsoft.com/en-us/library/ee431827(v=vs.85) ) include:
> grammar compiler interfaces, resource interfaces, speech recognition
> interfaces, speech recognition engine interfaces, and text-to-speech
> engine interfaces. The application-level interfaces
> ( http://msdn.microsoft.com/en-us/library/ee431816(v=vs.85) ), however,
> include: audio interfaces, eventing interfaces, grammar compiler
> interfaces, lexicon interfaces, resource interfaces, speech recognition
> interfaces, and text-to-speech interfaces.
>
> Android includes android.speech
> ( http://developer.android.com/reference/android/speech/package-summary.html )
> and android.speech.tts
> ( http://developer.android.com/reference/android/speech/tts/package-summary.html ).
>
> The Java Speech API includes javax.speech, javax.speech.recognition, and
> javax.speech.synthesis
> ( http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/index.html ).
>
> User and application lexicons could be viewed, by some developers, as
> part of users' configurations and, when programmatically configurable,
> as being of use for providing enhanced features, including in dictation
> scenarios.
>
> Are JavaScript APIs for speech-enhanced <input> elements planned for
> upcoming discussion? It seems that both grammar-based and dictation-mode
> <input> elements are possible and that, in addition to windows or
> documents, speech-enhanced <input> elements could have interfaces for
> grammatical, lexical, and other settings.
>
> Kind regards,
>
> Adam Sobieski
>
> ------------------------------
> From: gshires@google.com
> Date: Fri, 7 Sep 2012 14:33:29 -0700
> To: adamsobieski@hotmail.com
> CC: public-speech-api@w3.org
> Subject: Re: Lexical Context, Speech Recognition and Synthesis
>
> If I'm understanding correctly, this is possible with the current spec.
> JavaScript code (or, for that matter, webserver-side code) can extract
> metadata and resources and insert them into the SRGS and SSML. Such code
> would provide a highly flexible, powerful and customizable solution.
>
> /Glen Shires
>
> On Fri, Sep 7, 2012 at 2:19 PM, Adam Sobieski <adamsobieski@hotmail.com> wrote:
>
> Speech API Community Group,
>
> Greetings. I have some ideas for the JavaScript Speech API pertaining
> to lexical context, speech recognition and synthesis.
>
> One idea pertains to the use of <meta> and <link> elements in HTML5
> documents to indicate metadata and external resources of use to speech
> synthesis and recognition components, for example pronunciation
> lexicons. Presently, <lexicon> elements can be indicated in SRGS and
> SSML.
>
> An example usage scenario is a multimedia forum where users can upload
> video content and transcripts, or have recipient computers generate
> such transcripts. IPA pronunciations, as well as pronunciations from
> other alphabets, can be processed from audio. For interrelated
> documents, such as documents in discussion threads (for example,
> scientific discussions with technical terminology), lexical context
> data can enhance speech recognition and synthesis.
>
> In addition to the aforementioned use of <meta> and <link> elements in
> HTML5, such data can also be indicated in document XML. The EPUB 3
> format, for example, includes XML attributes for pronunciation. One API
> topic is thus a means of passing a DOMElement to an interface function
> for obtaining such lexical data from XML.
>
> Another API topic is some sort of Lexicon API, so that lexicon data can
> be indicated programmatically. While <lexicon> elements can be
> indicated in SRGS and SSML, the use of <meta> and <link> and a Lexicon
> API could enhance contextual speech synthesis, recognition and
> dictation.
>
> Kind regards,
>
> Adam Sobieski
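Glen's extract-and-insert suggestion above reads, in sketch form,
something like the following; the rel="pronunciation" link type is an
assumption for illustration only, though <lexicon> as a child of <speak>
is standard SSML:

    // Hypothetical sketch of the extract-and-insert approach: pull a
    // lexicon URL out of the host document and splice it into SSML
    // before handing the markup to a synthesizer. rel="pronunciation"
    // is an assumed link type, not a registered one.
    var link = document.querySelector('link[rel="pronunciation"]');
    var ssml = '<speak version="1.0" ' +
               'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">';
    if (link) {
      ssml += '<lexicon uri="' + link.href + '"/>';  // SSML 1.0, section 3.1.4
    }
    ssml += 'Hello, world.</speak>';

The same pattern would work with webserver-side code, as Glen notes, by
templating the page's metadata into the SSML before delivery.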
Received on Tuesday, 18 September 2012 16:25:21 UTC