- From: Satish S <satish@google.com>
- Date: Tue, 18 Sep 2012 17:24:49 +0100
- To: Adam Sobieski <adamsobieski@hotmail.com>
- Cc: Glen Shires <gshires@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>
- Message-ID: <CAHZf7RmXKZX7ZhT9Y_MYGZXjJoY5PcF0ZeO9gFvM8hUf1p_gPg@mail.gmail.com>
The JS API being discussed in this CG is far simpler than those and doesn't
link directly to any HTML elements or markup. Based on web developer
feedback we should improve it for more use cases going forward.

Cheers
Satish
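For a sense of that scale, a minimal recognition sketch against the CG
draft might look as follows; 'names.grxml' is a placeholder URL for an
SRGS grammar, and the method names follow the draft's SpeechGrammarList
interface rather than any shipped implementation:

    // Minimal sketch against the CG draft: recognition with one SRGS
    // grammar. 'names.grxml' is a placeholder; a <lexicon> element
    // inside that grammar is where pronunciation data would ride along.
    var recognition = new SpeechRecognition();
    recognition.grammars = new SpeechGrammarList();
    recognition.grammars.addFromURI('names.grxml', 1.0);
    recognition.onresult = function (event) {
      console.log(event.results[0][0].transcript);
    };
    recognition.start();

Note that lexical data enters only indirectly here, via whatever
<lexicon> element the referenced grammar itself declares.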
On Sun, Sep 16, 2012 at 8:08 AM, Adam Sobieski <adamsobieski@hotmail.com> wrote:

> In addition to the <input> element [1] with interface HTMLInputElement
> [2], there is the <textarea> element [3] with interface
> HTMLTextAreaElement [4], and some other possible forms-related topics [5].
>
> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-input-element.html
> [2] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-input-element.html#htmlinputelement
> [3] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#the-textarea-element
> [4] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-button-element.html#htmltextareaelement
> [5] http://www.whatwg.org/specs/web-apps/current-work/multipage/forms.html
>
> Kind regards,
>
> Adam
>
> ------------------------------
> From: adamsobieski@hotmail.com
> To: gshires@google.com
> CC: public-speech-api@w3.org
> Date: Sat, 15 Sep 2012 22:05:50 +0000
> Subject: RE: Lexical Context, Speech Recognition and Synthesis
>
> In SAPI 5.4, the engine-level interfaces
> ( http://msdn.microsoft.com/en-us/library/ee431827(v=vs.85) ) include:
> grammar compiler interfaces, resource interfaces, speech recognition
> interfaces, speech recognition engine interfaces, and text-to-speech
> engine interfaces. The application-level interfaces
> ( http://msdn.microsoft.com/en-us/library/ee431816(v=vs.85) ), however,
> include: audio interfaces, eventing interfaces, grammar compiler
> interfaces, lexicon interfaces, resource interfaces, speech recognition
> interfaces, and text-to-speech interfaces.
>
> Android includes android.speech
> ( http://developer.android.com/reference/android/speech/package-summary.html )
> and android.speech.tts
> ( http://developer.android.com/reference/android/speech/tts/package-summary.html ).
>
> The Java Speech API includes javax.speech, javax.speech.recognition, and
> javax.speech.synthesis
> ( http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/index.html ).
>
> User and application lexicons could be viewed, by some developers, as
> part of users' configurations and, when programmatically configurable,
> as being of use for providing enhanced features, including in dictation
> scenarios.
>
> Are JavaScript APIs for speech-enhanced <input> elements planned for
> upcoming discussion? It seems that both grammar-based and dictation-mode
> <input> elements are possible and that, in addition to windows or
> documents, speech-enhanced <input> elements could have interfaces for
> grammatical, lexical, and other settings.
>
> Kind regards,
>
> Adam Sobieski
>
> ------------------------------
> From: gshires@google.com
> Date: Fri, 7 Sep 2012 14:33:29 -0700
> To: adamsobieski@hotmail.com
> CC: public-speech-api@w3.org
> Subject: Re: Lexical Context, Speech Recognition and Synthesis
>
> If I'm understanding correctly, this is possible with the current spec.
> JavaScript code (or, for that matter, webserver-side code) can extract
> metadata and resources and insert them into the SRGS and SSML. Such code
> would provide a highly flexible, powerful and customizable solution.
>
> /Glen Shires
>
> On Fri, Sep 7, 2012 at 2:19 PM, Adam Sobieski <adamsobieski@hotmail.com> wrote:
>
> Speech API Community Group,
>
> Greetings. I have some ideas for the JavaScript Speech API pertaining
> to lexical context, speech recognition and synthesis.
>
> One idea pertains to the use of <meta> and <link> elements in HTML5
> documents to indicate metadata and external resources of use to speech
> synthesis and recognition components, for example pronunciation
> lexicons. Presently, <lexicon> elements can be indicated in SRGS and
> SSML.
>
> An example usage scenario is a multimedia forum where users can upload
> video content and transcripts, or have recipient computers generate
> such transcripts. IPA pronunciations, as well as pronunciations from
> other alphabets, can be processed from audio. For interrelated
> documents, such as documents in discussion threads (for example,
> scientific discussions with technical terminology), lexical context
> data can enhance speech recognition and synthesis.
>
> In addition to the aforementioned use of <meta> and <link> elements in
> HTML5, such data can also be indicated in document XML. The EPUB 3
> format, for example, includes XML attributes for pronunciation. One API
> topic is thus a means of passing a DOMElement to an interface function
> for obtaining such lexical data from XML.
>
> Another API topic is some sort of Lexicon API, so that lexicon data can
> be indicated programmatically. While <lexicon> elements can be
> indicated in SRGS and SSML, the use of <meta> and <link> and a Lexicon
> API could enhance contextual speech synthesis, recognition and
> dictation.
>
> Kind regards,
>
> Adam Sobieski
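Glen's extract-and-insert suggestion above reads, in sketch form,
something like the following; the rel="pronunciation" link type is an
assumption for illustration only, though <lexicon> as a child of <speak>
is standard SSML:

    // Hypothetical sketch of the extract-and-insert approach: pull a
    // lexicon URL out of the host document and splice it into SSML
    // before handing the markup to a synthesizer. rel="pronunciation"
    // is an assumed link type, not a registered one.
    var link = document.querySelector('link[rel="pronunciation"]');
    var ssml = '<speak version="1.0" ' +
               'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">';
    if (link) {
      ssml += '<lexicon uri="' + link.href + '"/>';  // SSML 1.0, section 3.1.4
    }
    ssml += 'Hello, world.</speak>';

The same pattern would work with webserver-side code, as Glen notes, by
templating the page's metadata into the SSML before delivery.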
Received on Tuesday, 18 September 2012 16:25:21 UTC