
RE: Speech API: first editor's draft posted

From: Young, Milan <Milan.Young@nuance.com>
Date: Tue, 17 Apr 2012 20:41:18 +0000
To: Adam Sobieski <adamsobieski@hotmail.com>
CC: "hwennborg@google.com" <hwennborg@google.com>, "public-speech-api@w3.org" <public-speech-api@w3.org>, "public-speech-api-contrib@w3.org" <public-speech-api-contrib@w3.org>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C1A4555B5@SOM-EXCH04.nuance.com>
The use case for dynamic grammars is a good one, but I still worry about the complexity of defining an inline grammar API.

This issue came up during the HTML Speech XG discussions.  We decided to go with some sort of HTML syntax that allows for the creation of an inline grammar using a URL format.  Hopefully someone with more HTML experience than I have can figure out what that means.  Ollie?
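One possible shape for that, purely as an illustrative sketch (the grammarToDataUrl helper and the data: URL approach are my assumptions here, not anything the group has decided), is to serialize the inline SRGS-XML grammar into a data: URL, so that any slot expecting a grammar URI can also carry inline content:

```javascript
// Illustrative only: embed an inline SRGS-XML grammar as a data: URL, so
// an API slot that expects a grammar URI can also accept inline content.
const srgsXml =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" root="cmd">' +
  '<rule id="cmd"><one-of><item>yes</item><item>no</item></one-of></rule>' +
  '</grammar>';

function grammarToDataUrl(xml) {
  // application/srgs+xml is the registered media type for XML-form SRGS.
  return 'data:application/srgs+xml,' + encodeURIComponent(xml);
}

const grammarUrl = grammarToDataUrl(srgsXml);
```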

Thanks


From: Adam Sobieski [mailto:adamsobieski@hotmail.com]
Sent: Tuesday, April 17, 2012 10:42 AM
To: Young, Milan
Cc: hwennborg@google.com; public-speech-api@w3.org; public-speech-api-contrib@w3.org
Subject: RE: Speech API: first editor's draft posted

Speech API Community Group,
Milan Young,

Some ideas for extending SRGS include specifying JavaScript events on DOM elements in grammars, as a hypothesis toward supporting the multimodal use cases indicated in Section 4 of the HTML Speech Incubator Group Final Report (http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#use-cases):

<item event="html5:click" target="...#element" before="0.5s" after="0.5s" />
or
<event type="html5:click" target="...#element" before="0.5s" after="0.5s" />

Also, while SISR includes XML output scenarios (http://www.w3.org/TR/semantic-interpretation/#SI7, http://www.w3.org/TR/semantic-interpretation/#SI7.1, http://www.w3.org/TR/semantic-interpretation/#SI7.2, http://www.w3.org/TR/semantic-interpretation/#SI7.3), it is possible that SRGS and SSML could be enhanced for some specific emergent scenarios.  SSML's <prosody> topics are interesting.

I also think that "SpeechRecognizer" and "SpeechSynthesizer" sound good for interface names.

By runtime dynamic grammars, I mean grammars created in the browser, programmatically, for example with JavaScript.  .NET, Java, and other speech APIs support dynamic grammars, and some developers may be accustomed to creating grammars programmatically.  An example use case is a website with a GUI component for displaying items, possibly populated via XHR or MXHR; the page can then use JavaScript to create or update a grammar for speech functionality.  In that use case, each item can carry annotative natural language data, the website may even be multilingual, and the web application can dynamically update one or more grammars for voice-based navigation and other features.
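A hypothetical sketch of that use case, assuming nothing beyond plain JavaScript (the buildItemGrammar and xmlEscape helpers are illustrative, not part of any draft API): after new items arrive, the page regenerates an SRGS-XML grammar covering their names.

```javascript
// Escape text content so item names are safe inside the grammar XML.
function xmlEscape(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Build an SRGS-XML grammar whose root rule accepts any one of the
// currently displayed item names; call again whenever the items change.
function buildItemGrammar(itemNames, lang) {
  const items = itemNames
    .map(name => '      <item>' + xmlEscape(name) + '</item>')
    .join('\n');
  return [
    '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"',
    '         xml:lang="' + lang + '" root="items">',
    '  <rule id="items">',
    '    <one-of>',
    items,
    '    </one-of>',
    '  </rule>',
    '</grammar>'
  ].join('\n');
}

// e.g. after an XHR response updates the displayed items:
const grammarXml = buildItemGrammar(['settings', 'new message'], 'en-US');
```

For a multilingual site, the same builder could be invoked once per language, with the item names drawn from that language's annotations.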



Kind regards,

Adam
________________________________
From: Milan.Young@nuance.com
To: adamsobieski@hotmail.com; hwennborg@google.com
CC: public-speech-api@w3.org; public-speech-api-contrib@w3.org
Subject: RE: Speech API: first editor's draft posted
Date: Tue, 17 Apr 2012 16:55:56 +0000
The Voice Browser Working Group (http://www.w3.org/Voice/) defines the SRGS and SSML standards.  I suggest that any extensions for incorporating mathematical statements into a speech application be routed through that group.

I agree that the TTS name needs work, and the names below seem like good candidates.  I prefer "SpeechSynthesizer" because it fits well with SpeechRecognizer.

When you mention "runtime dynamic grammars", do you mean those created within the browser?  Since I suspect that will complicate the API/implementation, I'd like to hear the use case.

Thanks


From: Adam Sobieski [mailto:adamsobieski@hotmail.com]
Sent: Saturday, April 14, 2012 8:30 AM
To: hwennborg@google.com
Cc: public-speech-api@w3.org; public-speech-api-contrib@w3.org
Subject: RE: Speech API: first editor's draft posted

Speech API Community Group,

Greetings.  Regarding http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html, I also wanted to provide some comments and suggestions for discussion:

(1) The interface 'TTS' can be refactored to 'SpeechSynthesis', 'SpeechSynthesizer' or 'SpeechSynth'.
(2) The synthesis interface can include, in addition to text string input, XML string input and document element input for HTML5 and SSML.
(3) During the synthesis of document element inputs, UAs can process substructural elements, as they are synthesized, with options resembling http://wam.inrialpes.fr/timesheets/docs/timeAction.html.
(4) For XML string and document element input formats, PLS references, CSS speech styling, as well as EPUB3-style SSML-like attributes (http://idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml-ssml-attrib) can be recognized by synthesis processors.
(5) With regard to <math> elements, <annotation-xml encoding="application/ssml+xml"> can be recognized by synthesis processors.
(6) <input> types and speech recognition (http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0020/api-draft.html), extending HTMLInputElement.
(7) Runtime dynamic grammars.
(8) SRGS/SISR object model.
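As a purely illustrative sketch of items (2) and (5), assuming a synthesis interface that could accept an SSML string as input (no such method exists in the current draft, and the speakMathFraction helper is hypothetical), a page might construct markup in which a MathML <semantics> element pairs presentation MathML with an <annotation-xml encoding="application/ssml+xml"> spoken form:

```javascript
// Hypothetical: build an SSML string whose MathML carries a parallel
// spoken-form annotation that a synthesis processor could prefer over
// rendering the presentation MathML itself.
function speakMathFraction(numerator, denominator) {
  return [
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">',
    '  <math xmlns="http://www.w3.org/1998/Math/MathML">',
    '    <semantics>',
    '      <mfrac><mn>' + numerator + '</mn><mn>' + denominator + '</mn></mfrac>',
    '      <annotation-xml encoding="application/ssml+xml">',
    '        <s>' + numerator + ' over ' + denominator + '</s>',
    '      </annotation-xml>',
    '    </semantics>',
    '  </math>',
    '</speak>'
  ].join('\n');
}

const ssml = speakMathFraction(1, 2);
// ssml could then be handed to a future synthesis method accepting SSML input
```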

The synthesis and recognition of speech containing mathematical and scientific formulas are interesting topics.  The comments and suggestions above broach the synthesis of mathematical and scientific formulas; also interesting is how grammars can be described such that speech recognition transcripts can include XML, hypertext, or MathML mathematical and scientific notation.



Kind regards,

Adam Sobieski

> From: hwennborg@google.com
> Date: Thu, 12 Apr 2012 10:30:03 +0100
> To: public-speech-api-contrib@w3.org; public-webapps@w3.org; public-xg-htmlspeech@w3.org
> CC: satish@google.com; gshires@google.com
> Subject: Speech API: first editor's draft posted
>
> In December, Google proposed [1] to public-webapps a Speech JavaScript
> API, a subset that supports the majority of the use cases in the Speech
> Incubator Group's Final Report. This proposal provides a programmatic
> API that enables web-pages to synthesize speech output and to use
> speech recognition as an input for forms, continuous dictation and
> control.
>
> We have now posted, in the Speech API Community Group's repository, a
> slightly updated proposal [2]. The differences include:
>
> - Document is now self-contained, rather than having multiple
> references to the XG Final Report.
> - Renamed SpeechReco interface to SpeechRecognition
> - Renamed interfaces and attributes beginning SpeechInput* to
> SpeechRecognition*
> - Moved EventTarget to constructor of SpeechRecognition
> - Clarified that grammars and lang are attributes of SpeechRecognition
> - Clarified that if index is greater than or equal to length, returns null
>
> We welcome discussion and feedback on this editor's draft. Please send
> your comments to the public-speech-api-contrib@w3.org mailing list.
>
> Glen Shires
> Hans Wennborg
>
> [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1696.html
> [2] http://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>
Received on Tuesday, 17 April 2012 20:41:53 GMT
