
Re: A few high level thoughts about the web api sections 3 and 4.

From: Bjorn Bringert <bringert@google.com>
Date: Thu, 15 Sep 2011 17:06:11 +0100
Message-ID: <CAJtyJaXbzJODWJAQVTYOObNXc7fuq=Tq==VEhSdW+mjWwWPAYw@mail.gmail.com>
To: Robert Brown <Robert.Brown@microsoft.com>
Cc: Satish S <satish@google.com>, "olli@pettay.fi" <olli@pettay.fi>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
I think it has to be some async mechanism, since the init might
require a network request or some other expensive operation.
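
To make the async idea concrete, here is a minimal sketch, not a proposal for the actual API. The `MockSpeechInputRequest` constructor, its `engineLangs` argument, and the `onready`/`onerror` handlers are all hypothetical names invented for illustration; the error code 100 is borrowed from the example quoted below. The point is only that the constructor returns immediately and the app learns about usability through a later event.

```javascript
// Hypothetical stand-in for a UA speech recognizer; not part of any spec.
// engineLangs is passed in so the sketch runs without a real engine.
function MockSpeechInputRequest(mandatoryLangs, engineLangs) {
    var self = this;
    self.onready = null;   // assumed event: the request is usable
    self.onerror = null;   // assumed event: mandatory parameters not met

    // Defer the check so the constructor returns immediately, as it would
    // if initialization needed a network round trip.
    setTimeout(function () {
        // Assumption: any one of the requested languages is good enough.
        var supported = mandatoryLangs.filter(function (lang) {
            return engineLangs.indexOf(lang) !== -1;
        });
        if (supported.length > 0) {
            if (self.onready) self.onready({ languages: supported });
        } else if (self.onerror) {
            self.onerror({ code: 100, message: "unsupported language code" });
        }
    }, 0);
}

var results = [];

var okSR = new MockSpeechInputRequest(["en-AU", "en-GB"], ["en-GB", "sv-SE"]);
okSR.onready = function (e) { results.push("ready: " + e.languages.join(",")); };
okSR.onerror = function (e) { results.push("error: " + e.code); };

var badSR = new MockSpeechInputRequest(["fr-FR"], ["en-GB", "sv-SE"]);
badSR.onready = function (e) { results.push("ready: " + e.languages.join(",")); };
badSR.onerror = function (e) { results.push("error: " + e.code); };
```

Because the handlers are attached in the same turn in which the object is constructed, they are in place before the deferred check runs, which is the same pattern XHR and WebSockets rely on.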

On Thu, Sep 15, 2011 at 1:34 AM, Robert Brown
<Robert.Brown@microsoft.com> wrote:
>>> how does the app find out whether the SpeechInput/OutputRequest is usable
>>> (speech service exists and supports the mandatory parameters)
>
>
>
> Good question. The constructor could throw an exception if it can’t fulfill
> the mandatory parameters?
>
>
>
>         try {
>             fussySR = new SpeechInputRequest(someparams);
>         } catch (e) {
>             alert("don't talk to me, I'm not listening because " + e.message);
>         }
>
>
>
> If we like this, perhaps we could also define some error codes. E.g.
>
>
>
>         try {
>             fussySR = new SpeechInputRequest(someparams);
>         } catch (e) {
>             switch (e.number) {
>                 case 100: // unsupported language code
>                     alert("sorry, I don't speak your language");
>                     break;
>                 case 200: // unsupported grammar
>                     alert("sorry, I don't understand what this app wants me to listen for");
>                     break;
>                 default:
>                     alert("don't talk to me, I'm not listening because " + e.description);
>             }
>         }
>
>
>
>>> Not sure about the use of structured strings for parameters
>
>>> turn all the grammar/language strings into array parameters and array
>>> properties in true JS form.
>
>
>
> Yeah, the strings are ugly.
>
>
>
> I like the array idea.
>
>
>
> We just need to find a way to express optional and mandatory params for
> different things without producing a constructor with a zillion parameters.
> i.e. not this:
>
>
>
> [Constructor(DOMString[] mandatorylangs, DOMString[] optionallangs,
>              DOMString[] mandatorygrammars, DOMString[] optionalgrammars,
>              DOMString[] blahblah, DOMString[] optionalyadayada)]
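
One way to avoid the long parameter list, sketched here purely as an illustration, is to pass a single object with `mandatory` and `optional` members. Everything in this block is an assumption: the `checkParams` helper, the key names (`langs`, `grammars`, `age`), the shape of the `engine` object, and the rule that a key is satisfied when the engine offers at least one of the requested values. Nothing here comes from the draft.

```javascript
// Hypothetical helper: checks a parameter dictionary against an engine's
// advertised capabilities. Names and shapes are illustrative only.
function checkParams(params, engine) {
    var mandatory = params.mandatory || {};
    var optional = params.optional || {};
    var unmet = [];     // mandatory keys the engine cannot satisfy
    var ignored = [];   // optional keys the engine cannot satisfy

    // Assumed rule: a key is satisfied if the engine offers at least one
    // of the wanted values for it.
    function satisfies(key, wanted) {
        var offered = engine[key] || [];
        return wanted.some(function (v) { return offered.indexOf(v) !== -1; });
    }

    Object.keys(mandatory).forEach(function (key) {
        if (!satisfies(key, mandatory[key])) unmet.push(key);
    });
    Object.keys(optional).forEach(function (key) {
        if (!satisfies(key, optional[key])) ignored.push(key);
    });

    return { usable: unmet.length === 0, unmet: unmet, ignored: ignored };
}

// A made-up engine that knows two languages and one grammar.
var engine = { langs: ["en-GB", "sv-SE"], grammars: ["<builtin:dictation>"] };

var result = checkParams({
    mandatory: { langs: ["en-AU", "en-GB"], grammars: ["<builtin:dictation>"] },
    optional: { age: ["child"] }
}, engine);
```

With this shape, adding a new kind of criterion means adding a key rather than another pair of constructor arguments, and unmet optional keys can be reported without failing the request.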
>
>
>
>
>
> From: public-xg-htmlspeech-request@w3.org
> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Satish S
> Sent: Wednesday, September 14, 2011 1:53 AM
> To: Robert Brown
> Cc: olli@pettay.fi; public-xg-htmlspeech@w3.org
> Subject: Re: A few high level thoughts about the web api sections 3 and 4.
>
>
>
> Thanks Robert, that does look simpler. One change I'd suggest is to turn all
> the grammar/language strings into array parameters and array properties in
> true JS form. So
>
>
>
> function aLittleBitFussy() {
>     // Give me a recognizer for Australian or British English,
>     // with grammars for dictation and datetime.
>     // It should preferably model a child's vocal tract, but doesn't need to.
>     fussySR = new SpeechInputRequest(["en-AU", "en-GB"],
>                                      ["<builtin:dictation>", "<builtin:datetime>"],
>                                      ["age=child"]);
>
>     // Give me a synthesizer. It must be Swedish.
>     // If the voice named "Kiana" is installed, please use it.
>     // Otherwise, I'd prefer a voice that at least sounds like a woman in her thirties, if you have one.
>     fussyTTS = new SpeechOutputRequest("sv-SE", false, 35, ["name=Kiana"]);
> }
>
>
>
> ------
>
>
>
> // will it recognize en-AU or en-GB, and speak Swedish?
> if ((ssvc.recognitionLanguages.indexOf("en-AU") == -1 &&
>      ssvc.recognitionLanguages.indexOf("en-GB") == -1) ||
>     ssvc.synthesisLanguages.indexOf("sv-SE") == -1 ||
>     // does it have the right grammars?
>     ssvc.grammars.indexOf("<builtin:dictation>") == -1 ||
>     ssvc.grammars.indexOf("<builtin:us-cities>") == -1) {
>   // no? okay, close it - we don't want it.
>   ...
> }
>
>
>
>
>
> Cheers
> Satish
>
> On Wed, Sep 14, 2011 at 2:01 AM, Robert Brown <Robert.Brown@microsoft.com>
> wrote:
>
> I’m also struggling with sections 3 & 4 – SpeechService and
> SpeechServiceQuery.
>
>
>
> Sorry for not chiming in earlier. While I can see the direction this is
> going, it just feels way too complicated to me. I think it will be a lot
> more work to iron out the details, but in the end won’t make for an API
> that’s easy to use.
>
>
>
> Personally I’d prefer to take a simplified approach. Something like this…
>
>
>
> Firstly, make it as easy as possible to use the built-in speech capabilities
> of a UA just by creating the SpeechInputRequest and SpeechOutputRequest
> objects, without any messing about with services and criteria and queries.
> Something like this:
>
>
>
>     function simplestCase() {
>         // just give me the default recognizer and synthesizer:
>         simplestSR = new SpeechInputRequest();
>         simplestTTS = new SpeechOutputRequest();
>     }
>
>
>
> Secondly, for cases where the UA has access to a variety of different speech
> engines, rather than create a Query API and a Criteria API, just provide
> mandatory parameters and optional parameters as strings in the constructors
> for SpeechInputRequest and SpeechOutputRequest.
>
>
>
> The constructor pattern would be something like this:
>
>
>
> [Constructor(DOMString? mandatoryparams, optional DOMString? optionalparams)]
>
>
>
> The usage would be something like this:
>
>
>
>     function aLittleBitFussy() {
>         // Give me a recognizer for Australian or British English,
>         // with grammars for dictation and datetime.
>         // It should preferably model a child's vocal tract, but doesn't need to.
>         fussySR = new SpeechInputRequest("language=en-AU|en-GB;grammars=<builtin:dictation>,<builtin:datetime>",
>                                          "age=child");
>
>         // Give me a synthesizer. It must be Swedish.
>         // If the voice named "Kiana" is installed, please use it.
>         // Otherwise, I'd prefer a voice that at least sounds like a woman in her thirties, if you have one.
>         fussyTTS = new SpeechOutputRequest("language=sv-SE",
>                                            "name=Kiana;gender=female;age=30-40");
>     }
>
>
>
> Thirdly, only use a SpeechService object for actual services that aren’t
> built into the UA. In this case we should model existing WebSockets and XHR
> patterns to initialize the service, and then use the service object as a
> parameter to the constructors for SpeechInputRequest and
> SpeechOutputRequest. And drop the Query object entirely.
>
>
>
> Usage would be something like this:
>
>
>
>     var ssvc;
>     function initService() {
>         // open a new service
>         ssvc = new SpeechService("https://myspeechservice/?account=a84e-2198-4e60-00f3");
>         ssvc.onopen = function () {
>             // check that it has the characteristics we expected...
>
>             // will it recognize en-AU or en-GB, and speak Swedish?
>             if ((ssvc.getSupportedLanguages("recognition", "en-AU,en-GB") == '')
>             || (ssvc.getSupportedLanguages("synthesis", "sv-SE") == '')
>             // does it have the right grammars?
>             || (ssvc.getSupportedGrammars("<builtin:dictation>,<builtin:us-cities>") == '')) {
>                 // no? okay, close it - we don't want it
>                 ssvc.close();
>                 ssvc.onclose = function () {
>                     ssvc = null;
>                 }
>                 return;
>             }
>         }
>
>         // get SR and TTS request objects using the service:
>         serviceSR = new SpeechInputRequest(ssvc);
>         serviceTTS = new SpeechOutputRequest(ssvc);
>     }



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Thursday, 15 September 2011 16:06:36 GMT
