W3C home > Mailing lists > Public > public-xg-htmlspeech@w3.org > September 2011

RE: A few high level thoughts about the web api sections 3 and 4.

From: Robert Brown <Robert.Brown@microsoft.com>
Date: Thu, 15 Sep 2011 00:34:01 +0000
To: Satish S <satish@google.com>, Bjorn Bringert <bringert@google.com>
CC: "olli@pettay.fi" <olli@pettay.fi>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <113BCF28740AF44989BE7D3F84AE18DD1B257858@TK5EX14MBXC112.redmond.corp.microsoft.com>
>> how does the app find out whether the SpeechInput/OutputRequest is usable (speech service exists and supports the mandatory parameters)

Good question. The constructor could throw an exception if it can't fulfill the mandatory parameters?

        try {
            fussySR= new SpeechInputRequest(someparams);
        } catch (e) {
            alert("don't talk to me, I'm not listening because " + e.message);
        }

If we like this, perhaps we could also define some error codes. E.g.

        try {
            fussySR= new SpeechInputRequest(someparams);
        } catch (e) {
            switch (e.number) {
                case 100: // unsupported language code
                    alert("sorry, I don't speak your language");
                    break;
                case 200: // unsupported grammar
                    alert("sorry, I don't understand what this app wants me to listen for");
                    break;
                default:
                default:
                    alert("don't talk to me, I'm not listening because " + e.message);
            }
        }
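If we go down this path, the codes could live as named constants somewhere discoverable. A sketch (the names, the values, and the e.number property are all hypothetical, not in the draft):

```javascript
// Hypothetical error codes for SpeechInputRequest construction failures.
// Names, values, and the "number" property are illustrative only.
var SpeechError = {
    UNSUPPORTED_LANGUAGE: 100,
    UNSUPPORTED_GRAMMAR: 200
};

// A sketch of how a UA might raise one of these:
function makeSpeechError(number, message) {
    var e = new Error(message);
    e.number = number;
    return e;
}
```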

>> Not sure about the use of structured strings for parameters
>> turn all the grammar/language strings into array parameters and array properties in true JS form.

Yeah, the strings are ugly.

I like the array idea.

We just need a way to express optional and mandatory params for different things without producing a constructor with a zillion parameters, i.e. not this:

[Constructor(DOMString[] mandatorylangs, DOMString[] optionallangs, DOMString[] mandatorygrammars, DOMString[] optionalgrammars, DOMString[] blahblah, DOMString[] optionalyadayada )]
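FWIW, one idiomatic alternative would be a single settings object, so every constraint is named and anything can be omitted. Just a sketch - the property names below are made up, not proposed anywhere in the thread:

```javascript
// Sketch: one settings object instead of many positional arrays.
// All property names below are invented for illustration.
function SpeechInputRequest(options) {
    options = options || {};
    this.mandatoryLangs = options.mandatoryLangs || [];
    this.optionalLangs = options.optionalLangs || [];
    this.mandatoryGrammars = options.mandatoryGrammars || [];
    this.optionalGrammars = options.optionalGrammars || [];
}

// Only name what you care about; everything else defaults.
var sr = new SpeechInputRequest({
    mandatoryLangs: ["en-AU", "en-GB"],
    mandatoryGrammars: ["<builtin:dictation>", "<builtin:datetime>"]
});
```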


From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Satish S
Sent: Wednesday, September 14, 2011 1:53 AM
To: Robert Brown
Cc: olli@pettay.fi; public-xg-htmlspeech@w3.org
Subject: Re: A few high level thoughts about the web api sections 3 and 4.

Thanks Robert, that does look simpler. One change I'd suggest is to turn all the grammar/language strings into array parameters and array properties in true JS form. So

function aLittleBitFussy() {
    // Give me a recognizer for Australian or British English,
    // with grammars for dictation and datetime.
    // It should preferably model a child's vocal tract, but doesn't need to.
    fussySR= new SpeechInputRequest(["en-AU", "en-GB"], ["<builtin:dictation>","<builtin:datetime>"], ["age=child"]);

    // Give me a synthesizer. It must be Swedish.
    // If the voice named "Kiana" is installed, please use it.
    // Otherwise, I'd prefer a voice that at least sounds like a woman in her thirties, if you have one.
    fussyTTS = new SpeechOutputRequest("sv-SE", false, 35, ["name=Kiana"]);
}

------

// will it recognize en-AU or en-GB, and speak Swedish?
if ((ssvc.recognitionLanguages.indexOf("en-AU") == -1 &&
     ssvc.recognitionLanguages.indexOf("en-GB") == -1) ||
    ssvc.synthesisLanguages.indexOf("sv-SE") == -1 ||
    // does it have the right grammars?
    ssvc.grammars.indexOf("<builtin:dictation>") == -1 ||
    ssvc.grammars.indexOf("<builtin:us-cities>") == -1) {
  // no? okay, close it - we don't want it.
  ...
}
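If we go with arrays, the repetitive indexOf checks could be wrapped in a couple of tiny helpers. A sketch (supportsAll/supportsAny are invented names, not part of any proposal):

```javascript
// Sketch: feasibility helpers over array-valued service properties.
// Are all required items available? (e.g. mandatory grammars)
function supportsAll(available, required) {
    return required.every(function (item) {
        return available.indexOf(item) !== -1;
    });
}

// Is at least one acceptable alternative available? (e.g. "en-AU or en-GB")
function supportsAny(available, wanted) {
    return wanted.some(function (item) {
        return available.indexOf(item) !== -1;
    });
}
```

Then the check above collapses to something like `supportsAny(ssvc.recognitionLanguages, ["en-AU", "en-GB"]) && supportsAll(ssvc.grammars, ["<builtin:dictation>", "<builtin:us-cities>"])`.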


Cheers
Satish

On Wed, Sep 14, 2011 at 2:01 AM, Robert Brown <Robert.Brown@microsoft.com<mailto:Robert.Brown@microsoft.com>> wrote:
I'm also struggling with sections 3 & 4 - SpeechService and SpeechServiceQuery.

Sorry for not chiming in earlier. While I can see the direction this is going, it just feels way too complicated to me. I think it will be a lot more work to iron out the details, and in the end it still won't make for an API that's easy to use.

Personally I'd prefer to take a simplified approach. Something like this...

Firstly, make it as easy as possible to use the built-in speech capabilities of a UA just by creating the SpeechInputRequest and SpeechOutputRequest objects, without any messing about with services and criteria and queries. Something like this:

    function simplestCase() {
        // just give me the default recognizer and synthesizer:
        simplestSR = new SpeechInputRequest();
        simplestTTS = new SpeechOutputRequest();
    }

Secondly, for cases where the UA has access to a variety of different speech engines, rather than create a Query API and a Criteria API, just provide mandatory parameters and optional parameters as strings in the constructors for SpeechInputRequest and SpeechOutputRequest.

The constructor pattern would be something like this:

[Constructor(DOMString? mandatoryparams, optional DOMString? optionalparams)]

The usage would be something like this:

    function aLittleBitFussy() {
        // Give me a recognizer for Australian or British English,
        // with grammars for dictation and datetime.
        // It should preferably model a child's vocal tract, but doesn't need to.
        fussySR= new SpeechInputRequest("language=en-AU|en-GB;grammars=<builtin:dictation>,<builtin:datetime>",
                                        "age=child");

        // Give me a synthesizer. It must be Swedish.
        // If the voice named "Kiana" is installed, please use it.
        // Otherwise, I'd prefer a voice that at least sounds like a woman in her thirties, if you have one.
        fussyTTS = new SpeechOutputRequest("language=sv-SE",
                                           "name=Kiana;gender=female;age=30-40");
    }
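For what it's worth, here's roughly what a UA would have to do to parse these strings. A sketch only - the delimiter rules (';' between parameters, '=' between name and value, '|' and ',' between alternatives) are inferred from the examples above, not specified anywhere:

```javascript
// Sketch: parse "language=en-AU|en-GB;grammars=<builtin:dictation>,<builtin:datetime>"
// into { language: ["en-AU", "en-GB"], grammars: [...] }.
// Delimiter rules are assumptions inferred from the examples, not a spec.
function parseParams(str) {
    var result = {};
    if (!str) return result;
    str.split(";").forEach(function (pair) {
        var eq = pair.indexOf("=");
        var name = pair.slice(0, eq);
        var value = pair.slice(eq + 1);
        // "|" and "," both separate alternatives/list items.
        result[name] = value.split(/[|,]/);
    });
    return result;
}
```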

Thirdly, only use a SpeechService object for actual services that aren't built-in to the UA. In this case we should model existing WebSockets and XHR patterns to initialize the service, and then use the service object as a parameter to the constructors for SpeechInputRequest and SpeechOutputRequest. And drop the Query object entirely.

Usage would be something like this:

    var ssvc;
    function initService() {
        //open a new service
        ssvc = new SpeechService("https://myspeechservice/?account=a84e-2198-4e60-00f3");
        ssvc.onopen = function () {
            //check that it has the characteristics we expected...

            //will it recognize en-AU or en-GB, and speak Swedish?
            if ((ssvc.getSupportedLanguages("recognition", "en-AU,en-GB") == '')
            || (ssvc.getSupportedLanguages("synthesis", "sv-SE") == '')
            //does it have the right grammars?
            || (ssvc.getSupportedGrammars("<builtin:dictation>,<builtin:us-cities>") == '')) {
                //no? okay, close it - we don't want it
                ssvc.onclose = function () {
                    ssvc = null;
                };
                ssvc.close();
                return;
            }
        }

        //get SR and TTS request objects using the service:
        serviceSR = new SpeechInputRequest(ssvc);
        serviceTTS = new SpeechOutputRequest(ssvc);
    }
Received on Thursday, 15 September 2011 00:34:34 GMT
