- From: Robert Brown <Robert.Brown@microsoft.com>
- Date: Wed, 14 Sep 2011 01:01:26 +0000
- To: Satish S <satish@google.com>, "olli@pettay.fi" <olli@pettay.fi>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
- Message-ID: <113BCF28740AF44989BE7D3F84AE18DD1B2565E4@TK5EX14MBXC112.redmond.corp.microsoft.>
I'm also struggling with sections 3 & 4 - SpeechService and SpeechServiceQuery.
Sorry for not chiming in earlier. While I can see the direction this is going, it just feels way too complicated to me. I think it will take a lot more work to iron out the details, and in the end it still won't make for an API that's easy to use.
Personally I'd prefer to take a simplified approach. Something like this...
Firstly, make it as easy as possible to use the built-in speech capabilities of a UA just by creating the SpeechInputRequest and SpeechOutputRequest objects, without any messing about with services and criteria and queries. Something like this:
function simplestCase() {
    // Just give me the default recognizer and synthesizer:
    var simplestSR = new SpeechInputRequest();
    var simplestTTS = new SpeechOutputRequest();
}
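Just to show how little ceremony that would involve, here's how a page might wire the two together. To be clear, onresult, start(), play() and transcript below are purely hypothetical placeholders to make the example concrete; we haven't settled those surfaces yet:
function simplestUsage() {
    var sr = new SpeechInputRequest();
    var tts = new SpeechOutputRequest();
    // Hypothetical onresult event and play() method, for illustration only:
    sr.onresult = function (e) {
        // Speak the top recognition hypothesis back to the user.
        tts.play(e.result.transcript); // transcript is also hypothetical
    };
    sr.start(); // hypothetical start() method
}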
Secondly, for cases where the UA has access to a variety of different speech engines, rather than creating a Query API and a Criteria API, just provide mandatory parameters and optional parameters as strings in the constructors for SpeechInputRequest and SpeechOutputRequest.
The constructor pattern would be something like this:
[Constructor(DOMString? mandatoryparams, optional DOMString? optionalparams)]
The usage would be something like this:
function aLittleBitFussy() {
    // Give me a recognizer for Australian or British English,
    // with grammars for dictation and datetime.
    // It should preferably model a child's vocal tract, but doesn't need to.
    var fussySR = new SpeechInputRequest(
        "language=en-AU|en-GB;grammars=<builtin:dictation>,<builtin:datetime>",
        "age=child");

    // Give me a synthesizer. It must be Swedish.
    // If the voice named "Kiana" is installed, please use it.
    // Otherwise, I'd prefer a voice that at least sounds like a woman in her thirties, if you have one.
    var fussyTTS = new SpeechOutputRequest(
        "language=sv-SE",
        "name=Kiana;gender=female;age=30-40");
}
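The string format I have in mind is simple enough to describe in a few lines of code: semicolon-separated key=value pairs, with '|' separating acceptable alternatives and ',' separating list items. A sketch of how a UA might parse it (not normative, and parseSpeechParams is just a name I made up for illustration):
// Parses "language=en-AU|en-GB;grammars=<builtin:dictation>,<builtin:datetime>"
// into { language: ["en-AU", "en-GB"],
//        grammars: ["<builtin:dictation>", "<builtin:datetime>"] }.
function parseSpeechParams(params) {
    var result = {};
    if (!params) return result;
    var pairs = params.split(';');
    for (var i = 0; i < pairs.length; i++) {
        var eq = pairs[i].indexOf('=');
        var key = pairs[i].substring(0, eq);
        var value = pairs[i].substring(eq + 1);
        // '|' marks alternatives; otherwise ',' marks list items.
        result[key] = value.split(value.indexOf('|') >= 0 ? '|' : ',');
    }
    return result;
}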
Thirdly, only use a SpeechService object for actual services that aren't built into the UA. In that case we should model the existing WebSocket and XHR patterns to initialize the service, and then pass the service object as a parameter to the constructors for SpeechInputRequest and SpeechOutputRequest. And drop the Query object entirely.
Usage would be something like this:
var ssvc, serviceSR, serviceTTS;

function initService() {
    // Open a new service:
    ssvc = new SpeechService("https://myspeechservice/?account=a84e-2198-4e60-00f3");
    ssvc.onopen = function () {
        // Check that it has the characteristics we expected...
        // Will it recognize en-AU or en-GB, and speak Swedish?
        if ((ssvc.getSupportedLanguages("recognition", "en-AU,en-GB") === '')
            || (ssvc.getSupportedLanguages("synthesis", "sv-SE") === '')
            // Does it have the right grammars?
            || (ssvc.getSupportedGrammars("<builtin:dictation>,<builtin:us-cities>") === '')) {
            // No? Okay, close it - we don't want it.
            ssvc.onclose = function () {
                ssvc = null;
            };
            ssvc.close();
            return;
        }
        // It checks out, so get SR and TTS request objects using the service:
        serviceSR = new SpeechInputRequest(ssvc);
        serviceTTS = new SpeechOutputRequest(ssvc);
    };
}
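And to carry the WebSocket analogy one step further: if the connection fails, a hypothetical onerror event (my assumption by analogy with WebSocket, not anything we've specified) would let the page fall back to the built-in engines:
    // Hypothetical, by analogy with WebSocket's onerror - fall back to the
    // UA's built-in engines if the remote service can't be reached:
    ssvc.onerror = function () {
        ssvc = null;
        serviceSR = new SpeechInputRequest();
        serviceTTS = new SpeechOutputRequest();
    };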
Received on Wednesday, 14 September 2011 01:01:58 UTC