RE: Next HTML web api document from Robert Brown on 2011-11-01 (public-xg-htmlspeech@w3.org from November 2011)

From: Robert Brown <Robert.Brown@microsoft.com>
Date: Tue, 1 Nov 2011 04:48:30 +0000
To: Michael Bodell <mbodell@microsoft.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>, "johnston@research.att.com" <johnston@research.att.com>
Message-ID: <113BCF28740AF44989BE7D3F84AE18DD1B2E9121@TK5EX14MBXC116.redmond.corp.microsoft.>

I've started trying to write some samples. Here are my thoughts so far:

1.
To get the reco result I think I have to write "e.result.item(0).interpretation". That's a lot of dots and an index just to get the top result.

I'd much rather just write "e.interpretation". We should consider just putting the utterance & interpretation of the top result right there on the SpeechInputResult interface, since in most cases people won't want to dig into the array of alternatives.
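
A minimal sketch of the difference, using a plain-object stand-in for the result event. Only the "result.item(0).interpretation" path and the utterance/interpretation names come from the draft as quoted above; makeResultEvent and the top-level getters are hypothetical, not the draft API.

```javascript
// Hypothetical stand-in for the event a result handler would receive.
// Only the item(n) / utterance / interpretation shape follows the draft.
function makeResultEvent(alternatives) {
  const result = {
    length: alternatives.length,
    item: (i) => alternatives[i],
  };
  return {
    result,
    // Proposed shortcuts: mirror the top alternative on the event itself.
    get utterance() {
      return result.length ? result.item(0).utterance : null;
    },
    get interpretation() {
      return result.length ? result.item(0).interpretation : null;
    },
  };
}

const e = makeResultEvent([
  { utterance: "seattle", interpretation: { city: "Seattle" } },
  { utterance: "seatac", interpretation: { city: "SeaTac" } },
]);

const viaIndex = e.result.item(0).interpretation; // what the draft requires
const viaShortcut = e.interpretation;             // what is proposed here
```

Both expressions reach the same object; the full list of alternatives stays available through e.result for the cases that need it.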

2.
"utterance" has a couple of different meanings in the doc. Depending on where you look, it's either the recording of what the person said or the transcript returned by the recognizer.

We should consider calling the transcript either "transcript" or just "text".

3.
The "modal" attribute on SpeechGrammar is unnecessarily restrictive.

There are cases where I'll want to have multiple grammars active, but not all, and not just one.

Developers would be better off with a boolean enabled attribute on each grammar.
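
As a rough illustration of the per-grammar flag (the src and enabled property names are assumptions, not taken from the draft), a recognizer would only need to filter on that flag to find the active set:

```javascript
// Hypothetical grammar records; `src` and `enabled` are assumed names.
const grammars = [
  { src: "USstates.grxml", enabled: true },
  { src: "majorUScities.grxml", enabled: true },
  { src: "majorInternationalCities.grxml", enabled: false },
];

// Returns the URIs of the grammars that are currently switched on.
function activeGrammars(list) {
  return list.filter((g) => g.enabled).map((g) => g.src);
}

const active = activeGrammars(grammars);
```

Here two of the three grammars are active at once, which is the "multiple, but not all, and not just one" case that a single modal attribute can't express.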

4.
The interpretation attribute is completely opaque. That may be necessary given that SISR can pretty much return anything. But it'll need some examples to show how to use it.
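
One example the doc could carry, sketched under assumptions: because the value may be a bare string or a structured object depending on how the grammar's semantic tags are written, handler code has to check the shape before using it. The cityFrom helper and the "city" property are hypothetical and depend entirely on the grammar author.

```javascript
// Hypothetical helper: pull a city name out of whatever shape the
// interpretation takes. The "city" property is an assumed grammar convention.
function cityFrom(interpretation) {
  if (typeof interpretation === "string") {
    return interpretation;
  }
  if (interpretation !== null && typeof interpretation === "object" && "city" in interpretation) {
    return String(interpretation.city);
  }
  return null; // shape not recognized by this handler
}

const fromString = cityFrom("Seattle");
const fromObject = cityFrom({ city: "Seattle", state: "WA" });
const fromOther = cityFrom(42);
```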

5.
The array of SpeechGrammar objects is too cumbersome.

In most cases I'd like to write something simple like:

mySR.speechGrammars.push("USstates.grxml","majorUScities.grxml","majorInternationalCities.grxml");

But I can't. I have to new-up a separate object for each one and then add it to the array, even when I don't care about the other attributes.

Better to just make it an array of URI strings, and add functions for the edge cases. e.g.
void enableGrammar(in DOMString uri, in boolean enabled);
void setWeight(in DOMString grammarUri, in float weight);

And yeah, I remember arguing the opposite on the phone call. But that's before I tried writing sample code.
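
A sketch of how the string-array shape could behave, assuming a hypothetical SketchRecognizer stand-in rather than the draft's request object. The enableGrammar and setWeight names follow the signatures suggested above; the grammars property, the internal bookkeeping, and the defaults (enabled, weight 1.0) are assumptions.

```javascript
// Hypothetical stand-in for the request object; not the draft API.
class SketchRecognizer {
  constructor() {
    this.grammars = [];          // plain URI strings
    this._disabled = new Set();  // URIs switched off via enableGrammar
    this._weights = new Map();   // per-URI weight overrides
  }

  enableGrammar(uri, enabled) {
    if (enabled) {
      this._disabled.delete(uri);
    } else {
      this._disabled.add(uri);
    }
  }

  setWeight(grammarUri, weight) {
    this._weights.set(grammarUri, weight);
  }

  // Assumed defaults: every listed grammar is enabled with weight 1.0.
  effectiveGrammars() {
    return this.grammars
      .filter((uri) => !this._disabled.has(uri))
      .map((uri) => ({
        uri,
        weight: this._weights.has(uri) ? this._weights.get(uri) : 1.0,
      }));
  }
}

const mySR = new SketchRecognizer();
mySR.grammars.push("USstates.grxml", "majorUScities.grxml", "majorInternationalCities.grxml");

// Edge cases go through functions instead of per-grammar objects.
mySR.enableGrammar("majorInternationalCities.grxml", false);
mySR.setWeight("USstates.grxml", 2.0);

const effective = mySR.effectiveGrammars();
```

The common case stays a one-line push of URI strings, and only the grammars that need a non-default weight or state are ever mentioned again.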

6.
The names are a bit long.

e.g. "new SpeechInputRequest()" vs "new SpeechIn()"
e.g. "mySR.speechGrammars.push("foo")" vs "mySR.grammars.push("foo")"
e.g. "resultEMMAXML" vs "EMMAXML" or just "EMMA" (call the other one "EMMAText")
e.g. "inputWaveformURI" vs "inputURI"
etc.





________________________________
From: Michael Bodell [mbodell@microsoft.com]
Sent: Friday, October 28, 2011 10:31 PM
To: public-xg-htmlspeech@w3.org; johnston@research.att.com
Subject: Next HTML web api document


The only changes, other than the date and the last-document link, are more examples.



This time there are two more examples, which show two ways to use the JS API. Neither takes advantage of an earlier open (like onload="sir.open()" to get permissions and early grammar usage), but both should work, and they show some of the JS API usage that should help others with their own examples before the F2F.

Received on Tuesday, 1 November 2011 04:49:09 UTC