API hooks for setting ASR properties
This section covers the API hooks for specifying grammars and other recognition properties: both what these properties are, and how to specify them.
Relevant requirements and design decisions
Strong interest
Moderate interest
Mild interest
New
- The web application must be able to perform continuous recognition (i.e., dictation).
- The web application must be able to perform an open-mic scenario (i.e., always listening for keywords).
- The web application must be able to get interim recognition results when it is performing continuous recognition.
Design Decisions
- DD9. It must be possible to reference ASR grammars by URI.
- DD10. It must be possible to select the ASR language using
language tags.
- DD11. It must be possible to leave the ASR grammar unspecified.
Behavior in this case is not yet defined.
- DD21. A standard set of common-task grammars must be supported.
Exactly which grammars those are is TBD.
- DD28. A low-latency endpoint detector must be available. It
should be possible for a web app to enable and disable it, although the
default setting (enabled/disabled) is TBD. The detector detects both
start of speech and end of speech and fires an event in each case.
- DD29. The API will provide control over which portions of the
captured audio are sent to the recognizer.
- DD33. Support for streaming audio is required -- in particular,
that ASR may begin processing before the user has finished speaking.
- DD34. It must be possible for the recognizer to return a final
result before the user is done speaking.
- DD36. Maxresults should be an ASR parameter representing the
maximum number of results to return.
- DD55. The API will support multiple simultaneous grammars, in any combination of the allowed grammar formats. It will also support a weight on each grammar.
- DD72. In JavaScript, speech recognition requests should have an attribute for a sequence of grammars, each of which can have properties, including weight (and possibly language, but that is TBD).
- DD73. In JavaScript it will be possible to set parameters as dot properties and also via a getParameters method. The browser should also allow service-specific parameters to be set this way (see the sketch after this list).
- DD76. It must be possible to do one or more re-recognitions with any request, provided that before its first use you have indicated that it can be re-recognized later. This will be indicated in the API by setting a re-recognition parameter. Any parameter can be changed for the re-recognition, including the speech service.
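To make DD72, DD73, and DD76 concrete, here is a rough JavaScript sketch. None of the names below (SpeechRecognitionRequest, grammars, canrerecognize, and so on) are agreed on; they only illustrate the decisions above.

// Illustrative only -- no names here are agreed-on API.
var request = new SpeechRecognitionRequest();  // hypothetical constructor

// DD72: a sequence of grammars, each with its own properties (weight; language TBD)
request.grammars = [
  { src: "http://example.org/pizza.grxml", weight: 0.8 },
  { src: "builtin:dictation", weight: 0.2 }
];

// DD73: parameters as dot properties ...
request.maxresults = 3;
request.language = "en-US";

// ... and also via a getParameters method; service-specific parameters
// can be set the same way
var params = request.getParameters();
request.setparameter("x-acme-noise-model", "car");  // made-up service-specific parameter

// DD76: flag the request before first use so it can be re-recognized later,
// possibly with different parameters or even a different speech service
request.canrerecognize = true;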
Settable Recognition Properties
General design question: what is the right balance between passing parameters to the recognize request and setting properties independently on a recognition object?
I think there are too many useful properties for them all to be settable from the recognize request, although that is convenient. Maybe we should pick a few properties that can be set as parameters of the recognize request and let the more complex ones be set independently.
E.g., one grammar and the language could be set from the "recognize" request, and the other properties would have to be set separately, because they'll mostly be used by more advanced developers.
recognizer.recognize(grammar,language)
recognizer.recognize(grammar)
recognizer.recognize(language)
recognizer.recognize()
but
recognizer.endpointdetection(true)
recognizer.getinterimresults(true)
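Putting the pieces together, here is a sketch of how the page might consume endpoint events (DD28) and interim results (NR3); the handler names are my own invention, not agreed-on API.

// Sketch only -- event and handler names are not agreed on.
var grammar = "http://example.org/commands.grxml";
recognizer.endpointdetection(true);   // DD28: low-latency endpoint detector
recognizer.getinterimresults(true);   // NR3: interim results during continuous recognition

recognizer.onspeechstart = function () { /* DD28: start of speech detected */ };
recognizer.onspeechend = function () { /* DD28: end of speech detected */ };

recognizer.oninterimresult = function (e) {
  // partial hypothesis while the user is still speaking
  showDraft(e.result);  // hypothetical page function
};

recognizer.onresult = function (e) {
  showFinal(e.result);  // hypothetical page function
};

recognizer.recognize(grammar, "en-US");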
From requirements and design decisions (a usage sketch follows this list):
grammar(URI or builtin name, weight) (FPR34, FPR45, FPR48, DD9, DD21, DD55, DD72); multiple grammars possible (DD55); or grammar() (DD11, FPR44)
disablegrammar(URI) (FPR45): disable a specific grammar only
maxresults (DD36): default is 1
language (FPR38, DD10)
recognitiontype (e.g. streaming, hotword) (NR1, NR2, DD33)
savewaveformURI (to save waveform) (FPR57)
inputwaveformURI (recognize from a saved waveform) (FPR57)
savewaveform (FPR57)
canrerecognize (DD76)
endpointdetection (DD28)
enablefinalizebeforeend (DD34)
receiveinterimresults (NR3), possibly with an optional parameter
to indicate the frequency of results requested, probably in msec
generally -- setparameter(parameter name, parameter value) (DD73)
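Read as methods and properties on a recognizer object, the names above might compose roughly as follows. This is only a sketch of how the list hangs together (the URIs are made up), not a proposed surface.

// Sketch only -- treating the listed names as recognizer methods/properties.
recognizer.grammar("http://example.org/commands.grxml", 0.7);   // URI grammar with weight
recognizer.grammar("builtin:websearch", 0.3);                   // builtin common-task grammar (DD21)
recognizer.disablegrammar("http://example.org/commands.grxml"); // disable this grammar only

recognizer.maxresults = 5;                  // DD36 (default 1)
recognizer.language = "fr-FR";              // DD10: language tag
recognizer.recognitiontype = "hotword";     // open-mic / keyword listening (NR2, DD33)
recognizer.savewaveformURI = "http://example.org/audio/utt1.wav";  // FPR57: save the waveform
recognizer.enablefinalizebeforeend = true;  // DD34: final result before the user stops speaking

recognizer.recognize();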
Other
I don't think we have these in our requirements or design decisions, but they are commonly used in speech APIs; we should discuss adding them to the design decisions. A sketch of how they might be set follows the list.
confidencethreshold
speedvsaccuracy
profile, gender, age (for recognition tuned to a particular speaker or type of speaker)
sensitivity
completetimeout
incompletetimeout
maxspeechtimeout
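If we do adopt any of these, the generic setparameter hook (DD73) is one way to expose them without adding new attributes; the values and units below are only for illustration.

// Sketch only -- candidate tuning parameters via the generic setparameter hook.
recognizer.setparameter("confidencethreshold", 0.5);  // reject results below this confidence
recognizer.setparameter("speedvsaccuracy", 0.8);      // bias toward accuracy over latency
recognizer.setparameter("sensitivity", 0.6);          // how readily speech is detected
recognizer.setparameter("completetimeout", 1000);     // msec of silence after a complete match
recognizer.setparameter("incompletetimeout", 1500);   // msec of silence after a partial match
recognizer.setparameter("maxspeechtimeout", 30000);   // msec cap on a single utterance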
Not addressed
DD29 (control over which portions of the captured audio are sent to the recognizer) needs clarification.
Speech Recognition Interface
Constants
Attributes
Methods
Example