API hooks for setting ASR properties
This section covers the API hooks for specifying grammars and other recognition properties: both what these properties are, and how to specify them.
Relevant requirements and design decisions
Strong interest
Moderate interest
Mild interest
New
- The web application must be able to perform continuous recognition (i.e., dictation).
- The web application must be able to perform an open-mic scenario (i.e., always listening for keywords).
- The web application must be able to get interim recognition results when it is performing continuous recognition.
Design Decisions
- DD9. It must be possible to reference ASR grammars by URI.
- DD10. It must be possible to select the ASR language using
language tags.
- DD11. It must be possible to leave the ASR grammar unspecified.
Behavior in this case is not yet defined.
- DD21. A standard set of common-task grammars must be supported.
Exactly which grammars those are is TBD.
- DD28. A low-latency endpoint detector must be available. It
should be possible for a web app to enable and disable it, although the
default setting (enabled/disabled) is TBD. The detector detects both
start of speech and end of speech and fires an event in each case.
- DD29. The API will provide control over which portions of the
captured audio are sent to the recognizer.
- DD33. Support for streaming audio is required -- in particular,
that ASR may begin processing before the user has finished speaking.
- DD34. It must be possible for the recognizer to return a final
result before the user is done speaking.
- DD36. Maxresults should be an ASR parameter representing the
maximum number of results to return.
- DD55. The API will support multiple simultaneous grammars, in any combination of the allowed grammar formats. It will also support a weight on each grammar.
- DD72. In JavaScript, speech recognition requests should have an attribute for a sequence of grammars, each of which can have properties, including weight (and possibly language, but that is TBD).
- DD73. In JavaScript it will be possible to set parameters as dot properties and also via a getParameters method. The browser should also allow service-specific parameters to be set this way (see the sketch after this list).
- DD76. It must be possible to do one or more re-recognitions with any request, provided that before its first use you have indicated that it can be re-recognized later. This will be indicated in the API by setting a re-recognition parameter. Any parameter can be changed for the re-recognition, including the speech service.
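To make DD72, DD73, and DD76 concrete, here is a rough JavaScript sketch. None of the names below (SpeechRecognitionRequest, grammars, canrerecognize, and so on) are agreed on; they only illustrate the decisions above.

// Illustrative only -- no names here are agreed-on API.
var request = new SpeechRecognitionRequest();  // hypothetical constructor

// DD72: a sequence of grammars, each with its own properties (weight; language TBD)
request.grammars = [
  { src: "http://example.org/pizza.grxml", weight: 0.8 },
  { src: "builtin:dictation", weight: 0.2 }
];

// DD73: parameters as dot properties ...
request.maxresults = 3;
request.language = "en-US";

// ... and also via a getParameters method; service-specific parameters
// can be set the same way
var params = request.getParameters();
request.setparameter("x-acme-noise-model", "car");  // made-up service-specific parameter

// DD76: flag the request before first use so it can be re-recognized later,
// possibly with different parameters or even a different speech service
request.canrerecognize = true;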
Settable Recognition Properties
General design question: what is the right balance between passing parameters to the recognize request and setting properties independently on a recognition object?
I think there are too many useful properties for them all to be settable from the recognize request, although that is convenient. Maybe we should pick a few properties that can be set as parameters of the recognize request and let the more complex ones be set independently.
E.g., one grammar and the language could be set from the "recognize" request, and the other properties would have to be set separately, because they'll mostly be used by more advanced developers.
recognizer.recognize(grammar,language)
recognizer.recognize(grammar)
recognizer.recognize(language)
recognizer.recognize()
but
recognizer.endpointdetection(true)
recognizer.getinterimresults(true)
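Putting the pieces together, here is a sketch of how the page might consume endpoint events (DD28) and interim results (NR3); the handler names are my own invention, not agreed-on API.

// Sketch only -- event and handler names are not agreed on.
var grammar = "http://example.org/commands.grxml";
recognizer.endpointdetection(true);   // DD28: low-latency endpoint detector
recognizer.getinterimresults(true);   // NR3: interim results during continuous recognition

recognizer.onspeechstart = function () { /* DD28: start of speech detected */ };
recognizer.onspeechend = function () { /* DD28: end of speech detected */ };

recognizer.oninterimresult = function (e) {
  // partial hypothesis while the user is still speaking
  showDraft(e.result);  // hypothetical page function
};

recognizer.onresult = function (e) {
  showFinal(e.result);  // hypothetical page function
};

recognizer.recognize(grammar, "en-US");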
From requirements and design decisions (a usage sketch follows this list):
grammar(URI or builtin name, weight) (FPR34, FPR45, FPR48, DD9, DD21, DD55, DD72); multiple grammars possible (DD55); or grammar() (DD11, FPR44)
disablegrammar(URI) (FPR45): disable a specific grammar only
maxresults (DD36): default is 1
language (FPR38, DD10)
recognitiontype (e.g. streaming, hotword) (NR1, NR2, DD33)
savewaveformURI (to save waveform) (FPR57)
inputwaveformURI (recognize from a saved waveform) (FPR57)
savewaveform (FPR57)
canrerecognize (DD76)
endpointdetection (DD28)
enablefinalizebeforeend (DD34)
receiveinterimresults (NR3), possibly with an optional parameter
to indicate the frequency of results requested, probably in msec
generally -- setparameter(parameter name, parameter value) (DD73)
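Read as methods and properties on a recognizer object, the names above might compose roughly as follows. This is only a sketch of how the list hangs together (the URIs are made up), not a proposed surface.

// Sketch only -- treating the listed names as recognizer methods/properties.
recognizer.grammar("http://example.org/commands.grxml", 0.7);   // URI grammar with weight
recognizer.grammar("builtin:websearch", 0.3);                   // builtin common-task grammar (DD21)
recognizer.disablegrammar("http://example.org/commands.grxml"); // disable this grammar only

recognizer.maxresults = 5;                  // DD36 (default 1)
recognizer.language = "fr-FR";              // DD10: language tag
recognizer.recognitiontype = "hotword";     // open-mic / keyword listening (NR2, DD33)
recognizer.savewaveformURI = "http://example.org/audio/utt1.wav";  // FPR57: save the waveform
recognizer.enablefinalizebeforeend = true;  // DD34: final result before the user stops speaking

recognizer.recognize();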
Other
I don't think we have these in our requirements or design decisions, but they are commonly used in speech APIs; we should discuss adding them to the design decisions. A sketch of how they might be set follows the list.
confidencethreshold
speedvsaccuracy
profile, gender, age (for recognition tuned to a particular speaker or type of speaker)
sensitivity
completetimeout
incompletetimeout
maxspeechtimeout
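If we do adopt any of these, the generic setparameter hook (DD73) is one way to expose them without adding new attributes; the values and units below are only for illustration.

// Sketch only -- candidate tuning parameters via the generic setparameter hook.
recognizer.setparameter("confidencethreshold", 0.5);  // reject results below this confidence
recognizer.setparameter("speedvsaccuracy", 0.8);      // bias toward accuracy over latency
recognizer.setparameter("sensitivity", 0.6);          // how readily speech is detected
recognizer.setparameter("completetimeout", 1000);     // msec of silence after a complete match
recognizer.setparameter("incompletetimeout", 1500);   // msec of silence after a partial match
recognizer.setparameter("maxspeechtimeout", 30000);   // msec cap on a single utterance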
Not addressed
DD29 (control over which portions of the captured audio are sent to the recognizer) needs clarification.
Speech Recognition Interface
Constants
Attributes
Methods
Example