API hooks for setting ASR properties


This section covers the API hooks for specifying grammars and other recognition properties: what these properties are and how to specify them.

Relevant requirements and design decisions

Strong interest
Moderate interest
Mild interest
New
  1. The web application must be able to perform continuous recognition (i.e., dictation).
  2. The web application must be able to perform an open mic scenario (i.e., always listening for keywords).
  3. The web application must be able to get interim recognition results when it is performing continuous recognition.
Design Decisions

Settable Recognition Properties

General design question: how should we balance parameters passed to the recognize request against properties set independently on a recognition object?
I think there are too many useful properties for all of them to be settable from the recognize request, convenient as that would be.
Maybe we should pick a few properties that can be passed as parameters to the recognize request and let the more complex ones be set independently.
E.g., one grammar and the language could be set from the "recognize" request, while the other properties would have to be set separately, because they will mostly be used by more advanced developers.
recognizer.recognize(grammar,language)

recognizer.recognize(grammar)
recognizer.recognize(language)
recognizer.recognize()
but
recognizer.endpointdetection(true)
recognizer.getinterimresults(true)
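
A minimal sketch of how this split might look, assuming a hypothetical Recognizer class. The class, the default values, the example grammar URL, and the use of an options object are illustrative assumptions, not an agreed API. One point worth noting: in JavaScript the positional forms recognize(grammar) and recognize(language) cannot be told apart when both arguments are plain strings, so the sketch takes named options instead.

```javascript
// Hypothetical sketch: none of these names or defaults are part of an agreed API.
class Recognizer {
  constructor() {
    // Advanced properties live on the object and are set independently
    // of recognize(). The default values here are assumptions.
    this.settings = { endpointdetection: true, getinterimresults: false };
  }

  endpointdetection(enabled) {
    this.settings.endpointdetection = Boolean(enabled);
    return this; // returning this allows chaining
  }

  getinterimresults(enabled) {
    this.settings.getinterimresults = Boolean(enabled);
    return this;
  }

  // Only the two most common properties are accepted per request.
  // Named options avoid the ambiguity between recognize(grammar) and
  // recognize(language) when both are strings.
  recognize({ grammar = null, language = null } = {}) {
    // A real implementation would start audio capture here; the sketch
    // only returns the effective request so the merge is visible.
    return { grammar, language, ...this.settings };
  }
}

const recognizer = new Recognizer();
recognizer.endpointdetection(true).getinterimresults(true);
const request = recognizer.recognize({
  grammar: "http://example.com/pizza.grxml", // illustrative URL
  language: "en-US",
});
console.log(request);
```

Calling recognizer.recognize() with no arguments leaves both grammar and language unset, which corresponds to the bare recognize() form above.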

From requirements and design decisions

grammar(URI or builtin name, weight) (FPR34, FPR45, FPR48, DD9, DD21, DD55, DD72); multiple grammars are possible (DD55), or grammar() (DD11, FPR44)
disablegrammar(URI) (FPR45): disables a specific grammar only
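
The grammar and disablegrammar hooks above could be sketched as follows. GrammarSet, the default weight of 1.0, the "builtin:date" naming, and the example URL are all hypothetical; the sketch only illustrates multiple weighted grammars with per-grammar disabling, and does not cover the argument-free grammar() form.

```javascript
// Hypothetical sketch: GrammarSet and its methods are illustrative names only.
class GrammarSet {
  constructor() {
    // Maps a URI or builtin name to { weight, enabled }.
    this.entries = new Map();
  }

  // Add (or re-enable) a grammar. The default weight is an assumption.
  grammar(uriOrBuiltin, weight = 1.0) {
    this.entries.set(uriOrBuiltin, { weight, enabled: true });
    return this;
  }

  // Disable one specific grammar without affecting the others.
  disablegrammar(uri) {
    const entry = this.entries.get(uri);
    if (entry) entry.enabled = false;
    return this;
  }

  // Grammars that would accompany the next recognition request.
  active() {
    return [...this.entries]
      .filter(([, entry]) => entry.enabled)
      .map(([uri, entry]) => ({ uri, weight: entry.weight }));
  }
}

const grammars = new GrammarSet()
  .grammar("http://example.com/cities.grxml", 2.0)
  .grammar("builtin:date")
  .disablegrammar("builtin:date");
console.log(grammars.active());
```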

maxresults (DD36); default is 1
language (FPR38, DD10)
recognitiontype (e.g. streaming, hotword) (NR1, NR2, DD33)
savewaveformURI (to save waveform) (FPR57)
inputwaveformURI (recognize from a saved waveform) (FPR57)
savewaveform (FPR57)

canrerecognize (DD76)
endpointdetection (DD28)
enablefinalizebeforeend (DD34)
receiveinterimresults (NR3), possibly with an optional parameter indicating how frequently results are requested, probably in msec
more generally: setparameter(parameter name, parameter value) (DD73)
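
A generic setparameter hook could be backed by a small registry of known properties, as in the sketch below. Only the maxresults default of 1 comes from the list above; ParameterStore, getparameter, the other defaults, and the validation rules are assumptions made for illustration.

```javascript
// Hypothetical sketch: registry contents and validation rules are assumptions,
// except the maxresults default of 1, which is taken from the property list.
const KNOWN_PARAMETERS = new Map([
  ["maxresults", { initial: 1, check: (v) => Number.isInteger(v) && v >= 1 }],
  ["language", { initial: null, check: (v) => typeof v === "string" }],
  ["endpointdetection", { initial: true, check: (v) => typeof v === "boolean" }],
  ["receiveinterimresults", { initial: false, check: (v) => typeof v === "boolean" }],
]);

class ParameterStore {
  constructor() {
    // Start every known parameter at its initial value.
    this.values = new Map();
    for (const [name, spec] of KNOWN_PARAMETERS) {
      this.values.set(name, spec.initial);
    }
  }

  // Reject unknown names and invalid values instead of storing them silently.
  setparameter(name, value) {
    const spec = KNOWN_PARAMETERS.get(name);
    if (!spec) throw new RangeError(`unknown parameter: ${name}`);
    if (!spec.check(value)) throw new TypeError(`invalid value for ${name}`);
    this.values.set(name, value);
    return this;
  }

  getparameter(name) {
    return this.values.get(name);
  }
}

const store = new ParameterStore();
store.setparameter("maxresults", 5).setparameter("language", "en-GB");
console.log(store.getparameter("maxresults"), store.getparameter("language"));
```

Whether unknown parameter names should raise an error or be passed through to the recognizer untouched is itself a design choice; the sketch picks the stricter option.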

Other

I don't think we have these in our requirements or design decisions, but they are commonly used in speech APIs. We should discuss adding them to the design decisions.
confidencethreshold
speedvsaccuracy
profile, gender, age (for recognition tuned to a particular speaker or type of speaker)
sensitivity
completetimeout
incompletetimeout
maxspeechtimeout
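
As an illustration of how confidencethreshold might interact with maxresults, the sketch below filters an n-best list. The result shape, the 0 to 1 confidence scale, the default threshold of 0.5, and the function name filterResults are assumptions; real recognizers may define confidence differently.

```javascript
// Hypothetical sketch: result shape, confidence scale, and defaults are assumptions.
function filterResults(nbest, { confidencethreshold = 0.5, maxresults = 1 } = {}) {
  return nbest
    .filter((result) => result.confidence >= confidencethreshold) // drop low-confidence hypotheses
    .sort((a, b) => b.confidence - a.confidence) // best hypothesis first
    .slice(0, maxresults); // cap the number of results returned
}

const nbest = [
  { utterance: "recognize speech", confidence: 0.82 },
  { utterance: "wreck a nice beach", confidence: 0.41 },
  { utterance: "recognise peach", confidence: 0.63 },
];

const kept = filterResults(nbest, { confidencethreshold: 0.6, maxresults: 2 });
console.log(kept.map((result) => result.utterance));
```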


Not addressed

DD29 (needs clarification)

Interfaces

 

Speech Recognition Interface

 

Constants

 

Attributes

 

Methods:

Example:

      

 

WebIDL