Microsoft SRGS Implementation Report

Microsoft has implemented the XML Form of the Speech Recognition Grammar
Specification (SRGS) in a development version of the Microsoft Speech
API (SAPI). The results are attached. 

SAPI is middleware which allows developers to use speech recognition
and/or speech synthesis in their applications. To conduct the tests
specified in the implementation report, a text-input interface was used
with SAPI's core grammar processor. Given SAPI's ability to use the core
grammar processor in speech, DTMF and text environments, we believe this
successfully demonstrates coverage of SRGS in the following agents
required by the Implementation Report Plan:

  - XML Grammar Processor for ASR 
  - XML Grammar Processor for DTMF 
  - XML text parser 

Previous versions of SAPI may be found at
http://www.microsoft.com/speech/. 

As a founding member of the Speech Application Language Tags (SALT)
Forum (http://www.saltforum.org) and an active member of the W3C Voice
Browser and Multimodal Interaction Working Groups, Microsoft believes
speech standards will play a key role in the growing market for speech
applications. We consider SRGS a thorough, well-designed specification
which, by providing a common syntax for speech recognition grammars,
should help promote interoperability and portability in speech
application development. Microsoft intends to support SRGS in its suite
of forthcoming SALT products: 

   - .Net Speech SDK, a set of speech application development tools and
speech controls integrated with Visual Studio® .NET that will make it
faster and easier for web developers to incorporate speech into web
applications (a Beta version of the SDK is available at
http://www.microsoft.com/speech/getsdk); 
   - .Net Speech platform, an integrated multimodal and telephony
platform for multiple clients such as PCs, telephones, wireless personal
digital assistants (PDAs), and Tablet PCs. 

As required by the Implementation Report Plan, some technical details of
the SAPI implementation are as follows: 

1. Relationship of SAPI output to LPS 
SAPI's XML representation of recognition output was mapped explicitly to
the LPS defined in Appendix H by a recursive traversal of the parse
tree. In some tests a complete mapping into LPS was not possible; for
example, the content of the tag element and the exact path of external
rules are not copied directly. However, since these differences concern
only minor aspects of the surface form of LPS (itself an informative
part of the specification) and in no way affect the behaviour of the
grammar processor as defined in the specification, we do not consider
these tests to have been implemented unsuccessfully. 
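As an illustration only, a recursive traversal of this kind might be sketched as follows. The ParseNode class, its three node kinds and the output element names are hypothetical; they do not reproduce SAPI's internal representation or the exact markup of Appendix H.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ParseNode:
    # Hypothetical parse-tree node: kind is "rule", "token" or "tag".
    kind: str
    value: str  # rule name, token text or tag content
    children: list[ParseNode] = field(default_factory=list)


def to_lps(node: ParseNode) -> ET.Element:
    """Recursively map a parse-tree node to an LPS-like XML element.

    The element names used here are illustrative only, not the
    markup defined in SRGS Appendix H.
    """
    if node.kind == "rule":
        elem = ET.Element("rule", name=node.value)
        for child in node.children:
            elem.append(to_lps(child))  # recurse into sub-expansions
        return elem
    # Tokens and tags are leaves: copy their text into the element.
    elem = ET.Element(node.kind)
    elem.text = node.value
    return elem


tree = ParseNode("rule", "order", [
    ParseNode("token", "large"),
    ParseNode("rule", "drink", [ParseNode("token", "coffee")]),
    ParseNode("tag", "size=large"),
])
print(ET.tostring(to_lps(tree), encoding="unicode"))
```

Because each rule node simply wraps the mapped output of its children, the nesting of the result mirrors the nesting of rule references in the parse.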

2. Weights and probabilities
As noted in the Implementation Report Plan, pass/fail testing of weights
on alternatives and probabilities on repeats is not possible. We believe
these features are implementable and useful, and we have implemented
support for both weights and repeat probabilities in our ASR grammar
processor. We believe that, when properly estimated, weights and repeat
probabilities help to maximize recognizer performance. 
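For reference, the weight and repeat-prob attributes appear in an SRGS grammar as in the hypothetical fragment below. The sketch reads them with Python's xml.etree.ElementTree and normalises the weights within a <one-of>; dividing by the sum is an assumption about how a processor might use relative weights, not a description of SAPI's ASR grammar processor.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = {"g": "http://www.w3.org/2001/06/grammar"}

# Hypothetical grammar: relative weights on alternatives, and a repeat
# probability on an optional item.
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="drink">
  <rule id="drink">
    <item repeat="0-1" repeat-prob="0.3">please</item>
    <one-of>
      <item weight="3">coffee</item>
      <item weight="1">tea</item>
      <item>milk</item>
    </one-of>
  </rule>
</grammar>"""


def alternative_probabilities(one_of: ET.Element) -> dict[str, float]:
    """Normalise the relative weights of the <item>s in a <one-of>.

    A missing weight counts as 1.0. Dividing by the sum is an assumed,
    illustrative use of the weights, not a required algorithm.
    """
    weights = {
        item.text.strip(): float(item.get("weight", "1.0"))
        for item in one_of.findall("g:item", NS)
    }
    total = sum(weights.values())
    return {text: w / total for text, w in weights.items()}


root = ET.fromstring(GRAMMAR)
probs = alternative_probabilities(root.find("g:rule/g:one-of", NS))
print(probs)  # coffee 0.6, tea 0.2, milk 0.2

# The first <item> directly under the rule is the optional "please".
optional = root.find("g:rule/g:item", NS)
print(optional.text, float(optional.get("repeat-prob")))  # please 0.3
```

Since the outcome of a recognition depends on the acoustics as well as on these values, their effect can only be judged statistically, which is why pass/fail testing is not possible.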

3. Language support
Speech recognition engines were not available for certain languages
required by the tests. For those tests where a single language is used
in the grammar and a recognition engine was not available for that
language, SAPI's text input mechanism was used with the grammar
processor to compile the grammar, parse the input and produce the
expected output. We have counted these tests as successfully
implemented. 
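A text-input check of this kind can be sketched for a small subset of SRGS: token sequences, <one-of>, optional items with repeat="0-1" and local <ruleref>s. The German grammar and the accepts function are hypothetical; the sketch is not SAPI's text input mechanism, and it ignores tags, weights, external rule references and other repeat forms.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

SRGS_NS = "{http://www.w3.org/2001/06/grammar}"

# Hypothetical single-language (German) grammar used only for illustration.
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="de-DE" root="gruss">
  <rule id="gruss">
    <one-of><item>guten morgen</item><item>hallo</item></one-of>
    <item repeat="0-1"><ruleref uri="#name"/></item>
  </rule>
  <rule id="name"><one-of><item>anna</item><item>peter</item></one-of></rule>
</grammar>"""


def consume(text: str | None, words: list[str], ends: set[int]) -> set[int]:
    """Match the whitespace-separated tokens in text at each start position."""
    tokens = (text or "").split()
    return {e + len(tokens) for e in ends if words[e:e + len(tokens)] == tokens}


def match_sequence(elem: ET.Element, words, pos, rules) -> set[int]:
    """Match an element's mixed content (text, children, tails) in order."""
    ends = consume(elem.text, words, {pos})
    for child in elem:
        ends = {end for e in ends for end in match(child, words, e, rules)}
        ends = consume(child.tail, words, ends)
    return ends


def match(elem: ET.Element, words, pos, rules) -> set[int]:
    """Return every input position reachable after matching elem at pos."""
    name = elem.tag.replace(SRGS_NS, "")
    if name == "one-of":
        # Union of all alternatives; whitespace between <item>s is ignored.
        return {end for item in elem for end in match(item, words, pos, rules)}
    if name == "ruleref":
        # Only local references ("#id") are supported in this sketch.
        return match(rules[elem.get("uri")[1:]], words, pos, rules)
    ends = match_sequence(elem, words, pos, rules)
    if name == "item" and elem.get("repeat") == "0-1":
        ends |= {pos}  # an optional item may also match nothing
    return ends


def accepts(grammar_xml: str, utterance: str) -> bool:
    """Return True if the whole text input is covered by the root rule."""
    root = ET.fromstring(grammar_xml)
    rules = {r.get("id"): r for r in root.iter(SRGS_NS + "rule")}
    words = utterance.lower().split()
    return len(words) in match(rules[root.get("root")], words, 0, rules)


print(accepts(GRAMMAR, "Guten Morgen Anna"))  # True
print(accepts(GRAMMAR, "Guten Anna"))         # False
```

Tracking the set of reachable input positions, rather than a single position, lets the matcher explore every alternative without explicit backtracking.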

4. Test set version
The SAPI implementation was run on the error-corrected set of tests
circulated within the Voice Browser Working Group on 23 July 2002 (see
http://lists.w3.org/Archives/Member/w3c-voice-wg/2002Jul/0043.html;
members only). 

5. Unimplemented features 
As noted in the results of the tests, the following features of the
specification have not been implemented. 
   - multiple languages within the same grammar
      This appears to be a useful feature for certain deployment
scenarios. 
   - base URI specification for rule reference (xml:base and meta base) 
      This appears to be a generally useful feature. 
   - lexicon content 
      Although we have implemented the syntax of <lexicon> (and so pass
the corresponding tests), we have not implemented the semantics
of lexicon look-up. The ability to specify pronunciations is clearly a
very useful feature. 
   - repetition of tag elements equivalent to a single tag element 
      This is a peripheral feature, but we would welcome guidance
on its usefulness to grammar developers. 
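To illustrate the base URI feature listed above, the sketch below resolves external rule references against an xml:base attribute on the <grammar> element using urllib.parse.urljoin. The grammar, the example.com URIs and the resolve_rulerefs function are hypothetical; local references such as "#local" are left alone, and the meta base form is not covered.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import urljoin

SRGS_RULEREF = "{http://www.w3.org/2001/06/grammar}ruleref"
# ElementTree's expanded name for the xml:base attribute.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Hypothetical grammar whose external rule references are relative URIs.
GRAMMAR = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         root="main" xml:base="http://example.com/grammars/">
  <rule id="main">
    <ruleref uri="cities.grxml#city"/>
    <ruleref uri="../common/numbers.grxml"/>
    <ruleref uri="#local"/>
  </rule>
  <rule id="local"><item>here</item></rule>
</grammar>"""


def resolve_rulerefs(grammar_xml: str, document_uri: str) -> list[str]:
    """Resolve external <ruleref> URIs against the grammar's base URI.

    The base URI is xml:base (itself resolved against the document URI)
    if present, otherwise the document URI. Local references ("#rule")
    and rulerefs without a uri attribute are skipped.
    """
    root = ET.fromstring(grammar_xml)
    base = urljoin(document_uri, root.get(XML_BASE, ""))
    uris = (ref.get("uri") for ref in root.iter(SRGS_RULEREF))
    return [urljoin(base, u) for u in uris if u and not u.startswith("#")]


for uri in resolve_rulerefs(GRAMMAR, "http://host.example/app/main.grxml"):
    print(uri)
```

Without xml:base, the same references would instead be resolved relative to the location from which the grammar document itself was fetched.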


Stephen Potter
.Net Speech Technologies
Microsoft Corporation

Received on Friday, 30 August 2002 19:58:38 UTC