- From: Stephen Potter <spotter@microsoft.com>
- Date: Fri, 13 Dec 2002 22:30:57 -0000
- To: <www-voice@w3.org>
- Message-ID: <9584A4A864BD8548932F2F88EB30D1C60B04E9C3@tvp-msg-01.europe.corp.microsoft.com>
Microsoft SRGS Implementation Report
====================================

(This report corresponds to the updated SRGS test set found at http://www.w3.org/2002/06/srgs-irp/ and supersedes Microsoft's previous SRGS Implementation Report.)

Microsoft has implemented the XML Form of the Speech Recognition Grammar Specification (SRGS) in a developmental version of the Microsoft Speech API (SAPI). SAPI allows developers to use speech recognition and/or speech synthesis in their applications.

To conduct the tests specified in the implementation report, a text-input interface was used with SAPI's core grammar processor. Given SAPI's ability to use the core grammar processor in speech, DTMF and text environments, we believe this successfully demonstrates coverage of SRGS in the following agents required by the Implementation Report Plan:

- XML Grammar Processor for ASR
- XML Grammar Processor for DTMF
- XML text parser

Previous versions of SAPI may be found at http://www.microsoft.com/speech/.

As a founding member of the Speech Application Language Tags (SALT) Forum (http://www.saltforum.org) and an active member of the W3C Voice Browser and Multimodal Interaction Working Groups, Microsoft believes speech standards will play a key role in the growing market for speech applications. We consider SRGS a thorough, well-designed specification which, by providing a common syntax for speech recognition grammars, should help promote interoperability and portability in speech application development.

Microsoft intends to support SRGS in its suite of forthcoming SALT products:

- .Net Speech SDK, a set of speech application development tools and speech controls integrated with Visual Studio.NET that enables rapid incorporation of speech into web applications (a Beta 2 version of the SDK is available at http://www.microsoft.com/speech/);
- .Net Speech platform, an integrated multimodal and telephony platform for multiple clients such as PCs, telephones, wireless personal digital assistants (PDAs), and Tablet PCs.

As required by the Implementation Report Plan, some technical details of the SAPI implementation are as follows:

1. Relationship of SAPI output to LPS

SAPI's XML representation of recognition output was mapped explicitly to the LPS defined in Appendix H by a recursive traversal of the parse tree. In some tests a complete mapping into LPS was not possible; for example, the content of the tag element and the exact path of external rules are not copied directly. However, since these are minor aspects of only the surface form of LPS (itself an informative part of the specification) and they in no way affect the behaviour of the grammar processor as defined in the specification, we do not consider these an unsuccessful implementation of the tests.

2. Weights and probabilities

As noted in the Implementation Report Plan, pass/fail testing of weights on alternatives and probabilities on repeats is not possible. We believe these features are implementable and useful. We have implemented support for both weights and repeat probabilities in our ASR grammar processor. We believe that, when properly estimated, weights and repeat probabilities have a positive effect in maximizing recognizer performance.

3. Language support

Speech recognition engines were not available for certain languages required by the tests.
For those tests where a single language is used in the grammar and a recognition engine was not available for that language, SAPI's text input mechanism was used with the grammar processor to compile the grammar, parse the input and produce successful output, and we have considered this a successful implementation.

4. Test set version

The SAPI implementation was run on the set of tests posted at http://www.w3.org/2002/06/srgs-irp/ on 24 October 2002 (see http://lists.w3.org/Archives/Public/www-voice/2002OctDec/0039.html for details).

5. Unimplemented features

As noted in the results of the tests, the following features of the specification have not been implemented:

- multiple languages within the same grammar

  This appears to be a useful feature for certain deployment scenarios.

- lexicon content

  Although we have implemented the syntax of <lexicon> (and thereby the correct test set behaviour), we have not implemented the semantics of lexicon look-up. The ability to specify pronunciations is clearly a very useful feature.

- repetition of tag elements equivalent to a single tag element (tag-repetition.grxml)

  In our belief, where tag elements are repeated without input tokens, this is a developer error, and SRGS processors should not be required to resolve the repetition to a single tag.

Stephen Potter
.Net Speech Technologies
Microsoft Corporation

Attachments
- text/xml attachment: results.xml
Received on Friday, 13 December 2002 17:31:31 UTC