- From: Andrew Hunt <andrew.hunt@speechworks.com>
- Date: Thu, 21 Feb 2002 22:29:26 -0500
- To: "Martin Duerst" <duerst@w3.org>, <www-voice@w3.org>
Martin,

Thank you again for your detailed comments on the SRGS document. We've
addressed all the issues you raised. All but a handful of changes have been
applied to the document now available at:

http://www.w3.org/Voice/Group/2002/grammar/grammar-spec-20020212-edit.html
[W3C Member only]

For non-W3C members, the Voice Browser Working Group is working towards an
updated publication of the specification in the coming weeks.

Would you kindly let us know if we have not satisfactorily addressed the
issues, or if you see further issues for us to consider.

Regards,
Andrew Hunt
Lead Editor, SRGS

> -----Original Message-----
> From: www-voice-request@w3.org [mailto:www-voice-request@w3.org]On
> Behalf Of Martin Duerst
> Sent: Tuesday, October 09, 2001 4:39 AM
> To: www-voice@w3.org
> Subject: Personal comments on Speech Recognition Grammar Spec last call
>
>
> Dear Voice Browser WG,
>
> Below please find my personal comments to the last call
> of your Speech Recognition Grammar WD of August 20.
>
> Please note that I am not subscribed to www-voice@w3.org,
> so please make sure you include my address in responses.
>
>
> 1) 1.1: The term DTMF appears many times before it is introduced.
> Please give its expansion/explanation the first time it appears.

DTMF is now a defined term (in new Section 1.5), with links wherever the
term is used.

> 2) Conversion between formats: In case two formats are kept,
> to have fully implemented conversion tools in both directions
> implemented and tested should be a CR exit requirement.

Yes, that is a requirement of the Implementation Report Plan as drafted.
The XSLT in Appendix F demonstrates the conversion from an XML Form grammar
to an ABNF Form grammar. The reverse transform has been performed in
products and will also be a CR exit criterion.

> 3) '1.4 Semantic Interpretation': The term 'semantics' is heavily
> used (and misused) in various contexts. The use here bears
> a serious risk of confusion with the Semantic Web.
> It is difficult to understand what exactly this 'semantic interpretation'
> is supposed to be, but at the moment, it looks mostly like
> events fired upon detection of some input, with associated
> scripts. In that case, it would be much better to use the
> events/script terminology and maybe also the syntax.

The group recognizes that the term "semantics" has many uses. Since
"semantic interpretation" is the correct term of art in speech research and
in the commercial speech world, we have kept the term. However, to clarify
its domain-specific meaning, Section 1.4 has been rewritten to explain what
semantics means in the context of speech recognition.

> 4) 1.4 (and other places): "... the WG plans to require":
> Requiring the use of another spec as of yet undefined or still
> in the works by the current specification (which is in last
> call) is really strange. Also, it does not seem adequate here;
> the specifications are on purpose written independently and
> so that other 'semantic interpretation' things can be used.
> It is better to assure interoperability on a higher level,
> e.g. by defining a profile or by just making all these
> things W3C Recs.

When the August Last Call draft was written, the group expected the related
speech specifications to be further advanced. It didn't turn out that way :(
The spec now normatively references only W3C Recommendations and other
stable standards, and "plans to require" is gone from the spec.

> 5) 2.1 Token: The section title seems inadequate. The section
> should either speak only about single tokens, or the title
> should be 'Token List' or something similar.

The title is now "Tokens". It is not a world-class change, but the title and
the section content are now consistent. The section content has been
entirely rewritten to address concerns raised by the W3C I18N group and
others.

> 6) 2.1: Defining empty tokens or tokens containing only space
> as illegal seems completely unnecessary and only complicates
> the spec.
> These cases should be defined as equivalent with
> special='NULL'. Allowing ( ) or () further complicates
> the issue unnecessarily.

The definition of empty tokens is gone. The definition of ( ) and () stands;
it was necessary to ensure that ABNF has a form equivalent to <item/>.

> 7) Please use XLink syntax (xlink:href) instead of the 'uri'
> attribute.

The Voice Browser WG faces an issue with XLink similar to the one faced by
XHTML. VoiceXML, in particular, has several elements with multiple URIs that
cannot be handled without changing the document structure (e.g. by creating
sub-elements). Since work is ongoing on a mechanism to permit URI
identification through a Schema, we have deferred this change. See Dave
Raggett's analysis of Jan 31 2002 at:
http://lists.w3.org/Archives/Member/w3c-voice-wg/2002Jan/0109.html

> 8) 2.2.2: application/grammar+xml: The term 'grammar' seems much
> too general here. Also, the spec says that this media type has
> been requested, but I'm following the relevant IETF list and
> do not remember such a request. Can you please provide a pointer
> to the archive?

We could not determine the status of a previous action item to request
application/grammar+xml; we presume the media type request was not
submitted. We expect to apply shortly for the following media types:

  application/srgs+xml  for the XML Form of SRGS
  application/srgs      for the ABNF Form of SRGS

> 9) Many of the <pre> examples get cut when printed. This depends
> very much on the device, but I would suggest an upper limit
> of about 60 characters per line.

All fixed.

> 9) 2.2.4 Special rules: You should consider reserving some
> rule names for future extension (e.g. all uppercase only names)

We considered this and reached consensus that:

1. Reserving all uppercase names would be too restrictive of other
   legitimate uses.
2. Reserving a select set of keywords that we anticipate might be
   standardized in the future, without defining them, may be
   counter-productive.
As a result, no changes were made to this aspect of the spec.

> 10) 2.3: the term 'legal rule expansion' is used here but not yet
> defined.

It is now a defined term, linked wherever it is used.

> 11) 2.3: The matching is described multiple times in various very
> different places and words. E.g.: 'expansions /must/ be spoken...',
> 'must be recognized in temporal sequence',...

Clarified.

> 12) 2.4: 'a set of alternatives must contain one or more alternatives':
> Why not 'zero or more alternatives'? Zero would be equivalent to
> special='VOID'. On the other hand, at the end of 2.4, the text
> says that an empty <one-of> is allowed.

We decided to stick with one or more alternatives and to apply the
corresponding constraints in the Schema and DTD. This allows us to maintain
the semantic equivalence between the ABNF and XML Forms.

> 13) 2.5, special case of <0> or <0-0>: Change the explanation to say
> that this is the same as special='NULL'.

Changed.

> 14) There seems to be no need for both {!{ }!} and backslash escaping.
> Please use only one mechanism, preferably backslash escaping.

Backslash escaping was removed. Tags may be delimited by either {...} or
{!{...}!}, with no escaping within.

Rationale: a script-like language (ECMAScript or similar) is planned for use
within tags, so curly braces are likely to be common inside tags. The group
felt that backslash escaping would be more awkward and error-prone.

> 15) 3., second paragraph: 'the rulename resolution specification'
> Is this a separate spec, or part of this spec?

Clarified. A rule reference is merely a URI, and "rulename resolution" was a
clumsy way of stating how to resolve the URI.

> 16) 3.2 Scoping: The scoping rules are explained by reference to Java.
> However, while there are very good reasons for data hiding and
> therefore to have a default of 'private' in programming languages
> such as Java, I can see absolutely no need for data hiding in
> the case of speech grammar rules.
> There should be a better
> explanation for why the default is 'private', or the default
> should be changed to 'public' to make it easier to reuse rules.
> Also, the rule that a private root can not be referenced by
> name seems unmotivated.

The working group feels that there are very solid reasons to have data
hiding in grammars, for much the same reasons as in programming languages.
For example, a typical date grammar may have dozens of rules to create a
powerful but not over-general syntax for natural-language dates. Of those
rules, only one or a handful would be useful to the developer of a parent
grammar that needs to incorporate a date. Defining internal working rules
as private, and having the grammar processor enforce the reference check,
significantly reduces the chances of a rule being misused. This feature is
important for the creation of reusable grammar libraries.

> 17) The choice of 'ABNF' as a magic number seems to be much to
> general. (see above for application/grammar+xml). Similar
> considerations apply to the chosen public identifier (and the
> namespace), as well as the use of the term 'XML Grammar'
> in the place of 'XML Speach Recognition Grammar'

We've changed the media types to "application/srgs+xml" for the XML Form and
"application/srgs" for the ABNF Form. The RFC has not yet been submitted
(it should be very shortly). The group felt that "#ABNF" was sufficient as
a magic number. Loose use of the term "XML Grammar" has been fixed.

> 18) 4.1.4: I think it would be a good idea to change
> <grammar root="rulename" ...> to <grammar root="#rulename" ...>

The WG decided to keep the definition of root as an IDREF rather than treat
it as a URI. Since an IDREF must name an identifier within the same
document, the root attribute will be constrained to name a legal rule
definition. This constraint will be useful to grammar authors as an early
check.

> 19) 4.1.5: Please use an URI as the identifier for 'semantic' tags.

In progress.
We are considering defining the tag-format attribute as type NOTATION in
the DTD/Schema.

> 20) 2.2.2 and 4.2: It is a bad idea to have the media type specified
> with the reference overwrite the media type determined from the
> actual referenced resource.

This issue has been a hot topic for each of the Voice Browser WG specs. The
most widely held view is that a media type provided by a grammar author is
more likely to be correct than one provided by, for example, a web server.
This is especially so because (1) few web servers are configured to serve
the correct type for grammar documents, and (2) many content authors do not
control the web server that serves their grammar content. If the media type
specified by the author conflicts with the resource itself (as opposed to
its HTTP-declared type), it is considered an error.

> 21) RDF rather than the html-like ad-hoc <meta> should be used for
> metadata.

We have adopted the <metadata> element of SMIL 2.0 to encapsulate RDF
metadata. We will keep <meta>, in particular for overriding HTTP controls
(expiry times, client-side HTTP controls, etc.).

> Regards, Martin.
>
> #-#-# Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
> #-#-# mailto:duerst@w3.org http://www.w3.org/People/D%C3%BCrst

Many thanks,
Andrew Hunt
Lead Editor, SRGS
SpeechWorks International
Received on Thursday, 21 February 2002 22:31:12 UTC
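The reply to comment 16 argues that private-by-default rule scope, with the grammar processor enforcing the reference check, protects reusable grammar libraries from misuse. A minimal Python sketch of such a check follows. It assumes a hypothetical in-memory model (`Grammar`, `Rule`, and `resolve` are illustrative names, not part of SRGS or of any real grammar processor) and treats a rule reference as a URI of the form `uri#rulename`, in line with the reply to comment 15. It does not parse either grammar form; it only shows how references from one grammar to another grammar's private rules might be rejected.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    scope: str = "private"  # rules are private unless declared public


@dataclass
class Grammar:
    uri: str
    rules: dict[str, Rule] = field(default_factory=dict)

    def add(self, name: str, scope: str = "private") -> Grammar:
        """Define a rule in this grammar and return self for chaining."""
        self.rules[name] = Rule(name, scope)
        return self


def resolve(ref: str, referring: Grammar, library: dict[str, Grammar]) -> Rule:
    """Resolve a hypothetical rule reference 'uri#rulename' or '#rulename'.

    A local reference ('#rulename') may name any rule in the referring
    grammar; a reference into another grammar may only name a public rule.
    """
    base, sep, name = ref.partition("#")
    if not sep or not name:
        raise ValueError(f"reference {ref!r} does not name a rule")
    # An empty URI part means the reference is local to the referring grammar.
    target = referring if base == "" else library[base]
    rule = target.rules.get(name)
    if rule is None:
        raise LookupError(f"no rule {name!r} in {target.uri}")
    # The scope check: private rules are hidden from other grammars.
    if target is not referring and rule.scope != "public":
        raise PermissionError(f"rule {name!r} in {target.uri} is private")
    return rule
```

For example, with a hypothetical library grammar `date.grxml` whose only public rule is `date`, `resolve("date.grxml#date", app, library)` succeeds, while `resolve("date.grxml#month", app, library)` raises `PermissionError`, so the parent grammar can use only the entry point the library author chose to expose.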