RE: Personal comments on Speech Recognition Grammar Spec last call

Martin,

Thank you again for your detailed comments on the SRGS document.
We've addressed all the issues you raised.  All but a handful of
changes have been applied to the document now available at:

  http://www.w3.org/Voice/Group/2002/grammar/grammar-spec-20020212-edit.html
  [W3C Member only]

For non-W3C members, the Voice Browser WG is working towards an updated
publication of the specification in the coming weeks.

Would you kindly let us know if we have not satisfactorily addressed
the issues or if you see further issues for us to consider.

Regards,
  Andrew Hunt
  Lead Editor, SRGS

> -----Original Message-----
> From: www-voice-request@w3.org [mailto:www-voice-request@w3.org]On
> Behalf Of Martin Duerst
> Sent: Tuesday, October 09, 2001 4:39 AM
> To: www-voice@w3.org
> Subject: Personal comments on Speech Recognition Grammar Spec last call
> 
> 
> Dear Voice Browser WG,
> 
> Below please find my personal comments to the last call
> of your Speech Recognition Grammar WD of August 20.
> 
> Please note that I am not subscribed to www-voice@w3.org,
> so please make sure you include my address in responses.
> 
> 
> 1) 1.1: The term DTMF appears many times before it is introduced.
>     Please give its expansion/explanation the first time it appears.

DTMF is now a defined term (in new Section 1.5) with appropriate links
on usage of the term.

> 2) Conversion between formats: In case two formats are kept,
>     to have fully implemented conversion tools in both directions
>     implemented and tested should be a CR exit requirement.

Yes, that is a requirement of the Implementation Report Plan as
drafted.  The XSLT in Appendix F demonstrates the conversion from
an XML Form grammar to an ABNF Form grammar.  The reverse transform
has been performed in products and will also be a CR exit criterion.

> 3) '1.4 Semantic Interpretation': The term 'semantics' is heavily
>     used (and misused) in various contexts. The use here bears
>     a serious risk of confusion with the Semantic Web. It is
>     difficult to understand what exactly this 'semantic interpretation'
>     is supposed to be, but at the moment, it looks mostly like
>     events fired upon detection of some input, with associated
>     scripts. In that case, it would be much better to use the
>     events/script terminology and maybe also the syntax.

The group recognizes that there are many uses of the term semantics.
Since "semantic interpretation" is the correct term of art in the speech
research and commercial world, we are keeping the term.  However,
to clarify the domain-specific meaning, Section 1.4 has been rewritten
to expound upon the meaning of semantics in the context of speech
recognition.

> 4) 1.4 (and other places): "... the WG plans to require":
>     Requiring the use of another spec as of yet undefined or still
>     in the works by the current specification (which is in last
>     call) is really strange. Also, it does not seem adequate here;
>     the specifications are on purpose written independently and
>     so that other 'semantic interpretation' things can be used.
>     It is better to assure interoperability on a higher level,
>     e.g. by defining a profile or by just making all these
>     things W3C Recs.

At the time of writing of the August Last call draft the group 
expectation was that related speech specs would be further advanced.
It didn't turn out that way:(

The spec now normatively references only W3C Recommendations and
other stable standards.  "plans to require" is gone from the spec.

> 5) 2.1 Token: The section title seems inadequate. The section
>     should either speak only about single tokens, or the title
>     should be 'Token List' or something similar.

Now "Tokens" -- not a world-class change but the title and section 
content are consistent.  The section content is entirely rewritten
to address concerns raised by the W3C I18N group and others.

> 6) 2.1: Defining empty tokens or tokens containing only space
>     as illegal seems completely unnecessary and only complicates
>     the spec. These cases should be defined as equivalent with
>     special='NULL'. Allowing ( ) or () further complicates
>     the issue unnecessarily.

The definition of empty tokens is gone.

The definition of ( ) and () stands; it is necessary to ensure
that ABNF has an equivalent form to <item/>.

> 7) Please use XLink syntax (xlink:href) instead of the 'uri'
>     attribute.

The Voice WG faces an issue with XLink similar to the one XHTML faces.
VoiceXML, in particular, has several elements with multiple URIs that
cannot be handled without changing document structure -- e.g. creating
sub-elements.

Since work is ongoing on a mechanism to permit URI identification
through a Schema, we deferred this change.

See Dave Raggett's analysis of Jan 31 2002 at:

http://lists.w3.org/Archives/Member/w3c-voice-wg/2002Jan/0109.html

> 8) 2.2.2: application/grammar+xml: The term 'grammar' seems much
>     too general here. Also, the spec says that this media type has
>     been requested, but I'm following the relevant IETF list and
>     do not remember such a request. Can you please provide a pointer
>     to the archive?

We could not determine the status of a previous action item to
request application/grammar+xml -- we presume the media type request
was not submitted.  We expect to apply shortly for the following
media types:

  application/srgs+xml for the XML Form of SRGS
  application/srgs     for the ABNF Form of SRGS

> 9) Many of the <pre> examples get cut when printed. This depends
>     very much on the device, but I would suggest an upper limit
>     of about 60 characters per line.

All fixed.

> 9) 2.2.4 Special rules: You should consider reserving some
>     rule names for future extension (e.g. all uppercase only names)

We considered this and reached consensus that:

1. Reserving all uppercase names was too restrictive of other
   legitimate uses.
2. Reserving a select set of keywords that we anticipate might
   be standardized in the future, but not defining them, may be
   counter-productive.

As a result no changes were made to this aspect of the spec.

> 10) 2.3: the term 'legal rule expansion' is used here but not yet
>      defined.

It is a defined term with appropriate link when used.

> 11) 2.3: The matching is described multiple times in various very
>      different places and words. E.g.: 'expansions /must/ be spoken...',
>      'must be recognized in temporal sequence',...

Clarified.

> 12) 2.4: 'a set of alternatives must contain one or more alternatives':
>      Why not 'zero or more alternatives'? Zero would be equivalent to
>      special='VOID'. On the other hand, at the end of 2.4, the text
>      says that an empty <one-of> is allowed.

We decided to stick with one or more alternatives and apply appropriate
constraints in the Schema and DTD.  This allows us to maintain the
ABNF/XML semantic equivalence.

> 13) 2.5, special case of <0> or <0-0>: Change the explanation to say
>      that this is the same as special='NULL'.

Changed.

> 14) There seems to be no need for both {!{  }!} and backslash escaping.
>      Please use only one mechanism, preferably backslash escaping.

Backslash escaping was removed.  Tags may be delimited by either
{...} or {!{...}!} with no escaping within.  

Rationale: With the plans for a script-like language to be used within 
tags (ECMAScript or similar) and with the likelihood of common use of
curly braces within the tags it was felt that backslash escaping would 
be more awkward and error-prone.
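
A minimal Python sketch of this delimiter rule follows.  It is an
illustration only, not text from the spec.  It assumes that a {...}
tag ends at the first "}" and that a {!{...}!} tag ends at the first
"}!}"; the name extract_tags is hypothetical.

```python
import re

# Assumption: with no escaping, a {!{...}!} tag ends at the first "}!}"
# and a {...} tag ends at the first "}". The long form is tried first so
# that "{!{" is never mistaken for the start of a short-form tag.
TAG_PATTERN = re.compile(r"\{!\{(.*?)\}!\}|\{(.*?)\}", re.DOTALL)


def extract_tags(abnf_text):
    """Return the raw content of each tag in abnf_text, in order."""
    tags = []
    for match in TAG_PATTERN.finditer(abnf_text):
        long_form, short_form = match.group(1), match.group(2)
        # Exactly one of the two groups is set for any given match.
        tags.append(long_form if long_form is not None else short_form)
    return tags
```

Tag content that itself uses curly braces, as script-like content
often would, can be carried unescaped in the {!{...}!} form.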

> 15) 3., second paragraph: 'the rulename resolution specification'
>      Is this a separate spec, or part of this spec?

Clarified.  A rule reference is merely a URI and "rulename resolution"
was a clumsy way of stating how to resolve the URI.

> 16) 3.2 Scoping: The scoping rules are explained by reference to Java.
>      However, while there are very good reasons for data hiding and
>      therefore to have a default of 'private' in programming languages
>      such as Java, I can see absolutely no need for data hiding in
>      the case of speech grammar rules. There should be a better
>      explanation for why the default is 'private', or the default
>      should be changed to 'public' to make it easier to reuse rules.
>      Also, the rule that a private root can not be referenced by
>      name seems unmotivated.

The working group feels that there are very solid reasons to have
data hiding in grammars for much the same reasons as programming
languages.

For example, a typical date grammar may have dozens of rules to 
create a powerful but not over-general syntax for natural language
dates.  Of those rules only one or a handful would be useful to a
developer of a parent grammar that needs to incorporate a date.

By defining internal working rules as private and having the
reference check enforced by the grammar processor, there is a
significant reduction in the chances of misuse of a rule.

This feature is important for the creation of reusable grammar libraries.
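
The reference check can be sketched roughly as below.  This is a
simplified illustration, not the processor behavior defined by the
spec: the dictionary layout, the rule names and the function name
check_external_reference are all hypothetical.

```python
def check_external_reference(grammar_rules, rule_name):
    """Allow a reference from another grammar only to a public rule.

    grammar_rules maps each rule name to its scope, either "public"
    or "private" (hypothetical in-memory form of a parsed grammar).
    """
    scope = grammar_rules.get(rule_name)
    if scope is None:
        raise LookupError("no such rule: " + rule_name)
    if scope != "public":
        raise PermissionError("rule is private: " + rule_name)
    return True


# A date grammar exposing one public rule and hiding its helpers.
date_grammar = {"date": "public", "month": "private", "day": "private"}
```

A parent grammar can then reference "date", while a reference to the
internal "month" rule is rejected.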

> 17) The choice of 'ABNF' as a magic number seems to be much to
>      general. (see above for application/grammar+xml). Similar
>      considerations apply to the chosen public identifier (and the
>      namespace), as well as the use of the term 'XML Grammar'
>      in the place of 'XML Speach Recognition Grammar'

We've changed the planned media type request to "application/srgs+xml"
for the XML Form and "application/srgs" for the ABNF Form.  The RFC has
not yet been submitted (it should be very shortly).

The group felt that "#ABNF" was sufficient as a magic number.

Loose use of the term "XML Grammar" has been fixed.

> 18) 4.1.4: I think it would be a good idea to change
>      <grammar root="rulename" ...> to <grammar root="#rulename" ...>

The WG decided to stay with the definition of root as an IDREF
rather than treating it as a URI.  Since an IDREF must provide the
name of an identifier within the same document the root attribute
will be constrained to name a legal rule definition.  This constraint
will be useful for grammar authors as an early check.
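
The early check can be approximated outside a validating parser, as
in the Python sketch below.  It is an illustration under assumptions:
it treats a missing root attribute as nothing to check, it looks only
at the id attributes of <rule> elements, and the name root_is_defined
is hypothetical.

```python
import xml.etree.ElementTree as ET


def root_is_defined(grammar_xml):
    """Return True if the root attribute names a rule in the document."""
    grammar = ET.fromstring(grammar_xml)
    root_name = grammar.get("root")
    if root_name is None:
        # Assumption: no root declared, so there is nothing to check.
        return True
    rule_ids = {
        element.get("id")
        for element in grammar.iter()
        # Strip any "{namespace}" prefix before comparing the tag name.
        if element.tag.split("}")[-1] == "rule"
    }
    return root_name in rule_ids
```

With root treated as an IDREF, root="date" passes when a rule with
id="date" exists, while root="#date" fails because no rule has that
identifier.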

> 19) 4.1.5: Please use an URI as the identifier for 'semantic' tags.

In progress.  We are considering defining the tag-format attribute as
type NOTATION in the DTD/Schema.

> 20) 2.2.2 and 4.2: It is a bad idea to have the media type specified
>      with the reference overwrite the media type determined from the
>      actual referenced resource.

This issue has been a hot topic for each of the voice browser WG specs.
The most widely held view is that a media type provided by a grammar
author is more likely to be correct than one provided by a web server
(for example).  This is especially the case because (1) few web servers
are configured to serve the correct type for grammar documents, and
(2) many content authors do not have control of the web server that
serves their grammar content.

In cases where the author-provided media type conflicts with the
resource itself (as opposed to the HTTP-declared type), it is
considered an error.
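
One reading of this precedence is sketched below in Python.  It is an
illustration, not the normative behavior: the function name, the
parameter names and the idea of a "sniffed" type determined from the
resource content are all assumptions.

```python
def resolve_media_type(author_type, server_type, sniffed_type=None):
    """Choose the media type for a referenced grammar.

    author_type:  type given on the grammar reference, or None
    server_type:  type declared by the HTTP response, or None
    sniffed_type: type determined from the resource content itself,
                  or None if unknown (hypothetical input)
    """
    # The author-provided type takes precedence over the HTTP type.
    chosen = author_type if author_type is not None else server_type
    # A conflict between the author's type and the actual resource
    # is an error; a conflict with the HTTP type alone is not.
    if (
        author_type is not None
        and sniffed_type is not None
        and author_type != sniffed_type
    ):
        raise ValueError("media type conflicts with the resource")
    return chosen
```

For example, an author-declared "application/srgs+xml" is used even
when the server reports "text/plain".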

> 21) RDF rather than the html-like ad-hoc <meta> should be used for
>      metadata.

We have adopted the <metadata> element of SMIL 2.0 to encapsulate
metadata.  We will keep <meta>, in particular for overriding HTTP
controls (expiry times, client-side HTTP controls, etc.).

> Regards,     Martin.
> 
> #-#-#  Martin J. Du"rst, I18N Activity Lead, World Wide Web Consortium
> #-#-#  mailto:duerst@w3.org   http://www.w3.org/People/D%C3%BCrst

Many thanks,
  Andrew Hunt
  Lead Editor, SRGS
  SpeechWorks International

Received on Thursday, 21 February 2002 22:31:12 UTC