
RE: Personal comments on Speech Recognition Grammar Spec last call

From: Andrew Hunt <andrew.hunt@speechworks.com>
Date: Mon, 18 Mar 2002 14:03:52 -0500
To: "Martin Duerst" <duerst@w3.org>, <www-voice@w3.org>
Message-ID: <NEBBIPBPMMJJJOKAKJHNGELJEGAA.andrew.hunt@speechworks.com>
Martin,

As of the last email there were three outstanding issues.  Here is
a summary of status/disposition.

7) Please use XLink syntax: the Voice Browser working group has
revisited the issue and reviewed work on HLink for XHTML and
XForms.  The working group is still of the opinion that introducing 
XLink into the set of voice specifications (VoiceXML, SSML, SRGS...)
would be inappropriate at this time, though we would not rule it out 
for future versions.


16) Utility of Scoping: a poll of folks who write grammars 
professionally showed strong interest in maintaining the public/
private scope distinction in the specification.  You were copied
on much of that correspondence so I won't repeat it.  Would it 
help if the various motivations/uses for scoping were summarized
in the specification itself?
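To illustrate (a sketch only; the rule names and vocabulary below are
invented, not taken from any published library), a date grammar might
expose a single public rule while keeping its helper rules private:

```xml
<!-- Hypothetical date grammar.  Only the "date" rule forms the public
     interface; "month" and "day" are implementation details. -->
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" root="date">

  <rule id="date" scope="public">
    <ruleref uri="#month"/> <ruleref uri="#day"/>
  </rule>

  <!-- scope defaults to "private": these rules cannot be referenced
       from other grammar documents -->
  <rule id="month">
    <one-of>
      <item>January</item> <item>February</item> <item>March</item>
    </one-of>
  </rule>

  <rule id="day">
    <one-of>
      <item>first</item> <item>second</item> <item>third</item>
    </one-of>
  </rule>
</grammar>
```

A parent grammar can then reference only the "date" rule, and the
internal rules can be reorganized without breaking external references.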


20) It is a bad idea to have the media type specified
    with the reference overwrite the media type determined from the
    actual referenced resource

After your most recent email the working group revisited the issue.
We do not plan any change to the specification.  Here is our analysis of 
the issue and the reason for remaining with the status quo.

Your point that neither a resource specified by a URI, nor the bytes
returned when resolving the URI, has a unique type, is well-taken.  This
means for example that a server is perfectly justified to return an HTTP
header indicating a type of, say, text/plain, when the bytes are in fact
also a valid W3C grammar.  In such a case it's perfectly reasonable for
the consumer of the resource to interpret those bytes (in a kind of
casting operation) as some other type than the type indicated by the
server, when the consumer of the resource has some "out-of-band" source
of knowledge about the resource.  The "type" attribute provides a way
for an application developer to provide this "out-of-band" knowledge to
the consumer (voice browser in this case). 

This is useful in the common case where the VoiceXML application
developer is also in control of the bytes that the URI resolves to, but
may not be in control of the type information returned by the server.
This is especially important for new types, where experience shows web
servers are frequently not configured to return the most useful or most
specific type information for resources that conform to the new type. 
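As a sketch (the URI and the media type string here are illustrative,
not normative), a VoiceXML application might assert the grammar's type
regardless of what the server reports:

```xml
<!-- The author knows date.gram is an SRGS XML grammar even if the web
     server labels it text/plain, so the "type" attribute supplies the
     media type out of band. -->
<grammar src="http://example.com/grammars/date.gram"
         type="application/srgs+xml"/>
```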

There is also precedent in recent W3C recommendations for such a "type"
attribute:  from the SMIL 2.0 Recommendation, Chapter 7
(http://www.w3.org/TR/smil20/extended-media-object.html): 

The "type" attribute value
(http://www.w3.org/TR/smil20/extended-media-object.html#adef-media-type)
takes precedence over other possible sources of the media type (for
instance, the "Content-type" field in an HTTP exchange, or the file
extension). 
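In SMIL terms this means, for example, that a media reference such as
the following (filename invented) is rendered as MPEG video no matter
what "Content-type" header the server returns:

```xml
<video src="http://example.com/clips/intro.dat" type="video/mpeg"/>
```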


Please let us know if you have further comments on these issues.

Regards,
  Andrew Hunt
  Co-editor SRGS
  SpeechWorks International


> Hello Andrew,
> 
> Many thanks for your careful work and explanations.
> Here a short reply to your comments. I'm looking forward to
> discussing the important issues in Cannes.
> 
> Where I don't follow up, I'm okay with the WG's decisions.
> 
> At 22:29 02/02/21 -0500, Andrew Hunt wrote:
> >Martin,
> >
> >Thank you again for your detailed comments on the SRGS document.
> >We've addressed all the issues you raised.  All but a handful of
> >changes have been applied to the document now available at:
> >
> >   http://www.w3.org/Voice/Group/2002/grammar/grammar-spec-20020212-edit.html
> >   [W3C Member only]
> >
> >For non-W3C members, the Voice Browser Working Group is working towards
> >an updated publication of the specification in the coming weeks.
> >
> >Would you kindly let us know if we have not satisfactorily addressed
> >the issues or if you see further issues for us to consider.
> >
> >Regards,
> >   Andrew Hunt
> >   Lead Editor, SRGS
> >
> > > -----Original Message-----
> > > From: www-voice-request@w3.org [mailto:www-voice-request@w3.org]On
> > > Behalf Of Martin Duerst
> > > Sent: Tuesday, October 09, 2001 4:39 AM
> > > To: www-voice@w3.org
> > > Subject: Personal comments on Speech Recognition Grammar Spec last call
> > >
> > >
> > > Dear Voice Browser WG,
> > >
> > > Below please find my personal comments to the last call
> > > of your Speech Recognition Grammar WD of August 20.
> > >
> > > Please note that I am not subscribed to www-voice@w3.org,
> > > so please make sure you include my address in responses.
> 
> > > 3) '1.4 Semantic Interpretation': The term 'semantics' is heavily
> > >     used (and misused) in various contexts. The use here bears
> > >     a serious risk of confusion with the Semantic Web. It is
> > >     difficult to understand what exactly this 'semantic interpretation'
> > >     is supposed to be, but at the moment, it looks mostly like
> > >     events fired upon detection of some input, with associated
> > >     scripts. In that case, it would be much better to use the
> > >     events/script terminology and maybe also the syntax.
> >
> >The group recognizes that there are many uses of the term semantics.
> >Since "semantic interpretation" is the correct term of art in the speech
> >research and commercial world we remain with the term.  However,
> >to clarify the domain-specific meaning Section 1.4 has been rewritten
> >to expound upon the meaning of semantics in the context of speech
> >recognition.
> 
> This is most probably okay. I'm currently offline, but I'll
> have a look at it.
> 
> 
> > > 7) Please use XLink syntax (xlink:href) instead of the 'uri'
> > >     attribute.
> >
> >The voice WG faces a similar issue with XLink to XHTML.  VoiceXML,
> >in particular, has several elements with multiple URIs that cannot
> >be handled without changing document structure -- e.g. creating
> >sub-elements.
> >
> >With work ongoing considering a mechanism to permit URI identification
> >through a Schema, we deferred this change.
> >
> >See Dave Raggett's analysis of Jan 31 2002 at:
> >
> >http://lists.w3.org/Archives/Member/w3c-voice-wg/2002Jan/0109.html
> 
> I understand that in some cases, xlink:href cannot be used
> directly. However, that's not an argument against using it
> when it's possible. I do not remember Speech Recognition Grammar
> to have cases with multiple URIs. So there is no reason not
> to use XLink.
> 
> 
> > > 16) 3.2 Scoping: The scoping rules are explained by reference to Java.
> > >      However, while there are very good reasons for data hiding and
> > >      therefore to have a default of 'private' in programming languages
> > >      such as Java, I can see absolutely no need for data hiding in
> > >      the case of speech grammar rules. There should be a better
> > >      explanation for why the default is 'private', or the default
> > >      should be changed to 'public' to make it easier to reuse rules.
> > >      Also, the rule that a private root can not be referenced by
> > >      name seems unmotivated.
> >
> >The working group feels that there are very solid reasons to have
> >data hiding in grammars for much the same reasons as programming
> >languages.
> >
> >For example, a typical date grammar may have dozens of rules to
> >create a powerful but not over-general syntax for natural language
> >dates.  Of those rules only one or a handful would be useful to a
> >developer of a parent grammar that needs to incorporate a date.
> >
> >By defining internal working rules as private and having the
> >reference check constrained by the grammar processor there is
> >significant reduction in the chances of misuse of a rule.
> >
> >This feature is important for the creation of reusable grammar libraries.
> 
> Sorry, but I'm still not convinced. Data hiding in a programming
> language is very important to make sure that the implementation
> can be changed without affecting usage. Is this the same for
> speech grammars? If yes, please say so. If not, there is really
> no reason for having 'private' be the default.
> 
> 
> > > 19) 4.1.5: Please use an URI as the identifier for 'semantic' tags.
> >
> >In progress.  Considering defining the tag-format attribute as
> >type NOTATION in the DTD/Schema.
> 
> Notation is not the most popular feature in XML, I'm not sure how
> useful it is.
> 
> 
> > > 20) 2.2.2 and 4.2: It is a bad idea to have the media type specified
> > >      with the reference overwrite the media type determined from the
> > >      actual referenced resource.
> >
> >This issue has been a hot topic for each of the voice browser WG specs.
> >The most widely held view is that a media type provided by a grammar
> >author is more likely to be correct than one provided by a web server
> >(for example).  This is especially the case because (1) few web servers
> >are configured to serve the correct type for grammar documents, and
> >(2) many content authors do not have control of the web server that
> >serves their grammar content.
> >
> >In cases where the media type conflicts with the resource (not the
> >http declared type) it is considered an error.
> 
> Sorry, but this is not the way the Web is defined to work.
> First, a resource doesn't have a media type (e.g. what's the media
> type of the resource http://www.w3.org/Icons/w3c_home? It's
> available as GIF, as PNG, and as SVG, as far as I know). Second,
> the bunch of bytes returned when resolving a resource also don't
> have a media type. For example, what's the media type of the
> following file:
> 
> <html>
> This is my first HTML page: Hello World!
> </html>
> 
> It looks quite a bit like HTML. But strictly speaking, it isn't
> (it has no title).
> And what if the server wants to make sure that this is displayed
> as text/plain? So trying to rely on guessing the type of the
> bytes sent over is very much guaranteed to lead to strong
> interoperability problems.
> 
> Also, I know the argument about server configuration quite well,
> but it's much more severe for 'charset' information. For a new
> media type with a new extension, it's not really that much of a
> problem.
> 
> Also, the solution you currently choose would be in conflict with
> what other W3C specs have done. The most obvious example is SMIL,
> where the <audio> or <text> or <video>,... is purely advisory.
> If necessary, this may have to be looked at at a higher level than
> the WG.
> 
> 
> Regards,   Martin.
> 
> 
> 
Received on Monday, 18 March 2002 14:03:58 GMT
