RE: Personal comments on Speech Recognition Grammar Spec last call

Hello Andrew,

Many thanks for your careful work and explanations.
Here is a short reply to your comments. I'm looking forward to
discussing the important issues in Cannes.

Where I don't follow up, I'm okay with the WG's decisions.

At 22:29 02/02/21 -0500, Andrew Hunt wrote:
>Martin,
>
>Thank you again for your detailed comments on the SRGS document.
>We've addressed all the issues you raised.  All but a handful of
>changes have been applied to the document now available at:
>
>   http://www.w3.org/Voice/Group/2002/grammar/grammar-spec-20020212-edit.html
>   [W3C Member only]
>
>For non-W3C members, the Voice Browser Working Group is working
>towards an updated publication of the specification in the coming weeks.
>
>Would you kindly let us know if we have not satisfactorily addressed
>the issues or if you see further issues for us to consider.
>
>Regards,
>   Andrew Hunt
>   Lead Editor, SRGS
>
> > -----Original Message-----
> > From: www-voice-request@w3.org [mailto:www-voice-request@w3.org]On
> > Behalf Of Martin Duerst
> > Sent: Tuesday, October 09, 2001 4:39 AM
> > To: www-voice@w3.org
> > Subject: Personal comments on Speech Recognition Grammar Spec last call
> >
> >
> > Dear Voice Browser WG,
> >
> > Below please find my personal comments to the last call
> > of your Speech Recognition Grammar WD of August 20.
> >
> > Please note that I am not subscribed to www-voice@w3.org,
> > so please make sure you include my address in responses.

> > 3) '1.4 Semantic Interpretation': The term 'semantics' is heavily
> >     used (and misused) in various contexts. The use here bears
> >     a serious risk of confusion with the Semantic Web. It is
> >     difficult to understand what exactly this 'semantic interpretation'
> >     is supposed to be, but at the moment, it looks mostly like
> >     events fired upon detection of some input, with associated
> >     scripts. In that case, it would be much better to use the
> >     events/script terminology and maybe also the syntax.
>
>The group recognizes that there are many uses of the term semantics.
>Since "semantic interpretation" is the correct term of art in the speech
>research and commercial world we remain with the term.  However,
>to clarify the domain-specific meaning Section 1.4 has been rewritten
>to expound upon the meaning of semantics in the context of speech
>recognition.

This is most probably okay. I'm currently offline, but I'll
have a look at it.


> > 7) Please use XLink syntax (xlink:href) instead of the 'uri'
> >     attribute.
>
>The voice WG faces a similar issue with XLink to XHTML.  VoiceXML,
>in particular, has several elements with multiple URIs that cannot
>be handled without changing document structure -- e.g. creating
>sub-elements.
>
>With work ongoing on a mechanism to permit URI identification
>through a Schema, we deferred this change.
>
>See Dave Raggett's analysis of Jan 31 2002 at:
>
>http://lists.w3.org/Archives/Member/w3c-voice-wg/2002Jan/0109.html

I understand that in some cases, xlink:href cannot be used
directly. However, that's not an argument against using it
when it's possible. I do not remember the Speech Recognition Grammar
having cases with multiple URIs, so there is no reason not to use
XLink.
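
To illustrate, for the single-URI case (taking <ruleref> as an
example, with an invented URI) this would simply mean writing

   <ruleref xlink:href="http://example.org/dates.grxml#date"/>

instead of

   <ruleref uri="http://example.org/dates.grxml#date"/>

with the XLink namespace declared once on the root element. The
document structure stays exactly the same; only the attribute name
changes.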


> > 16) 3.2 Scoping: The scoping rules are explained by reference to Java.
> >      However, while there are very good reasons for data hiding and
> >      therefore to have a default of 'private' in programming languages
> >      such as Java, I can see absolutely no need for data hiding in
> >      the case of speech grammar rules. There should be a better
> >      explanation for why the default is 'private', or the default
> >      should be changed to 'public' to make it easier to reuse rules.
> >      Also, the rule that a private root can not be referenced by
> >      name seems unmotivated.
>
>The working group feels that there are very solid reasons to have
>data hiding in grammars for much the same reasons as programming
>languages.
>
>For example, a typical date grammar may have dozens of rules to
>create a powerful but not over-general syntax for natural language
>dates.  Of those rules only one or a handful would be useful to a
>developer of a parent grammar that needs to incorporate a date.
>
>By defining internal working rules as private and having the
>grammar processor enforce the reference check, the chances of
>misusing a rule are significantly reduced.
>
>This feature is important for the creation of reusable grammar libraries.

Sorry, but I'm still not convinced. Data hiding in a programming
language is very important to make sure that the implementation
can be changed without affecting usage. Is this the same for
speech grammars? If yes, please say so. If not, there is really
no reason for having 'private' be the default.
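
Just to make sure we are talking about the same mechanism: my
understanding is that a date grammar along the lines of your example
would look roughly as follows (a sketch only, with invented rule
names, and with attribute details from my memory of the draft):

   <grammar xmlns="http://www.w3.org/2001/06/grammar"
            version="1.0" root="date">
     <rule id="date" scope="public">
       <ruleref uri="#month"/>
       <ruleref uri="#day"/>
     </rule>
     <!-- helper rules, private by default in the current draft -->
     <rule id="month">
       <one-of>
         <item>January</item>
         <item>February</item>
       </one-of>
     </rule>
     <rule id="day">
       <one-of>
         <item>first</item>
         <item>second</item>
       </one-of>
     </rule>
   </grammar>

My question is whether the processor really has to forbid an external
reference to #month or #day, or whether documenting #date as the
intended entry point wouldn't achieve the same thing.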


> > 19) 4.1.5: Please use an URI as the identifier for 'semantic' tags.
>
>In progress.  Considering defining the tag-format attribute as
>type NOTATION in the DTD/Schema.

Notation is not the most popular feature in XML; I'm not sure how
useful it would be here.
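
If I understand the NOTATION idea correctly, it would amount to
something like the following in the DTD (notation name and URI
invented here, just to make the comparison concrete):

   <!NOTATION semantics-ecma
       SYSTEM "http://example.org/tag-formats/ecma">
   <!ATTLIST grammar tag-format NOTATION (semantics-ecma) #IMPLIED>

whereas a plain URI value in the tag-format attribute, e.g.
tag-format="http://example.org/tag-formats/ecma", would identify the
format directly, without depending on a notation declaration being
available and processed.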


> > 20) 2.2.2 and 4.2: It is a bad idea to have the media type specified
> >      with the reference overwrite the media type determined from the
> >      actual referenced resource.
>
>This issue has been a hot topic for each of the voice browser WG specs.
>The most widely held view is that a media type provided by a grammar
>author is more likely to be correct than one provided by a web server
>(for example).  This is especially the case because (1) few web servers
>are configured to serve the correct type for grammar documents, and
>(2) many content authors do not have control of the web server that
>serves their grammar content.
>
>In cases where the declared media type conflicts with the actual
>resource (not with the HTTP-declared type), it is considered an error.

Sorry, but this is not the way the Web is defined to work.
First, a resource doesn't have a media type (e.g. what's the media
type of the resource http://www.w3.org/Icons/w3c_home? It's
available as gif, as png, and as svg, as far as I know). Second,
the bytes returned when resolving a resource don't have an inherent
media type either. For example, what's the media type of the
following file:

<html>
This is my first HTML page: Hello World!
</html>

It looks quite a bit like HTML, but strictly speaking it isn't
(it has no title). And what if the server wants to make sure that it
is displayed as text/plain? So trying to rely on guessing the type of
the bytes sent over is almost guaranteed to lead to serious
interoperability problems.

Also, I know the argument about server configuration quite well,
but that problem is much more severe for 'charset' information. For
a new media type with a new extension, it's not really that much of
a problem.
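
To make the concern concrete, the case I am worried about is a
reference such as this one (URI and media type value invented; I am
using the 'type' attribute as I read it in the draft):

   <ruleref uri="http://example.org/dates.grxml"
            type="application/srgs+xml"/>

where the author-supplied type is allowed to win even when the server
delivers the resource with a different Content-Type. In my view the
type actually delivered with the resource has to be authoritative,
and the attribute can only serve as a hint for dereferencing.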

Also, the solution you have currently chosen would be in conflict
with what other W3C specs have done. The most obvious example is
SMIL, where the choice of <audio>, <text>, <video>, ... is purely
advisory.
If necessary, this may have to be looked at at a higher level than
the WG.
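
For comparison, in SMIL one can write either of the following for the
same resource (URI invented), because the specific element names are,
as far as I recall, just readable synonyms for the generic <ref>:

   <audio src="http://example.org/welcome"/>
   <ref   src="http://example.org/welcome"/>

What the player actually does with the resource is determined by the
type delivered with the resource, not by the element name chosen by
the author.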


Regards,   Martin.

Received on Monday, 25 February 2002 01:17:39 UTC