RE: Personal comments on Speech Recognition Grammar Spec last call

Martin, Al,

It has taken me a month to update the media type language in response
to your most recent comments.  I'm sorry for the delay.
Below is a revision of that language, which I hope addresses your
concerns.  The relevant paragraphs are marked with **.
I have also included a point-by-point response to your previous emails.

I sincerely hope that the spec change included below addresses the
points that you have raised and that it is satisfactory for the spec
to progress along the Recommendation track.

Again, thank you for the input on the speech recognition grammar spec
and your responsiveness to the discussion.

--Andrew Hunt
  Co-editor SRGS
  SpeechWorks


--RESPONSE TO PREVIOUS REMARKS--

2002/03/21: http://lists.w3.org/Archives/Public/www-voice/2002JanMar/0126.html
> At the minimum, I would like to see an explanation of the problems
> with using the type attribute on the link (a very clear example would
> be that somebody has some linked speech grammar files in ABNF, and
> converts them to the XML notation. All the type attributes in the
> links to that grammar fragment will have to be updated, which may
> be a real pain), and a commitment of the WG to (some of) the
> proposals for getting things improved on the server side.

Use of type is suggested only as a last resort and the format
conversion issue is covered.  Both are in the third paragraph
marked with **.

We have raised this issue with Philipp Hoschka to seek guidance on
next steps within the W3C.  At this point I cannot make commitments
on behalf of the working group beyond escalating our experience on
this topic.

2002/03/20: http://lists.w3.org/Archives/Public/www-voice/2002JanMar/0119.html
>- Making sure that the MIME type is actually registered.
>   It's easier to convince a webmaster to add something to a config
>   file if an author can point to a full registration, rather than
>   just some 'customarily used' mime type.

Media type registration requests have been published as IETF
Internet-Drafts and are referenced from the specification:
  http://www.ietf.org/internet-drafts/draft-porter-srgs-media-reg-00.txt
  http://www.ietf.org/internet-drafts/draft-porter-srgsxml-media-reg-00.txt
The registrations have not been accepted yet.

2002/03/20: http://lists.w3.org/Archives/Public/www-voice/2002JanMar/0119.html
>- Saying clearly in the spec that Web server configurations should
>   be updated to send the right mime type.

Spec now states that type should be used only as a last resort when the 
web server cannot be configured to return the correct media type.

>- Using contacts of the WG (and the W3C overall if necessary) to
>   make sure the configurations in the newest releases of the servers
>   are up to date.

The WG will be in a position to act on this point once the media
types have been registered.

>- Providing help on a public web page on how to set up various servers
>   (see e.g. http://www.w3.org/International/O-HTTP-charset.html for
>    an example; I'm glad to accept suggestions for improvement).

The WG was reluctant to get into this arena.

>- Ideally, combining the latent frustration about this issue in
>   various WGs and Activities to some coordinated effort.

We have raised this issue with Philipp Hoschka to seek guidance
on next steps within the W3C.


---DRAFT REVISED TEXT---

2.2.2 External Reference by URI 

References to rules defined in other grammars are legal under the conditions 
defined in Section 3. The external reference must identify the external grammar 
by URI and may identify a specific rule within that grammar. If the fragment 
identifier that would indicate a rulename is omitted, then the reference targets 
the root rule of the external grammar. 

A URI reference is illegal if the referring document and referenced document 
have different modes. For instance, it is illegal to reference a "dtmf" grammar 
from a "voice" grammar. (See Section 4.6 for additional detail on modes.) 

** A URI reference may be accompanied by a media type that indicates the content type 
of the resource identified by the URI. When specified, the type value takes 
precedence over other possible sources of the media type (for instance, the 
"Content-type" field in an HTTP exchange, or the file extension). If present, 
the media type is binding and cannot be ignored when parsing the referenced URI. 
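
As an illustration only (not part of the proposed spec text), the
precedence rule above could be sketched in Python roughly as follows.
The media type names, the file suffixes, the function name, and the
order of the two fallbacks (HTTP header before file extension) are all
assumptions made for this example; the draft text only says that the
type on the reference comes first.

```python
from typing import Optional

# Assumed suffix-to-media-type mapping; illustrative only.
SUFFIX_TYPES = {
    ".gram": "application/srgs",
    ".grxml": "application/srgs+xml",
}


def effective_media_type(ref_type: Optional[str],
                         content_type_header: Optional[str],
                         uri: str) -> Optional[str]:
    """Pick the media type to parse with; the reference's type wins."""
    if ref_type:
        # A type given on the reference is binding.
        return ref_type.strip().lower()
    if content_type_header:
        # Drop parameters such as "; charset=utf-8".
        return content_type_header.split(";", 1)[0].strip().lower()
    # Last fallback: guess from the file suffix of the URI path.
    path = uri.split("#", 1)[0].split("?", 1)[0].lower()
    for suffix, media_type in SUFFIX_TYPES.items():
        if path.endswith(suffix):
            return media_type
    return None
```

With this sketch, a reference carrying a type is parsed as that type
even if the server answers with text/plain.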

** When the content represented by a URI is available in many data formats, a 
grammar processor may use the type to influence which of the multiple formats 
is used. For instance, on a server implementing HTTP content negotiation, the 
processor may use the type to order the preferences in the negotiation. 
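
As an illustration only (not part of the proposed spec text), one way a
processor might let the type order its preferences is to build the HTTP
Accept header so that the referenced type is listed first with the
highest quality value. The function name and the media type names in
the usage note are assumptions for this sketch.

```python
from typing import List, Optional


def accept_header(preferred: Optional[str], supported: List[str]) -> str:
    """Build an Accept value, ranking `preferred` first if it is supported."""
    # A stable sort moves the preferred type to the front and keeps
    # the remaining types in their original order.
    ordered = sorted(supported, key=lambda t: t != preferred)
    parts = []
    for i, media_type in enumerate(ordered):
        q_tenths = max(1, 10 - i)  # 1.0, 0.9, 0.8, ... never below 0.1
        if q_tenths == 10:
            parts.append(media_type)
        else:
            parts.append(f"{media_type};q=0.{q_tenths}")
    return ", ".join(parts)
```

For example, with supported types application/srgs+xml and
application/srgs (in that order) and a reference typed as
application/srgs, the ABNF form would be requested first and the XML
form would be offered as a fallback with q=0.9.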

** Informative: use of the type attribute should be considered a last resort. For 
instance, the type may be appropriate when a grammar is fetched via HTTP but 
(1) the web server cannot be configured to indicate the correct media type, and 
(2) the grammar processor is unable to automatically detect the media type. In 
the event that a grammar is transformed to another form (e.g. ABNF Form to XML 
Form), any type attribute on a reference to that grammar must also be modified. 
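
As an illustration only (not part of the proposed spec text), the
maintenance cost of a format conversion could look like the following
Python sketch, which rewrites references in an XML Form grammar after a
referenced grammar has moved to a new URI and media type. The function
name, the example URIs, and the media type names are assumptions; the
sketch also assumes references are ruleref elements in the SRGS XML
namespace carrying uri and type attributes.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"  # assumed grammar namespace


def retarget_rulerefs(grammar_xml: str, old_uri: str,
                      new_uri: str, new_type: str) -> str:
    """Point rulerefs at a converted grammar and update their type."""
    ET.register_namespace("", SRGS_NS)  # serialize without a prefix
    root = ET.fromstring(grammar_xml)
    for ref in root.iter(f"{{{SRGS_NS}}}ruleref"):
        base, sep, fragment = ref.get("uri", "").partition("#")
        if base != old_uri:
            continue  # local reference or a different grammar
        ref.set("uri", new_uri + sep + fragment)
        if "type" in ref.attrib:
            # Only references that carried a type need this extra edit.
            ref.set("type", new_type)
    return ET.tostring(root, encoding="unicode")
```

References that never carried a type attribute need only the URI
change, which is the practical argument for treating type as a last
resort.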


> -----Original Message-----
> From: Martin Duerst [mailto:duerst@w3.org]
> Sent: Monday, April 01, 2002 10:21 PM
> To: Al Gilman; andrew.hunt@speechworks.com; www-voice@w3.org
> Subject: RE: Personal comments on Speech Recognition Grammar Spec last
> call
> 
> 
> Hello Al,
> 
> After taking quite some time to think about it again,
> I think I can very much agree with your position.
> 
> I very much hope that the Speech Recognition Grammar Spec
> can be changed in that way, and would otherwise not be
> satisfied with the resolution.
> 
> It took me quite a while to think about potential security
> issues (there have been several recent security problems
> where (mime) typing was involved). At the moment, my guess
> is that there is not too much of a problem. The first point
> would be that speech grammars are not security-relevant
> (maybe a grammar with some recursive rules could create
> an infinite loop and therefore a denial of service attack
> on a bad implementation). But maybe I'm wrong here and
> some components of the grammar could lead to execution
> of some code. The second point is of course that a speech
> grammar will be used when a speech grammar is referenced.
> There is no general link functionality such as: When you
> get here, display the referenced document.
> So I think it should be okay.
> 
> Regards,    Martin.
> 
> At 16:16 02/03/21 -0500, Al Gilman wrote:
> >Sorry to be a space cadet, but I want to reverse what I said a bit.
> >
> >Rather than say "the type indicated in the reference rules, when present" 
> >better to say "In the case that the actual resource recovered bears an 
> >indication of a type not suitable for processing, the type indicated in 
> >the reference may be used to attempt a recovery from this error."
> >
> >[more below]
> >
> >At 08:47 PM 2002-03-20 , Martin Duerst wrote:
> > >Hello Andrew,
> > >
> > >I'm sorry to bother you again. Based on Al's comments, I had a look
> > >at your mail again, and found some very basic unclarity.
> > >
> > >At 14:03 02/03/18 -0500, Andrew Hunt wrote:
> > >>Martin,
> > >>
> > >>As of the last email there were three outstanding issues.  Here is
> > >>a summary of status/disposition.
> > >
> > >>20) It is a bad idea to have the media type specified
> > >>     with the reference overwrite the media type determined from the
> > >>     actual referenced resource
> > >>
> > >>After your most recent email the working group revisited the issue.
> > >>We do plan any change to the specification.
> > >
> > >Do you plan some change, or do you not plan any change?
> > >
> > >At the minimum, I would like to see an explanation of the problems
> > >with using the type attribute on the link (a very clear example would
> > >be that somebody has some linked speech grammar files in ABNF, and
> > >converts them to the XML notation. All the type attributes in the
> > >links to that grammar fragment will have to be updated, which may
> > >be a real pain), and a commitment of the WG to (some of) the
> > >proposals for getting things improved on the server side.
> > >
> > >Regards,   Martin.
> > >
> > >
> > >>Here is our analysis of
> > >>the issue and the reason for remaining with the status quo.
> > >>
> > >>Your point that neither a resource specified by a URI, nor the bytes
> > >>returned when resolving the URI, has a unique type, is well-taken.  This
> > >>means for example that a server is perfectly justified to return an HTTP
> > >>header indicating a type of, say, text/plain, when the bytes are in fact
> > >>also a valid W3C grammar.  In such a case it's perfectly reasonable for
> > >>the consumer of the resource to interpret those bytes (in a kind of
> > >>casting operation) as some other type than the type indicated by the
> > >>server, when the consumer of the resource has some "out-of-band" source
> > >>of knowledge about the resource.  The "type" attribute provides a way
> > >>for an application developer to provide this "out-of-band" knowledge to
> > >>the consumer (voice browser in this case).
> > >>
> > >>This is useful in the common case where the VoiceXML application
> > >>developer is also in control of the bytes that the URI resolves to, but
> > >>may not be in control of the type information returned by the server.
> > >>This is especially important for new types, where experience shows web
> > >>servers are frequently not configured to return the most useful or most
> > >>specific type information for resources that conform to the new type.
> > >>
> >
> >AG::
> >
> >In the case of a conflict between the type that the application expects 
> >from reading the referring grammar and the type that the transport asserts 
> >for the entity transported, the processor does not have to go with just 
> >one or the other.  Either one could be wrong.
> >
> >If one type reflects a potential success and the other type reflects a 
> >sure failure, then the processor could take the optimistic route, 
> >interpret the recovered resource representation in accordance with that 
> >type and if the recognition process (parse, etc.) succeeds, go with it.
> >
> >
> >For SRGS we have the added complexity that applicable grammars come in two 
> >equivalent forms which are expected to have different MIME types.  Making 
> >the type indication in the reference override the actual type of the 
> >grammar sent will force errors when the type indication in the reference 
> >is ABNF and the data returned are in XML, for example.  Should this be an 
> >error?
> >
> >Would it be possible for the processor to make the determination as to 
> >whether XML or ABNF grammars are acceptable at this point and not the 
> >referring grammar document?
> >
> >Al
> >
> > >>There is also precedent in recent W3C recommendations for such a "type"
> > >>attribute:  from the SMIL 2.0 Recommendation, Chapter 7
> > >>(http://www.w3.org/TR/smil20/extended-media-object.html):
> > >>
> > >>The
> > >><http://www.w3.org/TR/smil20/extended-media-object.html#adef-media-type>
> > >>type attribute value takes precedence over other possible sources of the
> > >>media type (for instance, the "Content-type" field in an HTTP exchange,
> > >>or the file extension).
> > >>
> > >>
> > >>Please let us know if you have further comments on these issues.
> > >>
> > >>Regards,
> > >>   Andrew Hunt
> > >>   Co-editor SRGS
> > >>   SpeechWorks International
> > >
> 

Received on Thursday, 9 May 2002 17:57:49 UTC