Re: builtin grammars from Olli Pettay on 2011-10-19 (public-xg-htmlspeech@w3.org from October 2011)

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Wed, 19 Oct 2011 12:18:36 +0300
To: Michael Bodell <mbodell@microsoft.com>
CC: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <4E9E95EC.5090507@helsinki.fi>
On 10/19/2011 04:17 AM, Michael Bodell wrote:
> We’ve talked in a few different calls about how to allow/extend builtin
> grammars. Here’s my proposal for an extensible mechanism that should
> support all the cases we care about. Note while this works very well for
> the <reco> being linked to corresponding recoable elements, it isn’t a
> requirement and works as a scheme in the general case as well.
>
> The basic pattern is:
>
> builtin:<tag name|command>[?<attributes>[&<attributes>]*]?
>
> Where the first ? is a literal character and the last is the regexp
> quantifier for 0 or 1. The attributes are extensible but include standard
> HTML5 attributes as well as the flag we discussed a few weeks back that
> Glen suggested about filter=noOffensiveWords.
>
> For the tag name this would look like:
>
> builtin:input?type=text
>
> builtin:input?type=search&filter=noOffensiveWords
>
> // note the pattern is [0-9][A-Z]{3} but with the various [, ], {, } all
> url escaped
>
> builtin:input?type=text&pattern=%5B0-9%5D%5BA-Z%5D%7B3%7D

So, is this really something speech engines can support?

And how do they handle patterns?
If the pattern is "Hello", is that interpreted as the word 'Hello', or as
the separate characters 'H', 'e', 'l', 'l', 'o'?
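
To make the question concrete, here is a minimal sketch in Python of how
such a URI might be taken apart before anything reaches a recognizer. It
assumes the query part follows ordinary form-style URL encoding (so %5B
decodes to '[' and '+' decodes to a space); the proposal does not say this
explicitly, and parse_builtin is a hypothetical helper name, not part of
any proposed API.

```python
from urllib.parse import parse_qsl, quote


def parse_builtin(uri):
    """Split a builtin: URI into (name, attributes).

    Hypothetical helper: assumes form-style URL encoding in the query.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme != "builtin":
        raise ValueError("not a builtin: URI: %r" % uri)
    # Everything before the first '?' is the tag name or command.
    name, _, query = rest.partition("?")
    if not name:
        raise ValueError("missing tag name or command: %r" % uri)
    # parse_qsl percent-decodes values and turns '+' into a space.
    return name, dict(parse_qsl(query, keep_blank_values=True))


# Escaping the pattern from the proposal, with no characters left unescaped
# apart from the ones quote() never escapes (letters, digits, '_.-~').
escaped = quote("[0-9][A-Z]{3}", safe="")

name, attrs = parse_builtin("builtin:input?type=text&pattern=" + escaped)
print(name, attrs)
```

Decoding only recovers the pattern string. Whether a recognizer treats it
as a word or as a sequence of spoken characters is still the open question.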


>
> builtin:input?type=date&max=1979-12-31
>
> builtin:input?type=number&min=1&value=1
>
> builtin:input?type=range&min=0&max=1&step=0.00392156863

Again, how would speech engines support this? What kind of input is
expected?
Would a user saying 0.2 be a no-match, while 0.00784313726 would be OK?
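
A small sketch of why I ask, in Python. It assumes the service would apply
the same rule HTML5 uses for the step attribute (the value minus min must
be a whole multiple of step); nothing in the proposal says so, and on_step
is a hypothetical name. Decimal is used so that binary floating point
rounding does not blur the result.

```python
from decimal import Decimal


def on_step(value, minimum="0", step="0.00392156863"):
    """Return True if value lies exactly on a step boundary.

    Assumption: (value - minimum) must be a whole multiple of step,
    as in HTML5 step validation.
    """
    offset = Decimal(value) - Decimal(minimum)
    return offset % Decimal(step) == 0


for spoken in ("0.2", "0.00784313726"):
    print(spoken, on_step(spoken))
```

Under that strict reading, 0.00784313726 is exactly two steps and passes,
while 0.2 falls between the 50th and 51st step and fails, because the step
in the example is a rounded 1/255.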


>
> builtin:textarea
>
> builtin:textarea?placeholder=text+message&filter=noOffensiveWords
What does placeholder=text+message mean?

>
> builtin:button?type=submit&value=press+me
What does this mean?


Why do we need input/textarea/button? Why not just builtin:type=number, etc.?




>
> We also had earlier discussed two different types of recommended default
> builtins that wouldn’t be tied to a tag (the command in my regexp at the
> top):
>
> builtin:dictation
>
> builtin:websearch
>
> both of which could have additional parameter values like
>
> builtin:websearch?filter=noOffensiveWords
>
> With respect to who implements the builtin grammars (i.e., is this part
> of the WebAPI and the user agent is responsible for mapping from
> builtin URL to some SRGS or proprietary format to submit to the speech
> service, or should these named builtin types be available at the
> service), I think there are a lot of advantages for saying the service is
> responsible.

The Web-facing part should be standard. If we say that
builtin:input?type=text&pattern=%5B0-9%5D%5BA-Z%5D%7B3%7D should be
supported, then it should be supported the same way in all the speech
services. I mean,


> Obviously for default cases where the UA and service might
> be one and the same this doesn’t matter, but for cases where it isn’t
> there are these advantages:
>
> 1. These URIs can be used from the middle of other SRGS grammars and
> expected to work. This means you can use number, email, dictation and
> what not as part of a custom grammar that may have your top few commands
> on top of or overlaid on the builtin types.
>
> 2. For the proprietary non-rule based grammars (necessary for things like
> dictation and websearch) you don’t need to have the UA own that, they
> can be part of the evolving recognition service. Also, you don’t need to
> worry about the UA needing to transfer a giant statistical language
> model grammar representing the whole internet web search whenever you
> want to do a Bing or Google voice web search.
>
> 3. The speech service likely has the right expertise to train and tune
> the grammars in all the different languages, as requiring the UA to know
> how to build an accurate builtin grammar even for something relatively
> simple like number or date in all languages is unrealistic.
>
Received on Wednesday, 19 October 2011 09:19:19 UTC