- From: Deborah Dahl <dahl@conversational-technologies.com>
- Date: Thu, 15 Sep 2011 10:53:23 -0400
- To: "'Satish S'" <satish@google.com>, <olli@pettay.fi>
- Cc: "'Michael Bodell'" <mbodell@microsoft.com>, <public-xg-htmlspeech@w3.org>
- Message-ID: <079001cc73b7$389881f0$a9c985d0$@conversational-technologies.com>
I don't know if this proposal is simpler, but there are aspects of it that might be convenient and help in debugging.

1. Setting up the SpeechInputRequest seems to involve about the same amount of complexity in both proposals. Let's say that the developer has a set of five grammars that they want to use in the application. In the current proposal, the developer would call the "addGrammar" method five times to add them to the SpeechInputRequest. In Satish's proposal, the developer would create five Grammar objects and add them all at once when the SpeechInputRequest is created, which seems like roughly the same amount of complexity. In Satish's proposal there is an additional Grammar object that developers have to learn about and manage, but it's not a very complex object.

2. However, later on, if the developer wants to add or remove one or more grammars, in the current proposal you would call addGrammar() or disableGrammar(), possibly multiple times if the change involved multiple grammars. In Satish's proposal, if I understand it correctly, you would just set the grammars attribute of the SpeechInputRequest to the desired set of grammars. I think this would be easier to debug, because the full set of active grammars would be explicitly set, as opposed to having to trace back through the sequence of "addGrammar" and "disableGrammar" calls to figure out whether something was active that shouldn't have been, and vice versa. On the other hand, it might be annoying to have to respecify the full list of N grammars every time you just want to change one of them.

3. If the developer has a single grammar that they want to use in different SpeechInputRequests, it would be convenient to be able to reuse the same Grammar object several times for different SpeechInputRequests.

4. If the developer wants a modal grammar, I think they would just make it the only grammar in the sequence of active grammars.

5. One question I have, though, is what happens if the actual grammar at the src URI changes after the Grammar object is created. If the developer wants to ensure that they're using a completely up-to-date grammar, do they always have to remember to create a new Grammar object right before they add it to the SpeechInputRequest?

From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Satish S
Sent: Tuesday, September 13, 2011 6:58 AM
To: olli@pettay.fi
Cc: Michael Bodell; public-xg-htmlspeech@w3.org
Subject: Re: [agenda] 8 September 2011

I also notice we started discussing Section 7 and covered some parts, like grammars. Currently we have 4 methods for manipulating the list of grammars, which seems complicated. A simpler way would be to define a Grammar interface and have a sequence of them as an attribute of SpeechInputRequest:

    [Constructor]
    interface SpeechInputGrammar {
        attribute DOMString src;
        attribute float weight;
        attribute boolean modal;
        (Does this make sense or can the webapp just remove the others in the grammar list below?)
    }

    interface SpeechInputRequest {
        ...
        attribute sequence<SpeechInputGrammar> grammars;
        ...
    }

Cheers
Satish

On Tue, Sep 13, 2011 at 11:48 AM, Satish S <satish@google.com> wrote:

Sorry I couldn't attend the call last week as I was on leave. I see in the minutes that Olli's first point, about automatic binding to various types of HTML elements, was discussed briefly, but it doesn't look like we reached a satisfactory conclusion. I am wondering if we really need automatic binding to existing elements, or whether <reco> can just be a standalone new UI element. The main reason I support a <reco> element is for user-initiated recognition (without automatically throwing up security prompts on page load).
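To make the comparison in points 1, 2, and 4 concrete, here is a minimal sketch of the two styles. This is purely illustrative: SpeechInputRequest, addGrammar, disableGrammar, and SpeechInputGrammar are draft names from this thread, not a shipping API, so stub classes stand in for them.

```javascript
// Stubs for the draft interfaces discussed in this thread (assumptions, not a spec).
class SpeechInputGrammar {
  constructor() { this.src = ""; this.weight = 1.0; this.modal = false; }
}

class SpeechInputRequest {
  constructor() { this.grammars = []; }
  // "Current proposal" style: per-grammar mutation methods.
  addGrammar(src) { this.grammars.push(Object.assign(new SpeechInputGrammar(), { src })); }
  disableGrammar(src) { this.grammars = this.grammars.filter(g => g.src !== src); }
}

// Current proposal: five grammars means five addGrammar() calls, and the active
// set must be traced back through the sequence of mutations.
const reqA = new SpeechInputRequest();
["a.grxml", "b.grxml", "c.grxml", "d.grxml", "e.grxml"].forEach(u => reqA.addGrammar(u));
reqA.disableGrammar("c.grxml"); // later changes are also per-grammar calls

// Satish's proposal: assign the full active set at once, so the active grammars
// are always exactly what was last assigned (easier to debug).
const reqB = new SpeechInputRequest();
reqB.grammars = ["a.grxml", "b.grxml"].map(src =>
  Object.assign(new SpeechInputGrammar(), { src }));

// A modal grammar (point 4) would simply be the only one in the sequence.
reqB.grammars = [Object.assign(new SpeechInputGrammar(), { src: "modal.grxml", modal: true })];
```

The trade-off point 2 raises is visible here: reqB's state is always the last assignment, while reqA's state depends on the full call history.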
This doesn't require automatic binding; if the <reco> element were just aimed at getting user consent, starting recognition, and returning results to the JS event handler, that would support the use case of user-initiated recognition.

Cheers
Satish

On Thu, Sep 8, 2011 at 11:03 AM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:

A few comments.

"Some elements are categorized as recoable elements. These are elements that can be associated with a reco element: button, input (if the type attribute is not in the Hidden state), keygen, meter, output, progress, select, textarea"

This is not enough, and not precise enough. How should we handle contentEditable? I'm also pretty sure we don't want to set the *value* of <input type="checkbox"> but the state, etc. Also, why not set the value of <input type="hidden">? (These are the kinds of problems which make the API inconsistent, and why I wouldn't have the automatic value binding to HTML elements.)

"The reco element's exact default presentation and behavior, in particular what its activation behavior"

We still need to figure out some permission API. The user must give permission in some way, and the web app probably needs to know about the user's decision, so that if the user decides never to give permission, the web app can hide the UI related to speech handling.

"might be and what implicit grammars might be defined, if any, is unspecified and user agent specific. The activation behavior of a reco element for events targeted at interactive content descendants of a reco element, and any descendants of those interactive content descendants, MUST be to do nothing. When a reco element with a reco control is activated and gets a reco result, the default action of the recognition event SHOULD be to set the value of the reco control to the top n-best interpretation of the recognition (in the case of single recognition) or an appended latest top n-best interpretation (in the case of dictation mode with multiple inputs)."

I don't understand the "SHOULD" part.
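A sketch of the standalone <reco> flow Satish describes may help: the element only gathers user consent, starts recognition, and hands results to a JS handler, with no automatic value binding. The element, the "recognition" event, and the result shape are all assumptions from this thread, not a spec, so a tiny stub stands in for the element here so the flow can be exercised outside a browser.

```javascript
// Hypothetical stand-in for a <reco> element (assumed API, not a spec).
class FakeRecoElement {
  constructor() { this.handlers = []; }
  addEventListener(type, fn) { if (type === "recognition") this.handlers.push(fn); }
  // A UA would fire this after consent + recognition; here we drive it by hand.
  deliverResult(nbest) { this.handlers.forEach(fn => fn({ results: nbest })); }
}

const reco = new FakeRecoElement();
let fieldValue = "";

reco.addEventListener("recognition", (event) => {
  // No automatic binding: the page decides what to do with the result,
  // e.g. copy the top n-best interpretation into a form field, or map it
  // onto a checkbox *state* rather than its value (Olli's point above).
  fieldValue = event.results[0].interpretation;
});

// Simulate the UA delivering an n-best result list after user consent.
reco.deliverResult([{ interpretation: "turn on the lights", confidence: 0.9 }]);
```

Under this model the inconsistencies Olli lists (contentEditable, checkbox state vs. value, hidden inputs) disappear, because the page script, not the UA, decides how each result maps onto the DOM.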
If we want to support automatic value binding, a UA implementing the API must set the value of the reco control, if there is one.

On 09/08/2011 11:48 AM, Michael Bodell wrote:

A number of folks may be out, but it will be good to get through the rest of the API document on the call. I've attached a new version of the file that incorporates most of the information from the last Web API call. Last time we only just started on Section 6. We should start there, finish the document, and then continue the discussion on result formats. As a reminder, the plan is that we get more concrete and knock out holes in the Web API in this document, then rationalize it with the protocol work, and then fold both into the final group report. We are mostly still on that first step. Here is my summary of what has changed from the last document, based on the minutes of the last meeting:

Done:
- changes to section 3: remove bind and unbind
- changes to section 3: add a method to createSpeech[Input|Output]Request on the SpeechRequest interface
- changes to section 3: change the type enum to be a bitfield so TTS is 1, ASR is 2, and you don't need TTSASR if you have TTS | ASR
- issue to section 3: remove the state
- possible issue to section 3: we can have multiple of these; should state that they go away with garbage collection. The only issue is how long the service stays open after a query: do we need some explicit close/reattach (bind/unbind), or do we just not care...
- changes to section 4: need Query to be more specific if this is on Window
- changes to section 4: merge filter and options (the criteria are in the options, probably as a flat list)
- issue to section 4: query needs to be async
- changes to section 4: add successCallBack and failureCallBack to the specific speechQuery function
- changes to section 5: need a better definition of "recoable elements", probably listing all such elements
- issue on section 5: need a way to get at the SpeechRequest associated with the reco element
- issue on section 5: have a SpeechInputRequest attribute of reco that is the tied request... this could have the default UA service or a service based on a URI attribute. From scripting, if you get a new SIR, you can set the attribute to associate the new SIR with this reco element
- issue on section 6: same kind of idea as section 5, with a SpeechOutputRequest instead of a SpeechInputRequest
- issue for section 6 (and generally): link to the definitions in HTML5 (for HTMLMediaElement, but also for "potentially playing", etc.)

Notes about:
- issue on section 3 or 4: possibly need a way to check if you have permission to do recognition; a method on Service (or on Query?)
- issue to section 4: need a function to return the service based on the URI for the service
- changes to section 4: filter on required languages and required service type and possibly other things...
- changes to section 4: possible filter on audio codecs as well
- issue to section 4: need a way to do authorization, possibly as name/password options in the query options, possibly as authorization for an authorization header, possibly just as proprietary stuff on the URI

Not done/no changes:
- issue to section 4: maybe have a constructor and set things?
- changes to section 5: possibly remove form attribute or control...
possibly not, since this really does tie it to the element properly
- section 5: htmlFor is definitely fine; later discussion sounds like control is fine too
- issue to section 5: need some sentence about the security model, but that probably ties to the request object and not the reco element
- possible issue for section 6: think about the CSS audio last call and whether it affects section 6

________________________________________
From: public-xg-htmlspeech-request@w3.org [public-xg-htmlspeech-request@w3.org] on behalf of Young, Milan [Milan.Young@nuance.com]
Sent: Wednesday, September 07, 2011 4:03 PM
To: Dan Burnett; public-xg-htmlspeech@w3.org
Subject: RE: [agenda] 8 September 2011

I also need to send my regrets for this week.

-----Original Message-----
From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Dan Burnett
Sent: Wednesday, September 07, 2011 11:45 AM
To: public-xg-htmlspeech@w3.org
Subject: [agenda] 8 September 2011

We will have a teleconference this week as planned.

Agenda:
1. Web API discussion

Since neither Bjorn nor I expects to be able to attend this week's call, we will not be going through any more issue topics in the draft report this week. I propose that the following meetings be focused on the listed topics:

15 September: Web API discussion for 60 minutes, then Protocol discussion or outstanding topics, whichever is most needed.
22 September: Protocol wrap-up for 30 minutes, then Web API discussion
29 September: Web API wrap-up

-- dan

==============
== Telecon info ==
==============
Date: Thursday, 8 September 2011
Time: Noon (New York), 1800 (Central Europe), 0100 (Tokyo)
Duration: 90 minutes
US telephone number: +1.617.761.6200
France telephone number: +33.4.26.46.79.03
UK telephone number: +44.203.318.0479
Conference code: 48657# (HTMLS#)
Info on using Zakim: http://www.w3.org/2002/01/UsingZakim
IRC channel: #htmlspeech

===================
= Recent minute-takers =
===================
1 September: Glen Shires
4 August: Robert Brown
28 July: Milan Young
7 July: Dan Burnett
30 June: Debbie Dahl
16 June: Patrick Ehlen
9 June: Raj Tumuluri
2 June: Michael Johnston
19 May: Michael Bodell
12 May: Dan Druta
5 May: Charles Hemphill
28 April: Robert Brown
21 April: Olli Pettay
14 April: Milan Young
7 April: Debbie Dahl
17 March: Dan Burnett
17 February: Bjorn Bringert
16 December: Robert Brown
9 December: Dan Druta
2 December: Raj Tumuluri
18 November: Milan Young, Dan Burnett
11 November: Debbie Dahl
Received on Thursday, 15 September 2011 14:54:26 UTC