Re: [speechXG] parameter setting for the recognition object from Bjorn Bringert on 2011-08-02 (public-xg-htmlspeech@w3.org from August 2011)

From: Bjorn Bringert <bringert@google.com>
Date: Mon, 1 Aug 2011 17:30:47 -0700
To: "Young, Milan" <Milan.Young@nuance.com>
Cc: Deborah Dahl <dahl@conversational-technologies.com>, HTML Speech XG <public-xg-htmlspeech@w3.org>
Message-ID: <CAJtyJaViUtcJRTbM99PjjEi4bKzEs1FKJvfGHt=4LKMvqCn_dw@mail.gmail.com>
These things seem more like events than parameters to me, so a
different name might be in order. Also, do web apps need to be able to
get their value?

/Bjorn

On Mon, Aug 1, 2011 at 4:39 PM, Young, Milan <Milan.Young@nuance.com> wrote:
> I provided you with two examples earlier (LOG and STATE), but perhaps the biggest use case is forwarding non-speech input events.  Consider a multi-modal dialog where the user is booking a flight.  They speak the origin and destination airports, but decide to click on the date range because they are not sure where the weekends line up on the calendar.  The speech service is also listening for date input and needs to be informed that this information has already been collected.
>
> I propose we split API parameter settings into two groups:
>
>  * Request parameters.  This class of parameter is set/get using standard ECMAScript syntax on the recognition and synthesis handles.  The state of all such parameters is applied to a request at the time the request is issued (regardless of how it lines up within a browser loop).  But new parameter settings will never affect previous or ongoing requests.
>
>  * Session parameters.  These parameters are accessed through a pair of functions which hang off the recognition and synthesis handles (suggest setSessionParam() and getSessionParam()).  These functions have immediate effect on the session in which the recognition or synthesis is taking place.
>
>
> Note that the above approach maps directly to the current proposal in the protocol group.  The request parameters are passed as headers on the primary request methods (RECOGNIZE, SPEAK, DEFINE-GRAMMAR, etc.), with a special pair of SET-PARAMETER and GET-PARAMETER methods for passing/receiving "real-time" events.
>
>
> Thanks
>
>
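A minimal sketch of the request/session split proposed above, in TypeScript. Only setSessionParam() and getSessionParam() are names suggested in the message; RecognizerSketch, the language and maxNBest properties, and the sent array that stands in for the protocol connection are hypothetical and chosen purely for illustration. This is not an agreed API.

```typescript
// Hypothetical names throughout; this is an illustration, not an agreed API.
type ParamValue = string | number | boolean;

interface SentMessage {
  method: string;                        // e.g. "RECOGNIZE" or "SET-PARAMETER"
  headers: Record<string, ParamValue>;
}

class RecognizerSketch {
  // Request parameters: plain properties, read only when a request is issued.
  language = "en-US";
  maxNBest = 1;

  // Stand-in for the wire: records what would be sent to the speech service.
  readonly sent: SentMessage[] = [];

  // Local copy of session parameters (a real client might ask the service instead).
  private session: Record<string, ParamValue> = {};

  start(): void {
    // Snapshot the request parameters so later changes cannot affect this request.
    this.sent.push({
      method: "RECOGNIZE",
      headers: { language: this.language, maxNBest: this.maxNBest },
    });
  }

  // Session parameters: take effect immediately, even while recognition is running.
  setSessionParam(name: string, value: ParamValue): void {
    this.session[name] = value;
    this.sent.push({ method: "SET-PARAMETER", headers: { [name]: value } });
  }

  getSessionParam(name: string): ParamValue | undefined {
    return this.session[name];
  }
}

const rec = new RecognizerSketch();
rec.language = "sv-SE";                  // set before start(): part of the request
rec.start();
rec.maxNBest = 5;                        // set after start(): next request only
rec.setSessionParam("STATE", "date-collected");  // sent right away
```

In this sketch the ongoing request keeps maxNBest at 1, while the STATE message is recorded immediately after the RECOGNIZE message.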
> -----Original Message-----
> From: Bjorn Bringert [mailto:bringert@google.com]
> Sent: Monday, August 01, 2011 1:14 PM
> To: Young, Milan
> Cc: Deborah Dahl; HTML Speech XG
> Subject: Re: [speechXG] parameter setting for the recognition object
>
> Could you provide a list of the events that you think apps need to be able to send?
>
> On Mon, Aug 1, 2011 at 6:47 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>> Are such functions for sending these events part of the existing API?
>>
>>
>> -----Original Message-----
>> From: Bjorn Bringert [mailto:bringert@google.com]
>> Sent: Friday, July 29, 2011 3:33 PM
>> To: Young, Milan
>> Cc: Deborah Dahl; HTML Speech XG
>> Subject: Re: [speechXG] parameter setting for the recognition object
>>
>> I agree that parameters should not be used in the Web app API when you want the semantics of sending a message to the speech recognizer.
>> There should be functions for that.
>>
>> On Fri, Jul 29, 2011 at 10:17 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>>> Another limitation that might come up in this proposal is sending the same event name with different values within the same browser loop.  LOG and STATE are examples of such events that would require that flexibility.
>>>
>>> If we are going to rely on this browser loop model for parameter settings, then perhaps we need to separate parameters from events.  Events are similar in that they have a name and value, but they should be sent immediately when issued.  If that is not possible, then all commands over the recognition and synthesis handles should be buffered and executed in order when the loop terminates.
>>>
>>> Thoughts?
>>>
>>>
>>> -----Original Message-----
>>> From: Bjorn Bringert [mailto:bringert@google.com]
>>> Sent: Friday, July 29, 2011 12:38 PM
>>> To: Young, Milan
>>> Cc: Deborah Dahl; HTML Speech XG
>>> Subject: Re: [speechXG] parameter setting for the recognition object
>>>
>>> I think that for this use case, you should use two separate recognition requests.
>>>
>>> On Fri, Jul 29, 2011 at 9:31 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>>>> Just think of it as an event that needs to be fired before the recognition request is terminated.
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Bjorn Bringert [mailto:bringert@google.com]
>>>> Sent: Friday, July 29, 2011 11:10 AM
>>>> To: Young, Milan
>>>> Cc: Deborah Dahl; HTML Speech XG
>>>> Subject: Re: [speechXG] parameter setting for the recognition object
>>>>
>>>> I don't quite understand what "save the dialog context" means here.
>>>>
>>>> On Fri, Jul 29, 2011 at 8:03 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>>>>> Let's consider an example where the user has been dictating an email.  While the transaction is in progress, they get some other notification in the same webpage that grabs their attention.  They click a button to save their existing state and focus on the new dialog.  The handler for that button would like to do the following:
>>>>>
>>>>>  1) Set a parameter on the existing recognition to save the dialog context.
>>>>>  2) Stop the ongoing recognition and fetch the last result.
>>>>>  3) Start a new recognition with a new set of grammars and parameters.
>>>>>
>>>>> Ideally the parameter set in #1 is sent immediately.  But it wouldn't be all that bad to buffer it until #2 is executed.  But it certainly needs to take place before #3 starts the fresh dialog.
>>>>>
>>>>> Is this consistent with the browser loop model you are discussing below?
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Bjorn Bringert [mailto:bringert@google.com]
>>>>> Sent: Thursday, July 28, 2011 4:45 PM
>>>>> To: Young, Milan
>>>>> Cc: Deborah Dahl; HTML Speech XG
>>>>> Subject: Re: [speechXG] parameter setting for the recognition object
>>>>>
>>>>> It's not really exiting a thread. It's more like the browser has a single thread with a top-level loop that waits for events and calls JavaScript code to handle them. The execution model is pretty different from normal apps, and there isn't really any multithreading etc.
>>>>>
>>>>> In my mind, we are developing an API for use in web pages, so it should be designed to work in a browser. As such, we don't have to worry about things like multithreading, and can assume that we are running in an event loop.
>>>>>
>>>>> Browser implementors: please correct any misconceptions that I have.
>>>>>
>>>>> /Bjorn
>>>>>
>>>>> On Thu, Jul 28, 2011 at 12:37 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>>>>>> From your message below, it sounds like you are assuming that
>>>>>> access to the API will be contained within callback functions like onClick.
>>>>>> Changes to the underlying speech objects (and thus all protocol
>>>>>> communication) would block until the thread exits.  Correct?
>>>>>>
>>>>>> Not coming from a browser development background, I still think of
>>>>>> applications in terms of a monolithic structure that has its own
>>>>>> threads etc.  I don't know if anyone actually programs in that
>>>>>> way, but it makes me nervous that our API might not be useable in that paradigm.
>>>>>>
>>>>>> Is this a valid concern?
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Bjorn Bringert [mailto:bringert@google.com]
>>>>>> Sent: Thursday, July 28, 2011 12:11 PM
>>>>>> To: Young, Milan
>>>>>> Cc: Deborah Dahl; HTML Speech XG
>>>>>> Subject: Re: [speechXG] parameter setting for the recognition object
>>>>>>
>>>>>> I believe that the browser event loop would take care of this. The
>>>>>> browser should defer telling the speech recognizer about the
>>>>>> changed parameters until it finishes processing the current
>>>>>> JavaScript invocation. For example, I think that this is how
>>>>>> changing DOM attributes of visible elements is done without
>>>>>> intermediate states being rendered.
>>>>>>
>>>>>> /Bjorn
>>>>>>
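A minimal sketch of the deferral described above, in TypeScript: parameter writes made during one JavaScript invocation are collected and handed to the recognizer as a single batch once the handler returns. DeferredParams, the batches array, and the injectable scheduler are all hypothetical; a real browser would do this internally, and the setTimeout default is only an assumed approximation of "after the current invocation".

```typescript
// Hypothetical sketch; a real browser would do this batching internally.
type Task = () => void;
type Scheduler = (task: Task) => void;

// Assumed default: run the flush once the current script has returned to the event loop.
const afterCurrentScript: Scheduler = (task) => { setTimeout(task, 0); };

class DeferredParams {
  private pending: Record<string, unknown> = {};
  private flushScheduled = false;

  // Stand-in for the recognizer: each entry is one batch of changes it would receive.
  readonly batches: Array<Record<string, unknown>> = [];

  constructor(private readonly schedule: Scheduler = afterCurrentScript) {}

  set(name: string, value: unknown): void {
    this.pending[name] = value;          // a later write to the same name wins
    if (!this.flushScheduled) {
      this.flushScheduled = true;
      this.schedule(() => this.flush());
    }
  }

  private flush(): void {
    this.flushScheduled = false;
    this.batches.push(this.pending);
    this.pending = {};
  }
}

// A manual queue plays the role of the browser loop so the example is deterministic.
const loopQueue: Task[] = [];
const params = new DeferredParams((task) => { loopQueue.push(task); });

params.set("grammar", "dates");
params.set("maxnbest", 3);
params.set("grammar", "airports");
const seenBeforeLoopTurn = params.batches.length;  // nothing delivered yet

loopQueue.forEach((task) => task());               // the handler has returned
```

Under this model the recognizer never sees the intermediate "dates" value, which is fine for parameters but loses information when the same name is meant to be sent more than once within one invocation.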
>>>>>> On Thu, Jul 28, 2011 at 11:57 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>>>>>>> Bjorn, are you arguing that we don't need to solve this problem,
>>>>>>> or that something about the JavaScript flow control implicitly
>>>>>>> handles the issue?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: public-xg-htmlspeech-request@w3.org
>>>>>>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn Bringert
>>>>>>> Sent: Thursday, July 28, 2011 11:23 AM
>>>>>>> To: Deborah Dahl
>>>>>>> Cc: HTML Speech XG
>>>>>>> Subject: Re: [speechXG] parameter setting for the recognition object
>>>>>>>
>>>>>>> My suggestion towards the end was to not have any special API
>>>>>>> support for atomically setting multiple parameters. That is,
>>>>>>> neither a map nor updateParameters(). This would match how
>>>>>>> setting parameters on DOM elements works.
>>>>>>>
>>>>>>> /Bjorn
>>>>>>>
>>>>>>> On Thu, Jul 28, 2011 at 11:17 AM, Deborah Dahl
>>>>>>> <dahl@conversational-technologies.com> wrote:
>>>>>>>> On today's call we talked about the general process of setting
>>>>>>>> parameters for recognition. In my proposal
>>>>>>>> (http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Jun/0086.html)
>>>>>>>> I had suggested that we might want to pick a couple of very
>>>>>>>> frequently set parameters (e.g. grammar and language) and allow
>>>>>>>> them to be set directly as parameters of the "start recognition"
>>>>>>>> method, as a type of convenience syntax. We agreed during the
>>>>>>>> call that this was not necessary, and agreed that all the
>>>>>>>> parameters should be explicitly set on the recognition object,
>>>>>>>> e.g. something like "recognizer.endpointdetection(true)".
>>>>>>>> However, this raised the question of what happens when
>>>>>>>> parameters are set while a recognition is in progress. Bjorn's
>>>>>>>> suggestion was to have an "updateParameters" method that is
>>>>>>>> invoked after the parameter setting function is called to
>>>>>>>> actually cause the parameters to take effect on the recognition
>>>>>>>> object. Another option is to distinguish parameters that take
>>>>>>>> effect immediately, like changing the grammar, from parameters
>>>>>>>> that take effect only when the next recognition occurs (like
>>>>>>>> maxnbest).
>>>>>>>> We also discussed setting multiple parameters and whether there
>>>>>>>> should be a way to set several parameters in one call, as in
>>>>>>>> this example that Olli typed into irc:
>>>>>>>> setParameters({ param1: value, param2: value2}). This might be
>>>>>>>> convenient, but Bjorn pointed out that it isn't done in any HTML
>>>>>>>> APIs.
>>>>>>>> I'm hoping to update my proposal and send it out again next
>>>>>>>> week, so any discussion on the list in the meantime would be
>>>>>>>> helpful. If anyone who was on the call today has anything to add
>>>>>>>> to this summary, that would be helpful, too.
>>>>>>>> Debbie
>>>>>>>> Debbie
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Bjorn Bringert
>>>>>>> Google UK Limited, Registered Office: Belgrave House, 76
>>>>>>> Buckingham Palace Road, London, SW1W 9TQ Registered in England
>>>>>>> Number: 3977902
>>>>>>>
>>>>>>>



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
Received on Tuesday, 2 August 2011 00:31:25 UTC