Re: R29. Web application may only listen in response to user action

I don't think that requiring user action necessarily rules out
hands-free usage. The user action itself could be spoken input. The
key is to not give the *web app* access to recognition results (or
audio) without user action. We should perhaps reword the requirement
to reflect that.
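A rough sketch of that idea, with entirely hypothetical names (nothing here is from any draft spec): the user agent recognizes speech continuously, but buffers results and only releases them to the web app once it has observed a qualifying user action, which could itself be a spoken confirmation the user agent recognized on its own.

```javascript
// Hypothetical sketch only: all names are invented for illustration.
// The user agent holds back recognition results until a user action
// (click, key press, or a spoken confirmation) grants access.

class GatedRecognizer {
  constructor() {
    this.userActionSeen = false;
    this.pending = [];       // results held back from the web app
    this.onresult = null;    // callback the web app registers
  }

  // Called by the user agent for any qualifying user action,
  // including a spoken confirmation it recognized itself.
  notifyUserAction() {
    this.userActionSeen = true;
    this.flush();
  }

  // Called by the user agent's recognizer with each new result.
  deliver(result) {
    this.pending.push(result);
    this.flush();
  }

  // Releases buffered results only after a user action has been seen.
  flush() {
    if (!this.userActionSeen || !this.onresult) return;
    while (this.pending.length > 0) {
      this.onresult(this.pending.shift());
    }
  }
}
```

The point of the sketch is that hands-free operation and the requirement are compatible: recognition can run the whole time, and only the *delivery* of results to the page is gated on user action.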

/Bjorn

On Fri, Oct 22, 2010 at 8:54 PM, Robert Brown
<Robert.Brown@microsoft.com> wrote:
> I don't have data, but I suspect that presenting a list of choices, some of which are sticky (like those Michael lists below), may also mitigate the tendency for users to blindly click through.  (That's been my personal experience with some of the popup-blocker UIs I've used, and there may be data for those.)
>
> An earcon may also be an appropriate indicator that the system is listening, especially on smart phones.
>
> -----Original Message-----
> From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Michael Bodell
> Sent: Friday, October 22, 2010 12:29 PM
> To: Deborah Dahl; 'Satish Sampath'
> Cc: 'Bjorn Bringert'; 'Dan Burnett'; public-xg-htmlspeech@w3.org
> Subject: RE: R29. Web application may only listen in response to user action
>
> I agree that this requirement is problematic for hands-free and other usage scenarios.  IMO explicit user action should not be required before each and every page load and/or speech recognition.  The privacy and security concerns that I think we all share are that speech recognition should not happen without user consent (in general) and should not happen without the user being aware that it is happening (in this particular instance).  Requirements along those lines are the "what" requirements that we must follow to support user privacy and security.  But those two requirements do *NOT* imply that the "web application may only listen in response to user action", which is a "how" requirement (i.e., how we protect the user).  IMO it may be the case that recognition occurs as a result of any of (not an exhaustive list):
>
> - page load (not covered by this requirement)
> - focus event driven by explicit user action (covered by this requirement)
> - focus event driven by natural page flow (not covered by this requirement)
> - scripting by the application author (not covered by this requirement)
>
> This requirement is too restrictive for these and other use cases.
>
> A different "how" might be that the user agent, for instance, prompts the user when a page wants to do speech for the first time and gives them a set of choices such as, for example:
>
> - Always allow any page to do speech without prompting
> - Always allow any page on this domain to do speech without prompting
> - Allow just this one page this session to do speech (and prompt in the future)
> - Don't allow this page this session to do speech (and prompt in the future)
> - Don't ever allow any page on this domain to do speech
> - Don't ever allow any page ever to do speech
>
> If the user chooses one of the "always allow" options, then in the future the web application would be able to listen to the user indefinitely without any user action.  Alternatively, the user agent could expose the equivalent of these prompts in its configuration settings, depending on the user's security/privacy settings.
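As a sketch of how sticky those choices could be, here is one way a user agent might persist them (all names invented for illustration; this is not from any spec): a check that consults the "any page", per-domain, and per-session choices in that order, falling back to prompting when no sticky choice applies.

```javascript
// Illustrative sketch only: a user-agent-side store for the six
// choices above. Scope "all" covers any page, "domain" covers one
// domain, "session" covers one page for this session only.

class SpeechPermissionStore {
  constructor() {
    this.global = null;          // true/false once an "any page" choice is made
    this.byDomain = new Map();   // domain -> true/false
    this.bySession = new Map();  // page URL -> true/false (this session only)
  }

  // Record one of the user's choices from the prompt.
  record(page, domain, scope, allow) {
    if (scope === "all") this.global = allow;
    else if (scope === "domain") this.byDomain.set(domain, allow);
    else this.bySession.set(page, allow);
  }

  // Returns "allow", "deny", or "prompt" (no sticky choice applies).
  check(page, domain) {
    if (this.global !== null) return this.global ? "allow" : "deny";
    if (this.byDomain.has(domain)) return this.byDomain.get(domain) ? "allow" : "deny";
    if (this.bySession.has(page)) return this.bySession.get(page) ? "allow" : "deny";
    return "prompt";
  }
}
```

The "prompt" result is what keeps the two non-sticky choices working: the user agent asks again next time rather than silently allowing or denying.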
>
> Maybe people feel that a user agent configuration setting is sufficient "user action" to satisfy this requirement, in which case all of the use cases above would meet it.  But that isn't how I interpreted the requirement, and if that is the intended reading we should reword it, because as written it implies there must be explicit user action each time recognition occurs.
>
> As for making the user aware that recognition is happening, a different "how" approach is more reasonable IMO: the user agent's chrome can provide clues.  There is already a well-established pattern of user agents providing visual cues when certain things occur, for example:
>
> - a spinning browser icon when content is being loaded in the background
> - some sort of secure lock image when the page is loaded over a secure channel
>
> So a similar idea could apply when the user is being recorded or when speech recognition is going on: a microphone icon, a red light, or some other cue *in the user agent's chrome* that doesn't interfere with the visual display of the web application.  You wouldn't want this indication to appear in the web application's own display (e.g., a microphone icon in the input field), both because different web applications may want different user interface options and because anything in the web application's display could be spoofed by the application, so it isn't as trustworthy as icons, images, or colored/highlighted text in the user agent's chrome.
>
> -----Original Message-----
> From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Deborah Dahl
> Sent: Friday, October 22, 2010 8:08 AM
> To: 'Satish Sampath'
> Cc: 'Bjorn Bringert'; 'Dan Burnett'; public-xg-htmlspeech@w3.org
> Subject: RE: R29. Web application may only listen in response to user action
>
> Yes, you could do that, but then the application wouldn't be hands-free.
> Now probably isn't the time to start talking about approaches that would enable us to address both requirements; I'm just pointing out that we should be aware of a potential conflict. I think we should actually classify both requirements as "should address", but note that there's an issue in our requirements document.
>
>> -----Original Message-----
>> From: Satish Sampath [mailto:satish@google.com]
>> Sent: Friday, October 22, 2010 9:43 AM
>> To: Deborah Dahl
>> Cc: Bjorn Bringert; Dan Burnett; public-xg-htmlspeech@w3.org
>> Subject: Re: R29. Web application may only listen in response to user action
>>
>> One possibility for R24 is that the end user performs an action on
>> page load, and from then on they can interact with the application
>> in a hands-free mode using continuous speech input. This could be a
>> click on a button or some other accessibility-friendly gesture.
>>
>> Cheers
>> Satish
>>
>>
>>
>> On Fri, Oct 22, 2010 at 2:39 PM, Deborah Dahl <dahl@conversational-technologies.com> wrote:
>>
>>
>>       I see a possible conflict between requiring user action to enable
>>       speech recognition and R24, "End user should be able to use speech
>>       in a hands-free mode", if "user action" means doing something that
>>       requires use of the hands. I think both requirements are important,
>>       but satisfying them both might require some thought.
>>
>>       From: public-xg-htmlspeech-request@w3.org
>>       [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Satish Sampath
>>       Sent: Friday, October 22, 2010 7:24 AM
>>       To: Bjorn Bringert
>>       Cc: Dan Burnett; public-xg-htmlspeech@w3.org
>>       Subject: Re: R29. Web application may only listen in response to user action
>>
>>
>>       User experience studies have also shown that end users have got
>>       used to clicking away any popup dialogs that come up while they
>>       are browsing the web; common ones include phishing/malware
>>       warnings, download notifications, etc. This is one of the reasons
>>       why browser vendors are moving towards in-page notifications for
>>       some of these where applicable, and requiring explicit user
>>       action for others. So I think this is a good requirement to have.
>>
>>       The other side of this is that the web page should not be allowed to
>>       automatically initiate speech input/audio capture via an API call.
>>
>>       Cheers
>>       Satish
>>
>>       On Fri, Oct 22, 2010 at 12:18 PM, Bjorn Bringert <bringert@google.com> wrote:
>>       This requirement was motivated by privacy concerns. If the web
>>       application can start speech recognition at any time, it can
>>       eavesdrop on a user.
>>
>>       An alternative to requiring user action would be to have a
>>       permission dialog of some kind. As far as I understand, browser
>>       implementors would not like a proliferation of permission dialogs
>>       annoying their users.
>>
>>       /Bjorn
>>
>>       On Fri, Oct 22, 2010 at 1:06 AM, Dan Burnett <dburnett@voxeo.com> wrote:
>>       > Group,
>>       >
>>       > This is the first of the requirements to discuss and prioritize
>>       > based on our ranking approach [1].
>>       >
>>       > This email is the beginning of a thread for questions,
>>       > discussion, and opinions regarding our first draft of
>>       > Requirement 29 [2].
>>       >
>>       > After our discussion and any modifications to the requirement,
>>       > our goal is to prioritize this requirement as either "Should
>>       > Address" or "For Future Consideration".
>>       >
>>       > -- dan
>>       >
>>       > [1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/0024.html
>>       > [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/att-0001/speech.html#r29
>>       >
>>       >
>>
>>
>>       --
>>       Bjorn Bringert
>>       Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
>>       Palace Road, London, SW1W 9TQ
>>       Registered in England Number: 3977902
>>
>



-- 
Bjorn Bringert
Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902

Received on Monday, 25 October 2010 16:58:01 UTC