Re: R29. Web application may only listen in response to user action

It sounds like we are in violent agreement that this needs to be  
discussed.  We will address it at the face-to-face meeting.
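
As a concrete strawman to seed that discussion: the sticky-choice prompt
model Michael describes below could be reduced to a small decision table.
All names here are hypothetical illustrations for the F2F, not a proposed
API.

```javascript
// Strawman sketch of Michael's six prompt choices as a pure decision
// table.  "Never" answers take precedence over "always" answers, and
// per-session answers apply only to the page they were given for.
class SpeechPermissionStore {
  constructor() {
    this.globalChoice = null;        // "always" | "never" | null
    this.domainChoices = new Map();  // domain -> "always" | "never"
    this.sessionChoices = new Map(); // page -> boolean (cleared each session)
  }

  // Record the user's answer to the one-time prompt.
  record(domain, page, choice) {
    switch (choice) {
      case "always-any":    this.globalChoice = "always"; break;
      case "always-domain": this.domainChoices.set(domain, "always"); break;
      case "session-allow": this.sessionChoices.set(page, true); break;
      case "session-deny":  this.sessionChoices.set(page, false); break;
      case "never-domain":  this.domainChoices.set(domain, "never"); break;
      case "never-any":     this.globalChoice = "never"; break;
      default: throw new Error("unknown choice: " + choice);
    }
  }

  // Returns "allow", "deny", or "prompt": denials first, then sticky
  // allows, then this session's answers; otherwise ask the user.
  query(domain, page) {
    if (this.globalChoice === "never") return "deny";
    if (this.domainChoices.get(domain) === "never") return "deny";
    if (this.globalChoice === "always") return "allow";
    if (this.domainChoices.get(domain) === "always") return "allow";
    if (this.sessionChoices.has(page)) {
      return this.sessionChoices.get(page) ? "allow" : "deny";
    }
    return "prompt";
  }
}
```

The precedence order is just one possibility; whether a sticky
configuration setting like this counts as sufficient "user action" under
R29 is exactly the question for the F2F.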

-- dan

On Oct 26, 2010, at 6:18 PM, Olli Pettay wrote:

> On 10/27/2010 01:00 AM, Dave Burke wrote:
>> I think this is a really important topic
> Indeed.
>
>> and worthy of some F2F
>> discussion (appropriately minuted for folks who can't attend, of
>> course).
>
> agree
>
>
>> I no more want evil.com starting to listen to
>> audio in my environment by itself
> To me R29 is quite close to what geolocation is.
> The sites which want to use ASR need to get my permission to do so.
> Currently geolocation usually uses non-modal dialogs, IIRC.
>
>> than I would want random site y to be
>> able to script a file <input> element to read my local files. And
>> popping up permission dialogs (modal or otherwise) doesn't scale  
>> either
>> as eventually surfing the Web would become an exercise in popup spam.
> Yes, this is a problem when we're starting to get more and more
> dialogs.
> I guess many people just ignore non-modal notification-bar-style
> dialogs, so the feature just stays disabled for them.
> Modal dialogs are just annoying and people press whatever they need to
> to get rid of them.
>
> I wonder if DAP WG will fix the access control for us
> http://dev.w3.org/2009/dap/policy-reqs/
>
> -Olli
>
>
>>
>> If a user-agent wants to provide some kind of override (e.g. a
>> downloaded webapp/extension with grouped permission models built into
>> some kind of application installation metaphor), sure. But let's not
>> break safe Web browsing for the majority of users by increasing the
>> surface area of attack or annoyance.
>>
>> Dave
>>
>> On Mon, Oct 25, 2010 at 5:57 PM, Bjorn Bringert
>> <bringert@google.com> wrote:
>>
>>    I don't think that requiring user action necessarily rules out
>>    hands-free usage. The user action itself could be spoken input.
>>    The key is to not give the *web app* access to recognition results
>>    (or audio) without user action. We should perhaps reword the
>>    requirement to reflect that.
>>
>>    /Bjorn
>>
>>    On Fri, Oct 22, 2010 at 8:54 PM, Robert Brown
>>    <Robert.Brown@microsoft.com> wrote:
>>     > I don't have data, but I suspect that presenting a list of
>>    choices, some of which are sticky (like Michael lists below), may
>>    also mitigate the tendency for users to blindly click through.
>>    (That's been my personal experience with some of the popup-blocker
>>    UIs I've used, and there may be data for those.)
>>     >
>>     > An earcon may also be an appropriate indicator that the  
>> system is
>>    listening, especially on smart phones.
>>     >
>>     > -----Original Message-----
>>     > From: public-xg-htmlspeech-request@w3.org
>>     > [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of
>>     > Michael Bodell
>>     > Sent: Friday, October 22, 2010 12:29 PM
>>     > To: Deborah Dahl; 'Satish Sampath'
>>     > Cc: 'Bjorn Bringert'; 'Dan Burnett'; public-xg-htmlspeech@w3.org
>>     > Subject: RE: R29. Web application may only listen in response to
>>     > user action
>>     >
>>     > I agree that this requirement is problematic for hands-free and
>>    other usage scenarios.  IMO explicit user action should not be
>>    required before speech recognition occurs on each and every page
>>    load and/or recognition attempt.  The privacy and security concerns
>>    that I think we all share are that speech recognition should not
>>    happen without user consent (in general) and should not happen
>>    without the user being aware that the speech recognition is
>>    happening (in this particular instance).  Requirements along those
>>    lines are the "what" requirements that we must follow to support
>>    user privacy and security concerns.  But those two requirements do
>>    *NOT* mean that the "web application may only listen in response to
>>    user action", which is a "how" requirement (i.e., how we protect
>>    the user).  IMO it may be the case that recognition occurs as a
>>    result of any of (not an exhaustive list):
>>     >
>>     > - page load (not covered by this requirement)
>>     > - focus event driven by explicit user action (covered by this
>>    requirement)
>>     > - focus event driven by natural page flow (not covered by this
>>    requirement)
>>     > - scripting by the application author (not covered by this
>>    requirement)
>>     >
>>     > This requirement is too restrictive for these and other use
>>    cases.
>>     >
>>     > A different "how" might be that the user agent, for instance,
>>    prompts the user the first time a page wants to do speech and
>>    gives them a set of choices such as:
>>     >
>>     > - Always allow any page to do speech without prompting
>>     > - Always allow any page on this domain to do speech without
>>    prompting
>>     > - Allow just this one page this session to do speech (and
>>    prompt in the future)
>>     > - Don't allow this page this session to do speech (and prompt
>>    in the future)
>>     > - Don't ever allow any page on this domain to do speech
>>     > - Don't ever allow any page ever to do speech
>>     >
>>     > If the user chooses one of the "always allow" options, then in
>>    the future the web application would be able to listen to the user
>>    indefinitely without any user action.  Or it could be that the
>>    user agent has the equivalent of these prompts instead in some
>>    configuration settings, depending on the security/privacy settings
>>    of the user.
>>     >
>>     > Maybe people feel that a user agent configuration setting is
>>    sufficient "user action" to count for this requirement, and that
>>    all of the use cases above would then meet it.  But that isn't how
>>    I interpreted this requirement, and if that is the case I think we
>>    should reword it, since the requirement as written implies there
>>    needs to be explicit user action each time recognition occurs.
>>     >
>>     > As for the user being aware that recognition is happening, a
>>    different "how" approach is more reasonable IMO: the chrome of the
>>    user agent can provide cues.  There is already a well-established
>>    pattern of user agents providing visual cues when certain things
>>    occur, for example:
>>     >
>>     > - a spinning browser icon when content is being loaded in the
>>    background
>>     > - some sort of secure lock image when the page is loaded over a
>>    secure channel
>>     >
>>     > So a similar cue could appear when the user is being recorded
>>    or when speech recognition is going on: something like a
>>    microphone icon or a red light *in the user agent's chrome* that
>>    doesn't interfere with the visual display of the web application.
>>    You wouldn't want this indication to appear in the visual display
>>    of the web application itself (e.g., a microphone icon in the
>>    input field), both because different web applications may want
>>    different user interface options and because anything in the
>>    visual display of the web application could be spoofed by the web
>>    application, so it isn't as trustworthy as icons, images, or
>>    colored/highlighted text in the user agent's chrome.
>>     >
>>     > -----Original Message-----
>>     > From: public-xg-htmlspeech-request@w3.org
>>     > [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of
>>     > Deborah Dahl
>>     > Sent: Friday, October 22, 2010 8:08 AM
>>     > To: 'Satish Sampath'
>>     > Cc: 'Bjorn Bringert'; 'Dan Burnett'; public-xg-htmlspeech@w3.org
>>     > Subject: RE: R29. Web application may only listen in response to
>>     > user action
>>     >
>>     > Yes, you could do that, but then the application wouldn't be
>>    hands-free.
>>     > Now probably isn't the time to start talking about approaches
>>    that would enable us to address both requirements; I'm just
>>    pointing out that we should be aware of a potential conflict. I
>>    think we should actually classify both requirements as "should
>>    address", but note that there's an issue in our requirements
>>    document.
>>     >
>>     >> -----Original Message-----
>>     >> From: Satish Sampath [mailto:satish@google.com]
>>     >> Sent: Friday, October 22, 2010 9:43 AM
>>     >> To: Deborah Dahl
>>     >> Cc: Bjorn Bringert; Dan Burnett; public-xg-htmlspeech@w3.org
>>     >> Subject: Re: R29. Web application may only listen in response
>>     >> to user action
>>     >>
>>     >> One possibility for R24 is that the end user performs an
>>     >> action on page load, and from then on, using continuous speech
>>     >> input, they can interact with the application in a hands-free
>>     >> mode. This could be a click on a button or some other
>>     >> accessibility-friendly gesture.
>>     >>
>>     >> Cheers
>>     >> Satish
>>     >>
>>     >>
>>     >>
>>     >> On Fri, Oct 22, 2010 at 2:39 PM, Deborah Dahl
>>     >> <dahl@conversational-technologies.com> wrote:
>>     >>
>>     >>
>>     >>       I see a possible conflict between requiring user action
>>     >>       to enable speech recognition and R24, "End user should
>>     >>       be able to use speech in a hands-free mode", if "user
>>     >>       action" means doing something that requires use of the
>>     >>       hands.  I think both requirements are important, but
>>     >>       satisfying them both might require some thought.
>>     >>
>>     >>       From: public-xg-htmlspeech-request@w3.org
>>     >>       [mailto:public-xg-htmlspeech-request@w3.org] On Behalf
>>     >>       Of Satish Sampath
>>     >>       Sent: Friday, October 22, 2010 7:24 AM
>>     >>       To: Bjorn Bringert
>>     >>       Cc: Dan Burnett; public-xg-htmlspeech@w3.org
>>     >>       Subject: Re: R29. Web application may only listen in
>>     >>       response to user action
>>     >>
>>     >>
>>     >>       User experience studies have also shown that end users
>>     >>       have got used to clicking away any popup dialogs that
>>     >>       come up when they are browsing the web.  Common ones
>>     >>       include phishing/malware warnings, download
>>     >>       notifications, etc. This is one of the reasons why
>>     >>       browser vendors are moving towards in-page notifications
>>     >>       for some of these where applicable, and requiring
>>     >>       explicit user action for others. So I think this is a
>>     >>       good requirement to have.
>>     >>
>>     >>       The other side of this is that the web page should not  
>> be
>>    allowed to
>>     >>       automatically initiate speech input/audio capture via an
>>    API call.
>>     >>
>>     >>       Cheers
>>     >>       Satish
>>     >>
>>     >>       On Fri, Oct 22, 2010 at 12:18 PM, Bjorn Bringert
>>     >>       <bringert@google.com> wrote:
>>     >>       This requirement was motivated by privacy concerns. If
>>     >>       the web application can start speech recognition at any
>>     >>       time, it can eavesdrop on a user.
>>     >>
>>     >>       An alternative to requiring user action would be to have
>>     >>       a permission dialog of some kind. As far as I
>>     >>       understand, browser implementors would not like a
>>     >>       proliferation of permission dialogs annoying their
>>     >>       users.
>>     >>
>>     >>       /Bjorn
>>     >>
>>     >>       On Fri, Oct 22, 2010 at 1:06 AM, Dan Burnett
>>     >>       <dburnett@voxeo.com> wrote:
>>     >> > Group,
>>     >> >
>>     >> > This is the first of the requirements to discuss and
>>     >> > prioritize based on our ranking approach [1].
>>     >> >
>>     >> > This email is the beginning of a thread for questions,
>>     >> > discussion, and opinions regarding our first draft of
>>     >> > Requirement 29 [2].
>>     >> >
>>     >> > After our discussion and any modifications to the
>>     >> > requirement, our goal is to prioritize this requirement as
>>     >> > either "Should Address" or "For Future Consideration".
>>     >> >
>>     >> > -- dan
>>     >> >
>>     >> > [1]
>>     >> > http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/0024.html
>>     >> > [2]
>>     >> > http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/att-0001/speech.html#r29
>>     >> >
>>     >> >
>>     >>
>>     >>
>>     >>       --
>>     >>       Bjorn Bringert
>>     >>       Google UK Limited, Registered Office: Belgrave House, 76
>>    Buckingham
>>     >>       Palace Road, London, SW1W 9TQ
>>     >>       Registered in England Number: 3977902
>>     >>
>>     >>
>>     >>
>>     >>
>>     >
>>     >
>>     >
>>     >
>>     >
>>     >
>>     >
>>
>>
>>
>>    --
>>    Bjorn Bringert
>>    Google UK Limited, Registered Office: Belgrave House, 76  
>> Buckingham
>>    Palace Road, London, SW1W 9TQ
>>    Registered in England Number: 3977902
>>
>>
>

Received on Friday, 29 October 2010 00:41:56 UTC