Re: R29. Web application may only listen in response to user action

It sounds like we are in violent agreement that this needs to be  
discussed.  We will address it at the face-to-face meeting.
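
As a concrete strawman to seed that discussion: the sticky-choice prompt
model Michael describes below could be reduced to a small decision table.
All names here are hypothetical illustrations for the F2F, not a proposed
API.

```javascript
// Strawman sketch of Michael's six prompt choices as a pure decision
// table.  "Never" answers take precedence over "always" answers, and
// per-session answers apply only to the page they were given for.
class SpeechPermissionStore {
  constructor() {
    this.globalChoice = null;        // "always" | "never" | null
    this.domainChoices = new Map();  // domain -> "always" | "never"
    this.sessionChoices = new Map(); // page -> boolean (cleared each session)
  }

  // Record the user's answer to the one-time prompt.
  record(domain, page, choice) {
    switch (choice) {
      case "always-any":    this.globalChoice = "always"; break;
      case "always-domain": this.domainChoices.set(domain, "always"); break;
      case "session-allow": this.sessionChoices.set(page, true); break;
      case "session-deny":  this.sessionChoices.set(page, false); break;
      case "never-domain":  this.domainChoices.set(domain, "never"); break;
      case "never-any":     this.globalChoice = "never"; break;
      default: throw new Error("unknown choice: " + choice);
    }
  }

  // Returns "allow", "deny", or "prompt": denials first, then sticky
  // allows, then this session's answers; otherwise ask the user.
  query(domain, page) {
    if (this.globalChoice === "never") return "deny";
    if (this.domainChoices.get(domain) === "never") return "deny";
    if (this.globalChoice === "always") return "allow";
    if (this.domainChoices.get(domain) === "always") return "allow";
    if (this.sessionChoices.has(page)) {
      return this.sessionChoices.get(page) ? "allow" : "deny";
    }
    return "prompt";
  }
}
```

The precedence order is just one possibility; whether a sticky
configuration setting like this counts as sufficient "user action" under
R29 is exactly the question for the F2F.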

-- dan

On Oct 26, 2010, at 6:18 PM, Olli Pettay wrote:

> On 10/27/2010 01:00 AM, Dave Burke wrote:
>> I think this is a really important topic
> Indeed.
>
>> and worthy of some F2F
>> discussion (appropriately minuted for folks who can't attend, of
>> course).
>
> agree
>
>
>> I no more want evil.com starting to listen to
>> audio in my environment by itself
> To me R29 is quite close to what geolocation is.
> The sites which want to use ASR need to get my permission to do so.
> Currently geolocation usually uses non-modal dialogs, IIRC.
>
>> than I would want random site y to be
>> able to script a file <input> element to read my local files. And
>> popping up permission dialogs (modal or otherwise) doesn't scale  
>> either
>> as eventually surfing the Web would become an exercise in popup spam.
> Yes, this is a problem when we're starting to get more and more
> dialogs.
> I guess many people just ignore non-modal notification-bar-style
> dialogs, so the feature just stays disabled for them.
> Modal dialogs are just annoying and people press whatever they need to
> to get rid of them.
>
> I wonder if DAP WG will fix the access control for us
> http://dev.w3.org/2009/dap/policy-reqs/
>
> -Olli
>
>
>>
>> If a user-agent wants to provide some kind of override (e.g. a
>> downloaded webapp/extension with grouped permission models built into
>> some kind of application installation metaphor), sure. But let's not
>> break safe Web browsing for the majority of users by increasing the
>> surface area of attack or annoyance.
>>
>> Dave
>>
>> On Mon, Oct 25, 2010 at 5:57 PM, Bjorn Bringert
>> <bringert@google.com> wrote:
>>
>>    I don't think that requiring user action necessarily rules out
>>    hands-free usage. The user action itself could be spoken input.
>>    The key is to not give the *web app* access to recognition results
>>    (or audio) without user action. We should perhaps reword the
>>    requirement to reflect that.
>>
>>    /Bjorn
>>
>>    On Fri, Oct 22, 2010 at 8:54 PM, Robert Brown
>>    <Robert.Brown@microsoft.com> wrote:
>>     > I don't have data, but I suspect that presenting a list of
>>    choices, some of which are sticky (like Michael lists below), may
>>    also mitigate the tendency for users to blindly click through.
>>    (That's been my personal experience with some of the popup-blocker
>>    UIs I've used, and there may be data for those.)
>>     >
>>     > An earcon may also be an appropriate indicator that the  
>> system is
>>    listening, especially on smart phones.
>>     >
>>     > -----Original Message-----
>>     > From: public-xg-htmlspeech-request@w3.org
>>     > [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of
>>     > Michael Bodell
>>     > Sent: Friday, October 22, 2010 12:29 PM
>>     > To: Deborah Dahl; 'Satish Sampath'
>>     > Cc: 'Bjorn Bringert'; 'Dan Burnett'; public-xg-htmlspeech@w3.org
>>     > Subject: RE: R29. Web application may only listen in response to
>>     > user action
>>     >
>>     > I agree that this requirement is problematic for hands-free and
>>    other usage scenarios.  IMO explicit user action should not be
>>    required before speech recognition occurs on each and every page
>>    load and/or recognition attempt.  The privacy and security concerns
>>    that I think we all share are that speech recognition should not
>>    happen without user consent (in general) and should not happen
>>    without the user being aware that the speech recognition is
>>    happening (in this particular instance).  Requirements along those
>>    lines are the "what" requirements that we must follow to support
>>    user privacy and security concerns.  But those two requirements do
>>    *NOT* mean that the "web application may only listen in response to
>>    user action", which is a "how" requirement (i.e., how we protect
>>    the user).  IMO it may be the case that recognition occurs as a
>>    result of any of (not an exhaustive list):
>>     >
>>     > - page load (not covered by this requirement)
>>     > - focus event driven by explicit user action (covered by this
>>    requirement)
>>     > - focus event driven by natural page flow (not covered by this
>>    requirement)
>>     > - scripting by the application author (not covered by this
>>    requirement)
>>     >
>>     > This requirement is too restrictive for these and other use
>>    cases.
>>     >
>>     > A different "how" might be that the user agent, for instance,
>>    prompts the user the first time a page wants to do speech and
>>    gives them a set of choices such as:
>>     >
>>     > - Always allow any page to do speech without prompting
>>     > - Always allow any page on this domain to do speech without
>>    prompting
>>     > - Allow just this one page this session to do speech (and
>>    prompt in the future)
>>     > - Don't allow this page this session to do speech (and prompt
>>    in the future)
>>     > - Don't ever allow any page on this domain to do speech
>>     > - Don't ever allow any page ever to do speech
>>     >
>>     > If the user chooses one of the "always allow" options, then in
>>    the future the web application would be able to listen to the user
>>    indefinitely without any user action.  Or it could be that the
>>    user agent has the equivalent of these prompts instead in some
>>    configuration settings, depending on the security/privacy settings
>>    of the user.
>>     >
>>     > Maybe people feel that a user agent configuration setting is
>>    sufficient "user action" to count for this requirement, and that
>>    all of the use cases above would then meet it.  But that isn't how
>>    I interpreted this requirement, and if that is the case I think we
>>    should reword it, since the requirement as written implies there
>>    needs to be explicit user action each time recognition occurs.
>>     >
>>     > As for the user being aware that recognition is happening, a
>>    different "how" approach is more reasonable IMO: the chrome of the
>>    user agent can provide cues.  There is already a well-established
>>    pattern of user agents providing visual cues when certain things
>>    occur, for example:
>>     >
>>     > - a spinning browser icon when content is being loaded in the
>>    background
>>     > - some sort of secure lock image when the page is loaded over a
>>    secure channel
>>     >
>>     > So a similar cue could appear when the user is being recorded
>>    or when speech recognition is going on: something like a
>>    microphone icon or a red light *in the user agent's chrome* that
>>    doesn't interfere with the visual display of the web application.
>>    You wouldn't want this indication to appear in the visual display
>>    of the web application itself (e.g., a microphone icon in the
>>    input field), both because different web applications may want
>>    different user interface options and because anything in the
>>    visual display of the web application could be spoofed by the web
>>    application, so it isn't as trustworthy as icons, images, or
>>    colored/highlighted text in the user agent's chrome.
>>     >
>>     > -----Original Message-----
>>     > From: public-xg-htmlspeech-request@w3.org
>>     > [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of
>>     > Deborah Dahl
>>     > Sent: Friday, October 22, 2010 8:08 AM
>>     > To: 'Satish Sampath'
>>     > Cc: 'Bjorn Bringert'; 'Dan Burnett'; public-xg-htmlspeech@w3.org
>>     > Subject: RE: R29. Web application may only listen in response to
>>     > user action
>>     >
>>     > Yes, you could do that, but then the application wouldn't be
>>    hands-free.
>>     > Now probably isn't the time to start talking about approaches
>>    that would enable us to address both requirements; I'm just
>>    pointing out that we should be aware of a potential conflict. I
>>    think we should actually classify both requirements as "should
>>    address", but note that there's an issue in our requirements
>>    document.
>>     >
>>     >> -----Original Message-----
>>     >> From: Satish Sampath [mailto:satish@google.com]
>>     >> Sent: Friday, October 22, 2010 9:43 AM
>>     >> To: Deborah Dahl
>>     >> Cc: Bjorn Bringert; Dan Burnett; public-xg-htmlspeech@w3.org
>>     >> Subject: Re: R29. Web application may only listen in response
>>     >> to user action
>>     >>
>>     >> One possibility for R24 is that the end user performs an
>>     >> action on page load, and from then on, using continuous speech
>>     >> input, they can interact with the application in a hands-free
>>     >> mode. This could be a click on a button or some other
>>     >> accessibility-friendly gesture.
>>     >>
>>     >> Cheers
>>     >> Satish
>>     >>
>>     >>
>>     >>
>>     >> On Fri, Oct 22, 2010 at 2:39 PM, Deborah Dahl
>>     >> <dahl@conversational-technologies.com> wrote:
>>     >>
>>     >>
>>     >>       I see a possible conflict between requiring user action
>>     >>       to enable speech recognition and R24, "End user should
>>     >>       be able to use speech in a hands-free mode", if "user
>>     >>       action" means doing something that requires use of the
>>     >>       hands.  I think both requirements are important, but
>>     >>       satisfying them both might require some thought.
>>     >>
>>     >>       From: public-xg-htmlspeech-request@w3.org
>>     >>       [mailto:public-xg-htmlspeech-request@w3.org] On Behalf
>>     >>       Of Satish Sampath
>>     >>       Sent: Friday, October 22, 2010 7:24 AM
>>     >>       To: Bjorn Bringert
>>     >>       Cc: Dan Burnett; public-xg-htmlspeech@w3.org
>>     >>       Subject: Re: R29. Web application may only listen in
>>     >>       response to user action
>>     >>
>>     >>
>>     >>       User experience studies have also shown that end users
>>     >>       have got used to clicking away any popup dialogs that
>>     >>       come up when they are browsing the web.  Common ones
>>     >>       include phishing/malware warnings, download
>>     >>       notifications, etc. This is one of the reasons why
>>     >>       browser vendors are moving towards in-page notifications
>>     >>       for some of these where applicable, and requiring
>>     >>       explicit user action for others. So I think this is a
>>     >>       good requirement to have.
>>     >>
>>     >>       The other side of this is that the web page should not  
>> be
>>    allowed to
>>     >>       automatically initiate speech input/audio capture via an
>>    API call.
>>     >>
>>     >>       Cheers
>>     >>       Satish
>>     >>
>>     >>       On Fri, Oct 22, 2010 at 12:18 PM, Bjorn Bringert
>>     >>       <bringert@google.com> wrote:
>>     >>       This requirement was motivated by privacy concerns. If
>>     >>       the web application can start speech recognition at any
>>     >>       time, it can eavesdrop on a user.
>>     >>
>>     >>       An alternative to requiring user action would be to have
>>     >>       a permission dialog of some kind. As far as I
>>     >>       understand, browser implementors would not like a
>>     >>       proliferation of permission dialogs annoying their
>>     >>       users.
>>     >>
>>     >>       /Bjorn
>>     >>
>>     >>       On Fri, Oct 22, 2010 at 1:06 AM, Dan Burnett
>>     >>       <dburnett@voxeo.com> wrote:
>>     >> > Group,
>>     >> >
>>     >> > This is the first of the requirements to discuss and
>>     >> > prioritize based on our ranking approach [1].
>>     >> >
>>     >> > This email is the beginning of a thread for questions,
>>     >> > discussion, and opinions regarding our first draft of
>>     >> > Requirement 29 [2].
>>     >> >
>>     >> > After our discussion and any modifications to the
>>     >> > requirement, our goal is to prioritize this requirement as
>>     >> > either "Should Address" or "For Future Consideration".
>>     >> >
>>     >> > -- dan
>>     >> >
>>     >> > [1]
>>     >> > http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/0024.html
>>     >> > [2]
>>     >> > http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/att-0001/speech.html#r29
>>     >> >
>>     >> >
>>     >>
>>     >>
>>     >>       --
>>     >>       Bjorn Bringert
>>     >>       Google UK Limited, Registered Office: Belgrave House, 76
>>    Buckingham
>>     >>       Palace Road, London, SW1W 9TQ
>>     >>       Registered in England Number: 3977902
>>     >>
>>     >>
>>     >>
>>     >>
>>     >
>>     >
>>     >
>>     >
>>     >
>>     >
>>     >
>>
>>
>>
>>    --
>>    Bjorn Bringert
>>    Google UK Limited, Registered Office: Belgrave House, 76  
>> Buckingham
>>    Palace Road, London, SW1W 9TQ
>>    Registered in England Number: 3977902
>>
>>
>

Received on Friday, 29 October 2010 00:41:56 UTC