Re: R29. Web application may only listen in response to user action

I think this is a really important topic and worthy of some F2F discussion
(appropriately minuted for folks who can't attend, of course). I no more
want evil.com starting to listen to audio in my environment by itself than I
would want random site y to be able to script a file <input> element to read
my local files. And popping up permission dialogs (modal or otherwise)
doesn't scale either, as eventually surfing the Web would become an exercise
in popup spam.

If a user agent wants to provide some kind of override (e.g. a downloadable
web app/extension with grouped permission models built into some kind of
application installation metaphor), sure. But let's not break safe Web
browsing for the majority of users by increasing the surface area of attack
or annoyance.

Dave

On Mon, Oct 25, 2010 at 5:57 PM, Bjorn Bringert <bringert@google.com> wrote:

> I don't think that requiring user action necessarily rules out
> hands-free usage. The user action itself could be spoken input. The
> key is to not give the *web app* access to recognition results (or
> audio) without user action. We should perhaps reword the requirement
> to reflect that.
>
> /Bjorn
>
> On Fri, Oct 22, 2010 at 8:54 PM, Robert Brown
> <Robert.Brown@microsoft.com> wrote:
> > I don't have data, but I suspect that presenting a list of choices, some
> > of which are sticky (like Michael lists below), may also mitigate the
> > tendency for users to blindly click through. (That's been my personal
> > experience with some of the popup-blocker UIs I've used, and there may be
> > data for those.)
> >
> > An earcon may also be an appropriate indicator that the system is
> > listening, especially on smart phones.
> >
> > -----Original Message-----
> > From: public-xg-htmlspeech-request@w3.org
> > [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Michael Bodell
> > Sent: Friday, October 22, 2010 12:29 PM
> > To: Deborah Dahl; 'Satish Sampath'
> > Cc: 'Bjorn Bringert'; 'Dan Burnett'; public-xg-htmlspeech@w3.org
> > Subject: RE: R29. Web application may only listen in response to user action
> >
> > I agree that this requirement is problematic for hands-free and other
> > usage scenarios. IMO explicit user action should not be required each and
> > every time speech recognition occurs, whether per page load or per
> > recognition. The privacy and security concerns that I think we all share
> > are that speech recognition should not happen without user consent (in
> > general) and should not happen without the user being aware that speech
> > recognition is happening (in this particular instance). Requirements along
> > those lines are the "what" requirements that we must follow to support
> > user privacy and security concerns. But those two requirements do *NOT*
> > mean that the "web application may only listen in response to user
> > action", which is a "how" requirement (i.e., how we protect the user). IMO
> > it may be the case that recognition occurs as a result of any of the
> > following (not an exhaustive list):
> >
> > - page load (not covered by this requirement)
> > - focus event driven by explicit user action (covered by this requirement)
> > - focus event driven by natural page flow (not covered by this requirement)
> > - scripting by the application author (not covered by this requirement)
> >
> > This requirement is too restrictive for these and other use cases.
> >
> > A different "how" might be that the user agent, for instance, prompts the
> > user when a page wants to do speech for the first time and gives them a
> > set of choices such as:
> >
> > - Always allow any page to do speech without prompting
> > - Always allow any page on this domain to do speech without prompting
> > - Allow just this one page this session to do speech (and prompt in the
> >   future)
> > - Don't allow this page this session to do speech (and prompt in the
> >   future)
> > - Don't ever allow any page on this domain to do speech
> > - Don't ever allow any page ever to do speech
> >
> > If the user chooses one of the "always allow" options, then in the future
> > the web application would be able to listen to the user without any
> > further user action. Alternatively, the user agent could offer the
> > equivalent of these prompts in its configuration settings, depending on
> > the user's security/privacy settings.
> >
> > Maybe people feel that a user agent configuration setting is sufficient
> > "user action" to count under this requirement, in which case all of the
> > use cases above would meet it. But that isn't how I interpreted this
> > requirement, and if that is the intent I think we should reword it, as it
> > currently implies that explicit user action is needed each time
> > recognition occurs.
> >
> > As for the user being aware that recognition is happening, a different
> > "how" approach is more reasonable IMO: the chrome of the user agent can
> > provide cues. There is already a well-established pattern of user agents
> > providing visual cues when certain things occur, for example:
> >
> > - a spinning browser icon when content is being loaded in the background
> > - some sort of secure lock image when the page is loaded over a secure
> >   channel
> >
> > So a similar idea could apply when the user is being recorded or when
> > speech recognition is in progress: something like a microphone icon, a
> > red light, or some other cue *in the user agent's chrome* that doesn't
> > interfere with the visual display of the web application. You wouldn't
> > want this indication to appear in the visual display of the web
> > application itself (e.g., a microphone icon in the input field), because
> > different web applications may want different user interface options,
> > and because anything like that in the web application's display could be
> > spoofed by the web application and isn't as trusted as icons, images, or
> > different color/highlighted text in the user agent's chrome.
> >
> > -----Original Message-----
> > From: public-xg-htmlspeech-request@w3.org
> > [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Deborah Dahl
> > Sent: Friday, October 22, 2010 8:08 AM
> > To: 'Satish Sampath'
> > Cc: 'Bjorn Bringert'; 'Dan Burnett'; public-xg-htmlspeech@w3.org
> > Subject: RE: R29. Web application may only listen in response to user action
> >
> > Yes, you could do that, but then the application wouldn't be hands-free.
> > Now probably isn't the time to start talking about approaches that would
> > enable us to address both requirements; I'm just pointing out that we
> > should be aware of a potential conflict. I think we should actually
> > classify both requirements as "should address", but note that there's an
> > issue in our requirements document.
> >
> >> -----Original Message-----
> >> From: Satish Sampath [mailto:satish@google.com]
> >> Sent: Friday, October 22, 2010 9:43 AM
> >> To: Deborah Dahl
> >> Cc: Bjorn Bringert; Dan Burnett; public-xg-htmlspeech@w3.org
> >> Subject: Re: R29. Web application may only listen in response to user action
> >>
> >> One possibility for R24 is that the end user performs an action on page
> >> load, and from then on they can interact with the application in a
> >> hands-free mode using continuous speech input. This could be a click on
> >> a button or some other accessibility-friendly gesture.
> >>
> >> Cheers
> >> Satish
> >>
> >>
> >>
> >> On Fri, Oct 22, 2010 at 2:39 PM, Deborah Dahl
> >> <dahl@conversational-technologies.com> wrote:
> >>
> >>
> >>       I see a possible conflict between requiring user action to enable
> >>       speech recognition and R24, "End user should be able to use speech
> >>       in a hands-free mode", if "user action" means doing something that
> >>       requires use of the hands. I think both requirements are important,
> >>       but satisfying them both might require some thought.
> >>
> >>       From: public-xg-htmlspeech-request@w3.org
> >>       [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Satish
> >>       Sampath
> >>       Sent: Friday, October 22, 2010 7:24 AM
> >>       To: Bjorn Bringert
> >>       Cc: Dan Burnett; public-xg-htmlspeech@w3.org
> >>       Subject: Re: R29. Web application may only listen in response to
> >>       user action
> >>
> >>
> >>       User experience studies have also shown that end users have grown
> >>       used to clicking away any popup dialogs that come up when they are
> >>       browsing the web; common ones include phishing/malware warnings,
> >>       download notifications, etc. This is one of the reasons why browser
> >>       vendors are moving towards in-page notifications for some of these
> >>       where applicable, and requiring explicit user action for others. So
> >>       I think this is a good requirement to have.
> >>
> >>       The other side of this is that the web page should not be allowed to
> >>       automatically initiate speech input/audio capture via an API call.
> >>
> >>       Cheers
> >>       Satish
> >>
> >>       On Fri, Oct 22, 2010 at 12:18 PM, Bjorn Bringert <bringert@google.com>
> >>       wrote:
> >>
> >>       This requirement was motivated by privacy concerns. If the web
> >>       application can start speech recognition at any time, it can
> >>       eavesdrop on a user.
> >>
> >>       An alternative to requiring user action would be to have a
> >>       permission dialog of some kind. As far as I understand, browser
> >>       implementors would not like a proliferation of permission dialogs
> >>       annoying their users.
> >>
> >>       /Bjorn
> >>
> >>       On Fri, Oct 22, 2010 at 1:06 AM, Dan Burnett <dburnett@voxeo.com>
> >>       wrote:
> >>       > Group,
> >>       >
> >>       > This is the first of the requirements to discuss and prioritize
> >>       > based on our ranking approach [1].
> >>       >
> >>       > This email is the beginning of a thread for questions, discussion,
> >>       > and opinions regarding our first draft of Requirement 29 [2].
> >>       >
> >>       > After our discussion and any modifications to the requirement, our
> >>       > goal is to prioritize this requirement as either "Should Address" or
> >>       > "For Future Consideration".
> >>       >
> >>       > -- dan
> >>       >
> >>       > [1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/0024.html
> >>       > [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Oct/att-0001/speech.html#r29
> >>       >
> >>       >
> >>
> >>
> >>       --
> >>       Bjorn Bringert
> >>       Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> >>       Palace Road, London, SW1W 9TQ
> >>       Registered in England Number: 3977902

Received on Tuesday, 26 October 2010 22:00:54 UTC