
Re: About speech request initiation and reco element etc

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Tue, 05 Jul 2011 14:30:38 +0300
Message-ID: <4E12F5DE.2060306@helsinki.fi>
To: Satish S <satish@google.com>
CC: public-xg-htmlspeech@w3.org, Bjorn Bringert <bringert@google.com>
On 07/05/2011 02:16 PM, Satish S wrote:
> Agreed on the "opacity:0" and dropdown menu points, in both these cases
> it is the secondary UI (drop down or popup) which acts as a defense
> against clickjacking. But it is clear to the user that these actions
> happen when they interact with the page as opposed to a set of infobars
> popping up as soon as the page loads.

The infobar doesn't pop up as soon as the page loads. It pops up when the
feature is needed.
But I agree that some kind of context-dependent, automatically opened menu
over the page would be like a better version of the infobar.
Both have the same functionality, but the menu could pop up close to the
place where the permission is required.

> (Another wild idea is that the UA could even have the <reco> element
> appear in the window chrome and not within the page.. in this case the
> <reco> markup just acts as an indicator to the UA that the page allows
> user initiated speech input. Anyway, that's just an example of the things
> a UA could experiment with)
> I see a strong and coherent story with a markup element as it provides
> both user initiated & webpage initiated models,
So does a JS API.

> allows all JS APIs to
> hang off it in a clean fashion (with precedent being HTML5 audio and
> other elements) and UAs can use it to come up with more robust and
> secure models for such sensitive user information. It does not take
> anything away from the JS API but merely adds to it.

As I said
"Note, I'm not against <reco>, if we can find a reasonable security
model when it is used. "

So if Google is OK with the user needing to give explicit permission to
the page before speech recognition is activated, then this one
problem is solved :)
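
To make the "explicit permission before activation" idea concrete, here is a minimal sketch. It assumes a hypothetical `RecognitionGate` class and `PromptUi` callback; neither is part of any real or proposed speech API. The prompt callback stands in for whatever UI the UA picks (infobar, dropdown menu, push-to-talk).

```typescript
// Hypothetical sketch: RecognitionGate and PromptUi are illustrative names,
// not part of any real browser API or of the proposals in this thread.

// Stands in for the UA's permission UI (infobar, dropdown menu, ...).
// Returns true if the user explicitly allows recognition for this origin.
type PromptUi = (origin: string) => boolean;

class RecognitionGate {
  private askUser: PromptUi;
  private granted: { [origin: string]: boolean } = {};

  constructor(askUser: PromptUi) {
    this.askUser = askUser;
  }

  // Recognition may start only after an explicit "yes" from the user.
  // A click on page content alone is never treated as consent.
  requestStart(origin: string): boolean {
    if (this.granted[origin] === true) {
      return true;
    }
    const allowed = this.askUser(origin);
    if (allowed) {
      this.granted[origin] = true;
    }
    return allowed;
  }

  // Drop the remembered grant, e.g. when the page is (re-)loaded.
  forget(origin: string): void {
    delete this.granted[origin];
  }
}
```

With this shape, a clickjacked click on a hidden control would at most trigger the prompt; it could not start recognition by itself.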


> Cheers
> Satish
> On Tue, Jul 5, 2011 at 11:31 AM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:
>     On 07/05/2011 01:26 PM, Olli Pettay wrote:
>         On 07/05/2011 01:06 PM, Satish S wrote:
>             Hi Olli,
>             Here are the reasons I feel we should use a markup element for
>             recognition:
>             1. Even though click jacking is a problem, the UAs are in
>             control of
>             the element's presentation and can implement it in a secure
>             fashion. The file input dialog tackles this with an
>             additional popup
>             window
>         The additional popup window has nothing to do with the <input
>         type="file"> presentation on the page. You can easily just
>         add style="opacity: 0" and the presentation is hidden.
>             and for speech input UAs may tackle it in different ways. For
>             example:
>             * instead of a simple button which starts recording it could
>             open
>             a dropdown menu from which the user selects an option (e.g.
>             "start speaking", "select language", "enable hotkey" and so on).
>         This already sounds better, since it requires explicit permission
>         from the user before the recognition is started. But it still
>         doesn't require any explicit element in the DOM tree. The dropdown
>         menu approach would work with or without <reco>.
>             * render as a simple button but on top of everything else, so
>             click jacking is impossible
>         This would be very strange. How would you define such a button, which
>         doesn't follow the CSS rules, when everything else in the page
>         is styled based on CSS? Especially when the button is in an iframe,
>         and the main page paints something over the iframe.
>         Would the iframe have some way to paint over its parent?
>             * a naive implementation could also just bring up an infobar
>             similar to what the JS API would do.
>             But the key thing is that UAs can find what interface works best
>             for them. And for trusted sites (e.g. those which the user or
>             domain administrator has white listed) it could skip all of the
>             above and start reco on click.
>             2. A markup element allows all the JS APIs to hang off it.
>             This is
>             similar to how HTML5 does with the <audio> tag and web sites
>             that
>             want to play audio without a UI just create the <audio> tag in
>             javascript and call methods on it. For speech input if we have a
>             <reco> element then the recognition JS API could all be
>             methods of
>             this element and it presents a consistent picture to developers.
>         I can see some, though quite weak, use cases for <reco>, for example
>         "(un)mute microphone". Quite often such things are done at the OS
>         level.
>         And the microphone level could be shown in the browser chrome.
>         Note, I'm not against <reco>, if we can find a reasonable security
>         model when it is used.
>         Perhaps the dropdown menu could work well enough.
>         On mobile devices the UI could be different -
>         push-to-talk approach might work there.
>         In both cases user would give explicit permission to the
>         web page to start the recognition.
>     Of course, dropdown menu is effectively just a bit different UI for
>     the common infobar.
>         -Olli
>             Cheers
>             Satish
>             On Mon, Jul 4, 2011 at 11:40 AM, Olli Pettay
>             <Olli.Pettay@helsinki.fi> wrote:
>             Hi all,
>             (I started to write this when I thought I could have some
>             reasonable
>             compromise between the privacy issues and the usability that
>             Google
>             wants. But I ended up with just more issues :/ But I'm
>             sending this
>             anyway.)
>             So far it hasn't become clear to me why we need a <reco> element,
>             or special UI in <input> (like in current Chrome).
>             Because of click-jacking problem, the speech UI doesn't give
>             us any
>             better security or privacy handling than using pure scripting.
>             Also, I'm pretty sure web devs want to be able to have their
>             own UI anyway.
>             So, for most cases the Speech.getRequest()/getRequestFor()
>             approach should work just fine.
>             The problematic case is the Google Translate example.
>             (IMHO, it should ask permission from user before enabling
>             speech UI, similar to Google Maps. How is for example gender
>             recognition less privacy related than location?)
>             But perhaps for the default speech service, or other speech
>             services to which the user *has* somehow *granted* permissions,
>             permission management could be more flexible. What if, while
>             handling user interaction - say a trusted click event - the
>             implementation could immediately call the success callback
>             passed to Speech.getRequest()? The implementation should
>             still show the UI that recognition is on, and the UI should
>             have some
>             way to abort the recognition without giving any data to the
>             web page.
>             Also, if the user is concerned about privacy, (s)he would never
>             grant any automatic permissions to speech services, and would
>             have to give permission every time a page first tries to use
>             speech services after (re-)loading.
>             Effectively in Chrome case this might mean that at some
>             point the
>             browser would ask permission to use the default speech
>             service, and
>             after that any click on a web page could start recognition.
>             Hmmm... this is still pretty scary. And even wrong. We're
>             dealing with
>             several different permissions. At least a) is it ok to send
>             user's
>             speech data to service X, b) is it ok that web app Y uses speech
>             services, c) is it ok that web app Y uses service X.
>             a) allows service X to do at least gender recognition, so
>             there is a
>             clear privacy data leak to X.
>             b) is close to the issues related to current implementation
>             in Chrome.
>             Is it ok that whenever user clicks something in a page (any
>             web page!),
>             the page may get some recognition results.
>             c) if I need to give my social security number to web site Y, is
>             it ok to use speech service X to recognize the number.
>             Usually it may be ok to the user to give some data to
>             service X, but
>             perhaps ssn is not such data.
>             ...so, my attempt to come up with a solution for privacy
>             handling that would be OK for Google hasn't yet succeeded.
>             (It is not quite clear to me why the privacy handling of
>             capturing API
>             or Geolocation API is ok to Google, but for speech handling
>             something
>             else is needed.)
>             -Olli
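
The three separate permissions (a), (b) and (c) listed in the original message can be sketched as three independent checks. This is only an illustration under stated assumptions: `SpeechPermissions`, `pairKey` and `mayRecognize` are hypothetical names, and nothing in the thread says a UA would store permissions this way.

```typescript
// Hypothetical sketch of the three permissions from the quoted message.
// All names here are illustrative and not taken from any real API.
interface SpeechPermissions {
  trustedServices: string[];   // (a) OK to send the user's speech data to service X
  speechEnabledApps: string[]; // (b) OK for web app Y to use speech services at all
  appServicePairs: string[];   // (c) OK for web app Y to use service X specifically
}

// Key for the (app, service) pair used by permission (c).
function pairKey(app: string, service: string): string {
  return app + " -> " + service;
}

// Recognition is allowed only when all three permissions hold.
function mayRecognize(p: SpeechPermissions, app: string, service: string): boolean {
  const a = p.trustedServices.indexOf(service) !== -1;
  const b = p.speechEnabledApps.indexOf(app) !== -1;
  const c = p.appServicePairs.indexOf(pairKey(app, service)) !== -1;
  return a && b && c;
}
```

Keeping (c) separate reflects the social security number example: a user may trust service X in general and allow site Y to use speech, yet still not want Y's input sent to X.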
Received on Tuesday, 5 July 2011 11:31:18 UTC
