
Re: About speech request initiation and reco element etc

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Tue, 05 Jul 2011 13:26:52 +0300
Message-ID: <4E12E6EC.1060904@helsinki.fi>
To: Satish S <satish@google.com>
CC: public-xg-htmlspeech@w3.org, Bjorn Bringert <bringert@google.com>
On 07/05/2011 01:06 PM, Satish S wrote:
> Hi Olli,
> Here are the reasons I feel we should use a markup element for recognition:
>  1. Even though click jacking is a problem, the UAs are in control of
>     the element's presentation and can implement it in a secure
>     fashion. The file input dialog tackles this with an additional popup
>     window
The additional popup window has nothing to do with how
<input type="file"> is presented on the page. You can easily just
add style="opacity: 0" and the element is hidden.
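To make the opacity point concrete, here is a minimal sketch. It is a simplified model of the relevant CSS properties, not a real layout engine, and the names BoxStyle, isPainted and isClickable are made up for illustration. It shows that a fully transparent element is still a click target, which is what makes click-jacking possible.

```typescript
// Simplified, hypothetical model of the CSS properties that matter here.
interface BoxStyle {
  opacity: number; // 0 means fully transparent
  display: "block" | "none";
  visibility: "visible" | "hidden";
  pointerEvents: "auto" | "none";
}

// True when the user can actually see the element.
function isPainted(style: BoxStyle): boolean {
  return (
    style.display !== "none" &&
    style.visibility === "visible" &&
    style.opacity > 0
  );
}

// True when a click at the element's position is delivered to it.
// Opacity is deliberately not consulted: transparent elements still
// take part in hit testing.
function isClickable(style: BoxStyle): boolean {
  return (
    style.display !== "none" &&
    style.visibility === "visible" &&
    style.pointerEvents !== "none"
  );
}

// The case from the text: <input type="file" style="opacity: 0">
const hiddenFileInput: BoxStyle = {
  opacity: 0,
  display: "block",
  visibility: "visible",
  pointerEvents: "auto",
};
```

With this model, isPainted(hiddenFileInput) is false while isClickable(hiddenFileInput) is true, so the page can place the invisible control under whatever the user believes they are clicking.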

> and for speech input UAs may tackle it in different ways. For
>     example:
>       * instead of a simple button which starts recording it could open
>         a dropdown menu from which the user selects an option (e.g.
>         "start speaking", "select language", "enable hotkey" and so on).
This already sounds better, since it requires explicit permission
from the user before recognition is started. But it still doesn't require
any explicit element in the DOM tree. The dropdown menu approach
would work with or without <reco>.

>       * render as a simple button but on top of everything else, so
>         click jacking is impossible
This would be very strange. How would you define such a button, which
doesn't follow the CSS rules, when everything else in the page
is styled based on CSS? Especially when the button is in an iframe,
and the main page paints something over the iframe.
Would the iframe have some way to paint over its parent?

>       * a naive implementation could also just bring up an infobar
>         similar to what the JS API would do.
>         But the key thing is that UAs can find what interface works best
>         for them. And for trusted sites (e.g. those which the user or
>         domain administrator has white listed) it could skip all of the
>         above and start reco on click.
>  2. A markup element allows all the JS APIs to hang off it. This is
>     similar to how HTML5 does with the <audio> tag and web sites that
>     want to play audio without a UI just create the <audio> tag in
>     javascript and call methods on it. For speech input if we have a
>     <reco> element then the recognition JS API could all be methods of
>     this element and it presents a consistent picture to developers.
I can see some, though quite weak, use cases for <reco>, for example
"(un)mute microphone". Quite often such things are done at the OS level,
and the microphone level could be shown in the browser chrome.

Note, I'm not against <reco>, if we can find a reasonable security
model for its use.
Perhaps the dropdown menu could work well enough.
On mobile devices the UI could be different -
a push-to-talk approach might work there.
In both cases the user would give explicit permission to the
web page to start the recognition.
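As a rough sketch of what "explicit permission" could mean for either UI (dropdown menu or push-to-talk): recognition starts only after the user makes a choice in browser-controlled UI, and the user can abort before anything reaches the page. All names below (UserChoice, RecognitionSession, startOnExplicitChoice) are hypothetical and do not come from any proposed API.

```typescript
// Hypothetical result of the browser-controlled UI (menu item, push-to-talk).
type UserChoice = "allow" | "deny";

class RecognitionSession {
  private aborted = false;

  // deliverToPage stands in for whatever callback the web page registered.
  constructor(private readonly deliverToPage: (transcript: string) => void) {}

  // Called from browser UI; after this the page receives nothing.
  abort(): void {
    this.aborted = true;
  }

  // Called by the (imaginary) recognizer when a result is ready.
  onRecognizerResult(transcript: string): void {
    if (!this.aborted) {
      this.deliverToPage(transcript);
    }
  }
}

// No session exists unless the user explicitly allowed it.
function startOnExplicitChoice(
  choice: UserChoice,
  deliverToPage: (transcript: string) => void
): RecognitionSession | null {
  return choice === "allow" ? new RecognitionSession(deliverToPage) : null;
}
```

The point of the sketch is only that the decision and the abort both live outside the page's control; whether the trigger is a <reco> element or a script call does not change that.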


> Cheers
> Satish
> On Mon, Jul 4, 2011 at 11:40 AM, Olli Pettay <Olli.Pettay@helsinki.fi
> <mailto:Olli.Pettay@helsinki.fi>> wrote:
>     Hi all,
>     (I started to write this when I thought I could have some reasonable
>     compromise between the privacy issues and the usability that Google
>     wants. But I ended up with just more issues :/ But I'm sending this
>     anyway.)
>     so far it hasn't become clear to me why we need a <reco> element,
>     or special UI in <input> (like in current Chrome).
>     Because of the click-jacking problem, the speech UI doesn't give us any
>     better security or privacy handling than using pure scripting.
>     Also, I'm pretty sure web devs want to be able to have their own UI anyway.
>     So, for most cases Speech.getRequest()/__getRequestFor() approach
>     should work just fine.
>     The problematic case is the Google Translate example.
>     (IMHO, it should ask permission from user before enabling
>     speech UI, similar to Google Maps. How is for example gender
>     recognition less privacy related than location?)
>     But perhaps for the default speech service, or other speech services
>     to which the user *has* somehow *granted* permissions, permission management
>     could be more flexible. What if, while handling user interaction - say
>     a trusted click event - the implementation could immediately call the
>     success callback passed to Speech.getRequest()? The implementation should
>     still show UI indicating that recognition is on, and that UI should have some
>     way to abort the recognition without giving any data to the web page.
>     Also, if the user is concerned about privacy, (s)he would never
>     grant any automatic permissions to speech services, and would
>     always have to give permission the first time a page tries to use
>     speech services after (re-)loading.
>     Effectively in Chrome case this might mean that at some point the
>     browser would ask permission to use the default speech service, and
>     after that any click on a web page could start recognition.
>     Hmmm... this is still pretty scary. And even wrong. We're dealing with
>     several different permissions. At least a) is it ok to send user's
>     speech data to service X, b) is it ok that web app Y uses speech
>     services, c) is it ok that web app Y uses service X.
>     a) allows service X to do at least gender recognition, so there is a
>     clear leak of private data to X.
>     b) is close to the issues with the current implementation in Chrome.
>     Is it ok that whenever the user clicks something in a page (any web page!),
>     the page may get some recognition results?
>     c) if I need to give my social security number to web site Y, is
>     it ok to use speech service X to recognize the number?
>     Usually it may be ok for the user to give some data to service X, but
>     perhaps an SSN is not such data.
>     ...so, my attempt to come up with a solution for privacy handling which
>     would be ok to Google hasn't yet succeeded.
>     (It is not quite clear to me why the privacy handling of capturing API
>     or Geolocation API is ok to Google, but for speech handling something
>     else is needed.)
>     -Olli
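The three permissions a), b) and c) distinguished in the quoted message can be sketched as three independent grants that must all hold before recognition runs. This is only an illustration of that decomposition; the SpeechGrants shape and the mayRecognize function are assumptions, not part of any proposal on the list.

```typescript
// Hypothetical record of what the user has granted so far.
interface SpeechGrants {
  servicesAllowedAudio: string[]; // (a) services that may receive the user's speech
  appsAllowedSpeech: string[];    // (b) web apps that may use speech at all
  appServicePairs: string[];      // (c) "app|service" combinations the user accepted
}

// Recognition is allowed only when all three permissions are granted.
function mayRecognize(grants: SpeechGrants, app: string, service: string): boolean {
  const a = grants.servicesAllowedAudio.indexOf(service) !== -1;
  const b = grants.appsAllowedSpeech.indexOf(app) !== -1;
  const c = grants.appServicePairs.indexOf(app + "|" + service) !== -1;
  return a && b && c;
}

// Example: the user trusts service X in general and lets app Y use speech,
// but never agreed to Y sending data (say, an SSN) to X.
const exampleGrants: SpeechGrants = {
  servicesAllowedAudio: ["X"],
  appsAllowedSpeech: ["Y"],
  appServicePairs: [],
};
```

Here mayRecognize(exampleGrants, "Y", "X") is false, which is the situation described under c): a blanket grant for a) and b) does not by itself cover a specific app and service pair.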
Received on Tuesday, 5 July 2011 10:27:19 UTC
