
Re: About speech request initiation and reco element etc

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Tue, 05 Jul 2011 13:31:47 +0300
Message-ID: <4E12E813.4030902@helsinki.fi>
To: Satish S <satish@google.com>
CC: public-xg-htmlspeech@w3.org, Bjorn Bringert <bringert@google.com>
On 07/05/2011 01:26 PM, Olli Pettay wrote:
> On 07/05/2011 01:06 PM, Satish S wrote:
>> Hi Olli,
>> Here are the reasons I feel we should use a markup element for
>> recognition:
>> 1. Even though click jacking is a problem, the UAs are in control of
>> the element's presentation and can implement it in a secure
>> fashion. The file input dialog tackles this with an additional popup
>> window
> The additional popup window has nothing to do with the <input
> type="file"> presentation on the page. You can easily just
> add style="opacity: 0" and the presentation is hidden.
>> and for speech input UAs may tackle it in different ways. For
>> example:
>> * instead of a simple button which starts recording it could open
>> a dropdown menu from which the user selects an option (e.g.
>> "start speaking", "select language", "enable hotkey" and so on).
> This already sounds better, since it requires explicit permission
> from the user before the recognition is started. But it still doesn't
> require any explicit element in the DOM tree. The dropdown menu approach
> would work with or without <reco>.
>> * render as a simple button but on top of everything else, so
>> click jacking is impossible
> This would be very strange. How would you define such a button, which
> doesn't follow the CSS rules, when everything else in the page
> is styled based on CSS? Especially when the button is in an iframe,
> and the main page paints something over the iframe.
> Would the iframe have some way to paint over its parent?
>> * a naive implementation could also just bring up an infobar
>> similar to what the JS API would do.
>> But the key thing is that UAs can find what interface works best
>> for them. And for trusted sites (e.g. those which the user or
>> domain administrator has white listed) it could skip all of the
>> above and start reco on click.
>> 2. A markup element allows all the JS APIs to hang off it. This is
>> similar to how HTML5 does with the <audio> tag and web sites that
>> want to play audio without a UI just create the <audio> tag in
>> javascript and call methods on it. For speech input if we have a
>> <reco> element then the recognition JS API could all be methods of
>> this element and it presents a consistent picture to developers.
> I can see some, though quite weak, use cases for <reco>, for example
> "(un)mute microphone". Quite often such things are done on OS level.
> And the microphone level could be shown in the browser chrome.
> Note, I'm not against <reco>, if we can find a reasonable security
> model when it is used.
> Perhaps the dropdown menu could work well enough.
> On mobile devices the UI could be different -
> push-to-talk approach might work there.
> In both cases user would give explicit permission to the
> web page to start the recognition.

Of course, a dropdown menu is effectively just a slightly different UI
for the common infobar.

> -Olli
>> Cheers
>> Satish
>> On Mon, Jul 4, 2011 at 11:40 AM, Olli Pettay <Olli.Pettay@helsinki.fi
>> <mailto:Olli.Pettay@helsinki.fi>> wrote:
>> Hi all,
>> (I started to write this when I thought I could have some reasonable
>> compromise between the privacy issues and the usability that Google
>> wants. But I ended up into just more issues :/ But I'm sending this
>> anyway.)
>> So far it hasn't become clear to me why we need a <reco> element,
>> or special UI in <input> (like in current Chrome).
>> Because of the click-jacking problem, the speech UI doesn't give us any
>> better security or privacy handling than using pure scripting.
>> Also, I'm pretty sure web devs want to be able to have their own UI anyway.
>> So, for most cases the Speech.getRequest()/getRequestFor() approach
>> should work just fine.
>> The problematic case is the Google Translate example.
>> (IMHO, it should ask permission from user before enabling
>> speech UI, similar to Google Maps. How is for example gender
>> recognition less privacy related than location?)
>> But, perhaps for the default speech service, or other speech services
>> for which the user *has* somehow *granted* permissions, permission management
>> could be more flexible. What if, while handling user interaction - say
>> trusted click event - implementation could immediately call the
>> successcallback passed to Speech.getRequest(). Implementation should
>> still show the UI that recognition is on, and the UI should have some
>> way to abort the recognition without giving any data to the web page.
>> Also, if the user is concerned about privacy, (s)he would never
>> grant any automatic permissions to speech services, and would
>> always have to give the permission the first time a page tries to use
>> speech services after (re-)loading.
>> Effectively in Chrome case this might mean that at some point the
>> browser would ask permission to use the default speech service, and
>> after that any click on a web page could start recognition.
>> Hmmm... this is still pretty scary. And even wrong. We're dealing with
>> several different permissions. At least a) is it ok to send user's
>> speech data to service X, b) is it ok that web app Y uses speech
>> services, c) is it ok that web app Y uses service X.
>> a) allows service X to do at least gender recognition, so there is a
>> clear privacy data leak to X.
>> b) is close to the issues related to the current implementation in Chrome:
>> is it ok that whenever the user clicks something in a page (any web page!),
>> the page may get some recognition results?
>> c) if I need to give my social security number to web site Y, is
>> it ok to use speech service X to recognize the number?
>> Usually it may be ok for the user to give some data to service X, but
>> perhaps an SSN is not such data.
>> ...so, my attempt to come up with a solution for privacy handling which
>> would be ok to Google hasn't yet succeeded.
>> (It is not quite clear to me why the privacy handling of capturing API
>> or Geolocation API is ok to Google, but for speech handling something
>> else is needed.)
>> -Olli
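
The three permissions listed in the original message, (a) speech data may go to service X, (b) web app Y may use speech services, (c) web app Y may use service X, can be read as three independent checks, combined with the "trusted click" idea. The following is only a rough sketch of that reading. Every type and function name in it is hypothetical and is not part of Speech.getRequest() or any other proposed API, and requiring all three grants at once is an assumption made for illustration.

```typescript
// Illustrative only: all names here are hypothetical, not a proposed API.
type ServiceId = string; // speech service "X"
type AppOrigin = string; // web app "Y"

interface GrantedPermissions {
  services: Set<ServiceId>; // (a) ok to send the user's speech data to X
  apps: Set<AppOrigin>;     // (b) ok for Y to use speech services at all
  pairs: Set<string>;       // (c) ok for Y to use X specifically
}

type Decision = "start-immediately" | "ask-user";

// Builds the lookup key used for permission (c).
function pairKey(app: AppOrigin, service: ServiceId): string {
  return JSON.stringify([app, service]);
}

// Assumption: recognition may start without a prompt only if (a), (b) and
// (c) are all granted AND the call happens while handling a trusted user
// event (e.g. a real click). In every other case the user is asked.
function decide(
  granted: GrantedPermissions,
  app: AppOrigin,
  service: ServiceId,
  inTrustedUserEvent: boolean
): Decision {
  const allGranted =
    granted.services.has(service) &&
    granted.apps.has(app) &&
    granted.pairs.has(pairKey(app, service));
  return allGranted && inTrustedUserEvent ? "start-immediately" : "ask-user";
}
```

Even in the "start-immediately" case, the message above still expects the implementation to show that recognition is on and to offer a way to abort without giving any data to the web page. A user who never grants automatic permissions always ends up in the "ask-user" branch.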
Received on Tuesday, 5 July 2011 10:32:13 UTC
