- From: Olli Pettay <Olli.Pettay@helsinki.fi>
- Date: Tue, 05 Jul 2011 13:31:47 +0300
- To: Satish S <satish@google.com>
- CC: public-xg-htmlspeech@w3.org, Bjorn Bringert <bringert@google.com>
On 07/05/2011 01:26 PM, Olli Pettay wrote:
> On 07/05/2011 01:06 PM, Satish S wrote:
>> Hi Olli,
>>
>> Here are the reasons I feel we should use a markup element for
>> recognition:
>>
>> 1. Even though click jacking is a problem, the UAs are in control of
>>    the element's presentation and can implement it in a secure
>>    fashion. The file input dialog tackles this with an additional popup
>>    window
> The additional popup window has nothing to do with the <input
> type="file"> presentation on the page. You can easily just
> add style="opacity: 0" and the presentation is hidden.
>
>>    and for speech input UAs may tackle it in different ways. For
>>    example:
>>    * instead of a simple button which starts recording it could open
>>      a dropdown menu from which the user selects an option (e.g.
>>      "start speaking", "select language", "enable hotkey" and so on).
> This already sounds better, since this requires explicit permission
> from the user before the recognition is started. But it still doesn't
> require any explicit element in the DOM tree. The dropdown menu approach
> would work with or without <reco>.
>
>>    * render as a simple button but on top of everything else, so
>>      click jacking is impossible
> This would be very strange. How would you define such a button, which
> doesn't follow the CSS rules, when everything else in the page
> is styled based on CSS? Especially when the button is in an iframe
> and the main page paints something over the iframe.
> Would the iframe have some way to paint over its parent?
>
>>    * a naive implementation could also just bring up an infobar
>>      similar to what the JS API would do.
>>    But the key thing is that UAs can find what interface works best
>>    for them. And for trusted sites (e.g. those which the user or
>>    domain administrator has whitelisted) it could skip all of the
>>    above and start reco on click.
>> 2. A markup element allows all the JS APIs to hang off it.
>>    This is similar to what HTML5 does with the <audio> tag: web sites
>>    that want to play audio without a UI just create the <audio> tag in
>>    JavaScript and call methods on it. For speech input, if we have a
>>    <reco> element then the recognition JS API could all be methods of
>>    this element, and it presents a consistent picture to developers.
> I can see some, though quite weak, use cases, for example "(un)mute
> microphone" for <reco>. Quite often such things are done at the OS level,
> and the microphone level could be shown in the browser chrome (UI).
>
> Note, I'm not against <reco>, if we can find a reasonable security
> model when it is used.
> Perhaps the dropdown menu could work well enough.
> On mobile devices the UI could be different;
> a push-to-talk approach might work there.
> In both cases the user would give explicit permission to the
> web page to start the recognition.

Of course, a dropdown menu is effectively just a slightly different UI
for the common infobar.

>
> -Olli
>
>>
>> Cheers
>> Satish
>>
>> On Mon, Jul 4, 2011 at 11:40 AM, Olli Pettay <Olli.Pettay@helsinki.fi>
>> wrote:
>>
>>     Hi all,
>>
>>     (I started to write this when I thought I could have some reasonable
>>     compromise between the privacy issues and the usability that Google
>>     wants. But I ended up with just more issues :/ But I'm sending this
>>     anyway.)
>>
>>     So far it hasn't become clear to me why we need a <reco> element,
>>     or special UI in <input> (like in current Chrome).
>>     Because of the click-jacking problem, the speech UI doesn't give us
>>     any better security or privacy handling than using pure scripting.
>>     Also, I'm pretty sure web devs want to be able to have their own UI
>>     anyway.
>>
>>     So, for most cases the Speech.getRequest()/getRequestFor() approach
>>     should work just fine.
>>     The problematic case is the Google Translate example.
>>     (IMHO, it should ask permission from the user before enabling the
>>     speech UI, similar to Google Maps. How is, for example, gender
>>     recognition less privacy-related than location?)
>>
>>     But perhaps for the default speech service, or other speech services
>>     to which the user *has* somehow *granted* permissions, permission
>>     management could be more flexible. What if, while handling user
>>     interaction (say, a trusted click event), the implementation could
>>     immediately call the success callback passed to Speech.getRequest()?
>>     The implementation should still show UI indicating that recognition
>>     is on, and that UI should have some way to abort the recognition
>>     without giving any data to the web page.
>>     Also, if the user is concerned about privacy, (s)he would never
>>     grant any automatic permissions to speech services, and would have
>>     to give permission every time a page first tries to use speech
>>     services after (re-)loading.
>>     Effectively, in Chrome's case this might mean that at some point the
>>     browser would ask permission to use the default speech service, and
>>     after that any click on a web page could start recognition.
>>
>>     Hmm... this is still pretty scary. And even wrong. We're dealing with
>>     several different permissions. At least: a) is it ok to send the
>>     user's speech data to service X, b) is it ok that web app Y uses
>>     speech services, c) is it ok that web app Y uses service X.
>>
>>     a) allows service X to do at least gender recognition, so there is a
>>     clear privacy data leak to X.
>>
>>     b) is close to the issues related to the current implementation in
>>     Chrome. Is it ok that whenever the user clicks something in a page
>>     (any web page!), the page may get some recognition results?
>>
>>     c) If I need to give my social security number to web site Y, is
>>     it ok to use speech service X to recognize the number?
>>     Usually it may be ok for the user to give some data to service X,
>>     but perhaps an SSN is not such data.
>>
>>     ...so, my attempt to come up with a solution for privacy handling
>>     that would be ok for Google hasn't succeeded yet.
>>
>>     (It is not quite clear to me why the privacy handling of capturing API
>>     or Geolocation API is ok to Google, but for speech handling something
>>     else is needed.)
>>
>>     -Olli
Received on Tuesday, 5 July 2011 10:32:13 UTC