Re: About speech request initiation and reco element etc from Satish S on 2011-07-05 (public-xg-htmlspeech@w3.org from July 2011)

From: Satish S <satish@google.com>
Date: Tue, 5 Jul 2011 11:06:19 +0100
To: Olli@pettay.fi
Cc: public-xg-htmlspeech@w3.org, Bjorn Bringert <bringert@google.com>
Message-ID: <CAHZf7RmxYK3970FEnsK56d3s0++CnWZMT=3G8+_Ap4=a7pSRww@mail.gmail.com>

Hi Olli,

Here are the reasons I feel we should use a markup element for recognition:

   1. Even though clickjacking is a problem, the UAs are in control of the
   element's presentation and can implement it in a secure fashion. The file
   input dialog tackles this with an additional popup window, and for speech
   input UAs may tackle it in different ways. For example:
      - instead of a simple button which starts recording, it could open a
      dropdown menu from which the user selects an option (e.g. "start
      speaking", "select language", "enable hotkey" and so on).
      - render as a simple button but on top of everything else, so
      clickjacking is impossible
      - a naive implementation could also just bring up an infobar similar
      to what the JS API would do.
      But the key thing is that UAs can find what interface works best for
      them. And for trusted sites (e.g. those which the user or domain
      administrator has whitelisted) it could skip all of the above and start
      reco on click.
   2. A markup element allows all the JS APIs to hang off it. This is similar
   to what HTML5 does with the <audio> tag: web sites that want to play audio
   without a UI just create the <audio> element in JavaScript and call methods
   on it. For speech input, if we have a <reco> element then the recognition JS
   API could all be methods of this element, which presents a consistent
   picture to developers.
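
To make points 1 and 2 concrete, here is a minimal sketch in plain JavaScript of how an element-hosted recognition API might behave. Everything in it is an assumption made for illustration: the class name RecoElement, its methods (click, confirm, deliver, abort), the state names and the trusted-origin set are hypothetical and not taken from any proposal or spec. It is a plain class rather than a real DOM element so that it runs outside a browser.

```javascript
// Hypothetical model of a <reco> element. None of these names come from a
// spec; they only illustrate the idea of hanging the JS API off the element.
class RecoElement {
  constructor(origin, trustedOrigins = new Set()) {
    this.origin = origin;                 // site hosting the element
    this.trustedOrigins = trustedOrigins; // origins whitelisted by user/admin
    this.state = "idle";                  // "idle" | "confirming" | "listening"
    this.onresult = null;                 // page-supplied result handler
  }

  // Invoked by the UA when the user clicks the UA-rendered control.
  click() {
    if (this.state !== "idle") return;
    // Trusted sites start reco on click; others get an extra UA-owned step
    // (a dropdown, an infobar, ...) that the page cannot script.
    this.state = this.trustedOrigins.has(this.origin)
      ? "listening"
      : "confirming";
  }

  // Invoked by the UA when the user picks e.g. "start speaking".
  confirm() {
    if (this.state === "confirming") this.state = "listening";
  }

  // Invoked by the UA when recognition finishes.
  deliver(transcript) {
    if (this.state !== "listening") return; // nothing leaks unless listening
    this.state = "idle";
    if (this.onresult) this.onresult(transcript);
  }

  // Stops recognition without handing any data to the page.
  abort() {
    this.state = "idle";
  }
}

// Usage on an origin that has not been whitelisted.
const reco = new RecoElement("https://example.com");
const heard = [];
reco.onresult = (text) => heard.push(text);
reco.click();            // UA shows its own confirmation UI
reco.deliver("too soon"); // ignored: the user has not confirmed yet
reco.confirm();
reco.deliver("hello");   // now reaches the page's handler
```

The page only ever sees results through the element's handler, while the decision about when recognition may start stays with the UA-side calls.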

Cheers
Satish

On Mon, Jul 4, 2011 at 11:40 AM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:

> Hi all,
>
> (I started to write this when I thought I could have some reasonable
> compromise between the privacy issues and the usability that Google
> wants. But I ended up into just more issues :/ But I'm sending this
> anyway.)
>
> so far it hasn't become clear to me why we need <reco> element,
> or special UI in <input> (like in current Chrome).
> Because of click-jacking problem, the speech UI doesn't give us any
> better security or privacy handling than using pure scripting.
> Also, I'm pretty sure web devs want to be able to have their own UI anyway.
>
> So, for most cases the Speech.getRequest()/getRequestFor() approach should
> work just fine.
> The problematic case is the Google Translate example.
> (IMHO, it should ask permission from user before enabling
> speech UI, similar to Google Maps. How is for example gender
> recognition less privacy related than location?)
>
> But, perhaps for the default speech service, or other speech services to
> which the user *has* somehow *granted* permissions, permission management
> could be more flexible. What if, while handling user interaction - say
> trusted click event - implementation could immediately call the
> successcallback passed to Speech.getRequest(). Implementation should
> still show the UI that recognition is on, and the UI should have some
> way to abort the recognition without giving any data to the web page.
> Also, if the user is concerned about privacy, (s)he would never
> grant any automatic permissions to speech services, and would always
> have to give permission the first time a page tries to use speech
> services after (re-)loading.
> Effectively in Chrome case this might mean that at some point the
> browser would ask permission to use the default speech service, and
> after that any click on a web page could start recognition.
>
> Hmmm... this is still pretty scary. And even wrong. We're dealing with
> several different permissions. At least a) is it ok to send user's
> speech data to service X, b) is it ok that web app Y uses speech
> services, c) is it ok that web app Y uses service X.
>
>
> a) allows service X to do at least gender recognition, so there is a
> clear privacy data leak to X.
>
> b) is close to the issues related to current implementation in Chrome.
> Is it ok that whenever the user clicks something in a page (any web page!),
> the page may get some recognition results?
>
> c) if I need to give my social security number to web site Y, is
> it ok to use speech service X to recognize the number?
> Usually it may be ok for the user to give some data to service X, but
> perhaps the SSN is not such data.
>
>
> ...so, my attempt to come up with a solution for privacy handling which
> would be ok to Google hasn't yet succeeded.
>
>
> (It is not quite clear to me why the privacy handling of capturing API
> or Geolocation API is ok to Google, but for speech handling something
> else is needed.)
>
>
> -Olli
>
>
Received on Tuesday, 5 July 2011 10:06:45 UTC