Re: About speech request initiation and reco element etc from Satish S on 2011-07-05 (public-xg-htmlspeech@w3.org from July 2011)

From: Satish S <satish@google.com>
Date: Tue, 5 Jul 2011 12:16:43 +0100
To: Olli@pettay.fi
Cc: public-xg-htmlspeech@w3.org, Bjorn Bringert <bringert@google.com>
Message-ID: <CAHZf7Rn29tquwWBFV89w1WMiaU4DnyYMy4szcGkwKGa6c8g6RQ@mail.gmail.com>
Agreed on the "opacity: 0" and dropdown menu points. In both cases it is
the secondary UI (dropdown or popup) that acts as the defense against
clickjacking. But it is clear to the user that these actions happen when
they interact with the page, as opposed to a set of infobars popping up as
soon as the page loads.

(Another wild idea: the UA could even have the <reco> element appear in
the window chrome and not within the page. In this case the <reco> markup
just acts as an indicator to the UA that the page allows user-initiated
speech input. Anyway, that's just one example of the things a UA could
experiment with.)

I see a strong and coherent story with a markup element: it provides both
user-initiated and web-page-initiated models, allows all the JS APIs to
hang off it cleanly (with HTML5 <audio> and other elements as precedent),
and lets UAs come up with more robust and secure models for such sensitive
user information. It does not take anything away from the JS API but
merely adds to it.

Cheers
Satish


On Tue, Jul 5, 2011 at 11:31 AM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:

> On 07/05/2011 01:26 PM, Olli Pettay wrote:
>
>> On 07/05/2011 01:06 PM, Satish S wrote:
>>
>>> Hi Olli,
>>>
>>> Here are the reasons I feel we should use a markup element for
>>> recognition:
>>>
>>> 1. Even though click jacking is a problem, the UAs are in control of
>>> the element's presentation and can implement it in a secure
>>> fashion. The file input dialog tackles this with an additional popup
>>> window
>>>
>> The additional popup window has nothing to do with the <input
>> type="file"> presentation on the page. You can easily just
>> add style="opacity: 0" and the presentation is hidden.
>>
>>
>>> and for speech input, UAs may tackle it in different ways. For
>>> example:
>>> * instead of a simple button which starts recording it could open
>>> a dropdown menu from which the user selects an option (e.g.
>>> "start speaking", "select language", "enable hotkey" and so on).
>>>
>> This already sounds better, since it requires explicit permission
>> from the user before recognition is started. But it still doesn't
>> require any explicit element in the DOM tree. The dropdown menu
>> approach would work with or without <reco>.
>>
>>
>>> * render as a simple button but on top of everything else, so
>>> clickjacking is impossible
>>>
>> This would be very strange. How would you define such a button, one
>> that doesn't follow the CSS rules, when everything else in the page
>> is styled based on CSS? Especially when the button is in an iframe
>> and the main page paints something over the iframe: would the
>> iframe have some way to paint over its parent?
>>
>>> * a naive implementation could also just bring up an infobar
>>> similar to what the JS API would do.
>>> But the key thing is that UAs can find what interface works best
>>> for them. And for trusted sites (e.g. those which the user or
>>> domain administrator has whitelisted) it could skip all of the
>>> above and start reco on click.
>>> 2. A markup element allows all the JS APIs to hang off it. This is
>>> similar to what HTML5 does with the <audio> element: web sites that
>>> want to play audio without a UI just create an <audio> element in
>>> JavaScript and call methods on it. For speech input, if we have a
>>> <reco> element then the recognition JS API could all be methods of
>>> this element, which presents a consistent picture to developers.
>>>
>> I can see some, though quite weak, use cases, for example "(un)mute
>> microphone" for <reco>. Quite often such things are done at the OS
>> level, and the microphone level could be shown in the browser chrome.
>>
>> Note, I'm not against <reco>, if we can find a reasonable security
>> model when it is used.
>> Perhaps the dropdown menu could work well enough.
>> On mobile devices the UI could be different; a
>> push-to-talk approach might work there.
>> In both cases the user would give explicit permission to the
>> web page to start the recognition.
>>
>
> Of course, dropdown menu is effectively just a bit different UI for
> the common infobar.
>
>>
>> -Olli
>>
>>
>>> Cheers
>>> Satish
>>>
>>> On Mon, Jul 4, 2011 at 11:40 AM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:
>>>
>>> Hi all,
>>>
>>> (I started to write this when I thought I could find a reasonable
>>> compromise between the privacy issues and the usability that Google
>>> wants. But I just ended up with more issues :/ I'm sending this
>>> anyway.)
>>>
>>> So far it hasn't become clear to me why we need a <reco> element,
>>> or special UI in <input> (like in current Chrome).
>>> Because of the clickjacking problem, the speech UI doesn't give us any
>>> better security or privacy handling than pure scripting does.
>>> Also, I'm pretty sure web devs want to be able to have their own UI anyway.
>>>
>>> So, for most cases the Speech.getRequest()/getRequestFor() approach
>>> should work just fine.
>>> The problematic case is the Google Translate example.
>>> (IMHO, it should ask permission from the user before enabling the
>>> speech UI, similar to Google Maps. How is, for example, gender
>>> recognition less privacy-related than location?)
>>>
>>> But perhaps for the default speech service, or other speech services
>>> to which the user *has* somehow *granted* permissions, permission
>>> management could be more flexible. What if, while handling a user
>>> interaction (say, a trusted click event), the implementation could
>>> immediately call the success callback passed to Speech.getRequest()?
>>> The implementation should still show UI indicating that recognition is
>>> on, and that UI should have some way to abort the recognition without
>>> giving any data to the web page.
>>> Also, if the user is concerned about privacy, (s)he would never
>>> grant any automatic permissions to speech services, and would always
>>> have to give permission the first time a page tries to use speech
>>> services after (re)loading.
>>> Effectively, in Chrome's case this might mean that at some point the
>>> browser would ask permission to use the default speech service, and
>>> after that any click on a web page could start recognition.
>>>
>>> Hmmm... this is still pretty scary. And even wrong. We're dealing with
>>> several different permissions. At least a) is it ok to send user's
>>> speech data to service X, b) is it ok that web app Y uses speech
>>> services, c) is it ok that web app Y uses service X.
>>>
>>>
>>> a) allows service X to do at least gender recognition, so there is a
>>> clear privacy data leak to X.
>>>
>>> b) is close to the issues related to the current implementation in Chrome.
>>> Is it OK that whenever the user clicks something in a page (any web page!),
>>> the page may get some recognition results?
>>>
>>> c) if I need to give my social security number to web site Y, is
>>> it OK to use speech service X to recognize the number?
>>> Usually it may be OK for the user to give some data to service X, but
>>> perhaps an SSN is not such data.
>>>
>>>
>>> ...so, my attempt to come up with a solution for privacy handling that
>>> would be OK for Google hasn't yet succeeded.
>>>
>>>
>>> (It is not quite clear to me why the privacy handling of the capturing API
>>> or the Geolocation API is OK for Google, but for speech handling something
>>> else is needed.)
>>>
>>>
>>> -Olli
>
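The three permissions distinguished in the quoted message can be made
concrete with a small model. The TypeScript below is a minimal,
hypothetical sketch, not anything proposed in this thread: the names
`SpeechGrants`, `decide` and `pairKey`, and the example origins and
service IDs, are illustrative assumptions. It assumes recognition may
start without a prompt only inside a trusted click and only when all
three grants (a), (b) and (c) were given in advance; in every other case
the UA asks the user (infobar, dropdown, push-to-talk, etc.).

```typescript
// HYPOTHETICAL model of permissions (a), (b), (c); not a real browser API.

type Origin = string;    // web app Y
type ServiceId = string; // speech service X

interface SpeechGrants {
  servicesTrusted: Set<ServiceId>; // (a) OK to send speech data to X
  appsAllowed: Set<Origin>;        // (b) OK for Y to use speech at all
  appServicePairs: Set<string>;    // (c) OK for Y to use X specifically
}

type Decision = "start" | "prompt";

// Build the key used for (c) grants.
function pairKey(app: Origin, service: ServiceId): string {
  return `${app}|${service}`;
}

// Start immediately only inside a trusted click AND with all three
// grants already given; otherwise the UA must ask the user first.
function decide(
  grants: SpeechGrants,
  app: Origin,
  service: ServiceId,
  inTrustedClick: boolean,
): Decision {
  const allGranted =
    grants.servicesTrusted.has(service) &&
    grants.appsAllowed.has(app) &&
    grants.appServicePairs.has(pairKey(app, service));
  return inTrustedClick && allGranted ? "start" : "prompt";
}

// Example: the user trusted the default service and granted (b) and (c)
// to one hypothetical origin only.
const grants: SpeechGrants = {
  servicesTrusted: new Set(["default"]),
  appsAllowed: new Set(["https://a.example"]),
  appServicePairs: new Set([pairKey("https://a.example", "default")]),
};

console.log(decide(grants, "https://a.example", "default", true));  // start
console.log(decide(grants, "https://a.example", "default", false)); // prompt
console.log(decide(grants, "https://b.example", "default", true));  // prompt
```

Keying (c) on the app/service pair alone cannot express the SSN point,
that whether (c) is acceptable can depend on what is being recognized;
and even on the "start" path the UI would still need the user-visible
abort control described above.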
Received on Tuesday, 5 July 2011 11:17:10 UTC