RE: UI for enabling webcam use from untrusted content from Ian Hickson on 2009-12-11 (public-device-apis@w3.org from December 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 11 Dec 2009 16:40:11 +0000 (UTC)
To: "Tran, Dzung D" <dzung.d.tran@intel.com>, Nick Lothian <nlothian@educationau.edu.au>, Kenton Varda <kenton@google.com>, Anssi Kostiainen <anssi.kostiainen@nokia.com>
Cc: "public-device-apis@w3.org" <public-device-apis@w3.org>
Message-ID: <Pine.LNX.4.62.0912111623090.31755@hixie.dreamhostps.com>

On Wed, 9 Dec 2009, Tran, Dzung D wrote:
> 
> I think this is an orthogonal issue to the capture API. The same problem 
> applies to any type of data such as credit card info, login password, 
> etc.

Indeed, but we've solved it for those cases -- users have to actively go 
out of their way to type in their credit card number. How do we get users 
to actively go out of their way to enable their camera?


> If the user got hi-jacked by some evil corp, then how would the user 
> know to turn off his camera or provide his credit card info?

The problem isn't when the user is hijacked; the problem is when the user 
is actively visiting a site he doesn't trust. For example, a user visits a 
porn site while naked. How do we prevent the site from turning on the 
camera and making the user the subject of the next video on the site?


> UI could be as simple as the <video> tag: 
> 
>              +---------------------------+
>              |                           |
>              |                           | 
>              |     ( ) Video Chat        |
>              |          with Mom         |
>              |                           |
>              |                           |
>              +---------------------------+
>              | ( ) stop     * recording..|
>              +---------------------------+
> 
> Another issue is what happens when it is covered by another application 
> or hidden by a TAB page.

Indeed -- this wouldn't work because of click-jacking.


On Thu, 10 Dec 2009, Nick Lothian wrote:
>
> Is there a difference between "displaying the video viewfinder to the 
> user" and "recording and uploading video to a remote website"?

Strictly speaking, yes... but I presume you mean in terms of the UI?


> Should there be? It may be that I just don't like the "recording" 
> terminology.
> 
> With local HTML5 applications there are a number of scenarios where local 
> use of the camera and/or microphone is useful, but they aren't strictly 
> "recording" - ie, the video is manipulated locally and then thrown away.
> 
> I think asking for device permission is still a requirement in these 
> cases, but there is a difference between this and when the video is sent 
> to a remote site.

I think once we've given a site access to the bits coming from the camera, 
we've got no way of knowing what the site is doing with the data, so we 
have to treat them as equivalent.


On Wed, 9 Dec 2009, Kenton Varda wrote:
> 
> Whatever UI we end up with, I'd like to strongly suggest that it have 
> these important properties:
> 
> A) It should be extensible to any device, including things we haven't 
> thought of.  This will allow the market to innovate and come up with new 
> and interesting devices without having to get support from us.  (Take 
> "us" to mean either w3c or browser vendors.)

To some extent, yes. To some extent, this is impossible, though. For 
example, for some devices (video in particular) it makes sense to expose 
the device to the script as a stream of data, and in many cases it might 
make sense for the API to expose nothing in terms of control over the 
device (e.g. the UA or hardware can handle focus control). For other 
devices (e.g. a USB fishtank that can just be turned on or off) there's no 
return data; it's not a stream, and the only thing that the API would 
expose is a single boolean setter.

I think it might make sense to use the same kind of UI for both, but I 
don't think it'd make sense to use the same API, and so the market can't 
automatically innovate without discussion.
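
To make the contrast concrete, here is a minimal TypeScript sketch of the 
two device shapes described above. Every name in it (StreamDevice, 
SwitchDevice, describe, tankState) is hypothetical and invented for 
illustration; none of it is a proposed or existing API. The point is only 
that one picker UI could list both devices while the script still has to 
treat them differently.

```typescript
// Hypothetical names throughout; nothing here is a proposed or existing API.

// A stream-like device (e.g. a camera): the script only receives data.
interface StreamDevice {
  kind: "stream";
  label: string;
  // Registers a callback that is handed each chunk of data.
  onData(callback: (chunk: Uint8Array) => void): void;
}

// A switch-like device (e.g. the USB fishtank): no return data, one boolean.
interface SwitchDevice {
  kind: "switch";
  label: string;
  setOn(on: boolean): void;
}

type Device = StreamDevice | SwitchDevice;

// The same picker UI could list both kinds, but a script has to branch on
// the kind before it can do anything with the selected device.
function describe(device: Device): string {
  switch (device.kind) {
    case "stream":
      return `${device.label}: read-only stream`;
    case "switch":
      return `${device.label}: boolean setter only`;
  }
}

// Toy in-memory devices standing in for real hardware.
const tankState = { on: false };
const fishtank: SwitchDevice = {
  kind: "switch",
  label: "Fishtank",
  setOn(on) {
    tankState.on = on;
  },
};

const received: number[] = [];
const camera: StreamDevice = {
  kind: "stream",
  label: "Webcam",
  // Delivers a single fake three-byte chunk immediately.
  onData(callback) {
    callback(new Uint8Array([1, 2, 3]));
  },
};

camera.onData((chunk) => received.push(chunk.length));
fishtank.setOn(true);
console.log(describe(camera));
console.log(describe(fishtank));
```

The two interfaces share a label and nothing else, which is why a common 
UI seems plausible while a common API does not.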


> B) It should support the case where the user has multiple webcams (or 
> whatever other device) connected.  Note that since this would be for 
> power users, the interface doesn't have to make this case easy and 
> intuitive, but it should be *possible*.
> 
> C) It should support "virtual" devices which are actually implemented by 
> other software (or web apps).  For example, I should be able to write a 
> piece of software which exposes a virtual web cam that just plays a 
> movie fed from a file, or a piece of software that merges two camera 
> feeds into one with the images side-by-side.  Again, this would be for 
> power users, so does not necessarily have to be easy, just possible.
> 
> Note that if the UI supports (B), then (C) should presumably be 
> automatic.

Agreed.
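
As a rough illustration of why (C) could follow from (B), here is a 
TypeScript sketch in which virtual cameras sit behind the same interface 
as any other selectable camera. The names (Frame, CameraSource, 
fileCamera, sideBySide) are hypothetical, frames are simplified to rows of 
grayscale numbers, and the frame list is assumed to be non-empty.

```typescript
// Hypothetical types; a frame is simplified to rows of grayscale pixels.
type Frame = number[][];

interface CameraSource {
  label: string;
  nextFrame(): Frame;
}

// A virtual camera that replays fixed frames in a loop, standing in for
// "a movie fed from a file". Assumes `frames` is non-empty.
function fileCamera(label: string, frames: Frame[]): CameraSource {
  let index = 0;
  return {
    label,
    nextFrame() {
      const frame = frames[index % frames.length];
      index += 1;
      return frame;
    },
  };
}

// A virtual camera that merges two feeds with the images side by side,
// concatenating the feeds row by row.
function sideBySide(left: CameraSource, right: CameraSource): CameraSource {
  return {
    label: `${left.label} + ${right.label}`,
    nextFrame() {
      const a = left.nextFrame();
      const b = right.nextFrame();
      const rows = Math.min(a.length, b.length);
      const merged: Frame = [];
      for (let y = 0; y < rows; y++) {
        merged.push(a[y].concat(b[y]));
      }
      return merged;
    },
  };
}

const movie = fileCamera("Movie", [
  [[1, 1], [1, 1]],
  [[2, 2], [2, 2]],
]);
const other = fileCamera("Other", [[[9], [9]]]);
const merged = sideBySide(movie, other);
const firstFrame = merged.nextFrame();
const secondFrame = merged.nextFrame();
console.log(merged.label, firstFrame, secondFrame);
```

Since the merged feed is just another CameraSource, a picker that can list 
several cameras would need nothing extra to list it.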


> I'd also like to suggest a design #5 (sorry, no ASCII art):  The web 
> page could have a box which is a "socket" for a web cam (or whatever) 
> device.  If you click on (or maybe hover over) the socket, the browser 
> will pop up a box suggesting devices that could be connected to this 
> socket, but advanced users should also be able to drag and drop devices 
> other than the suggested ones.
> 
> The metaphor here is of hooking up devices physically.  If I want my 
> iPod to be able to play music to my head phones, I plug my headphones 
> into the iPod's audio output socket.  If I'd rather that it play to my 
> stereo, I hook that up instead.  Everyone understands how this works in 
> the physical world. Can we extend this to virtual space as well?
>
> A file-access <input> allows you to select a file to open.  Similarly, a
> webcam <input> should allow you to select a web cam to connect to.  Not only
> does this create the possibility of having multiple cameras, but I think it
> forces the user to think about what is happening.  A dialog box that simply
> says "OK to use web cam?" will probably get reflexive "yes" clicks, but one
> saying "Choose a web cam to use: [list]" will, I think, force the user to
> realize what they're doing.  And if they don't want to think about it,
> they'll probably click "cancel", thus avoiding giving away capabilities they
> didn't intend.

That makes sense to me.


 5. Device Well

   +-----------------------------------+
   | [] [] [_http://www.example.c_] [] |
   | +-------------------------------+ |
   | |                               | |   Clicking the well shows:
   | |  To start the video chat,     | |
   | |  activate your camera:        | |     ## DEVICES ##
   | |   _______                     | |     | () Webcam |
   | |  | click |                    | |     | () Mic    |
   | |  |_______|                    | |     +-----------+
   | |                               | |
   | +-------------------------------+ |
   +-----------------------------------+


On Thu, 10 Dec 2009, Anssi Kostiainen wrote:
> 
> To make this more concrete (also works with #2 and #3) the UA could -- 
> once the box is popped up -- show a live viewfinder preview in the box 
> beside the device icon once the user hovers on top of it. The preview 
> image should indicate that the device is not yet "recording" e.g. by 
> graying out the image. This would enable the user to see the actual 
> camera output prior to accepting the capture request which should make 
> the potential action more concrete. Additionally, in a multi-cam 
> scenario (e.g. many mobile devices have multiple cams already today) 
> this would help the user to pick the right cam based on visual 
> observation only (thus no need to know the actual name of a particular 
> camera such as primary, secondary etc.).

Indeed.


So, now, moving from this UI to what it requires in the API/markup:

 * We need an element that represents the device well.
 * The element should have a way to filter devices to just those that are 
   desired (e.g. "cameras", "microphones")
 * When a device is selected, the element should fire some script and
   hand it an object that represents the device, e.g. for video it might
   just be a Stream object (similar to File).
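
The three requirements above can be modelled roughly as follows. This is 
a TypeScript sketch under stated assumptions, not a design: DeviceWell, 
DeviceKind, DeviceInfo, Stream, candidates and userSelects are all 
hypothetical names, and userSelects only simulates a choice that in a real 
UA would be made through browser UI the page cannot drive.

```typescript
// All names below are hypothetical and exist only for this sketch.

type DeviceKind = "camera" | "microphone";

interface DeviceInfo {
  kind: DeviceKind;
  label: string;
}

// Stand-in for the object handed to script once a device is selected.
interface Stream {
  source: DeviceInfo;
}

class DeviceWell {
  // Script hook fired when a device is selected.
  onselect: ((stream: Stream) => void) | null = null;

  // `filter` restricts the well to the desired kinds of device.
  constructor(private readonly filter: DeviceKind[]) {}

  // What the UA would list when the well is clicked: only matching kinds.
  candidates(available: DeviceInfo[]): DeviceInfo[] {
    return available.filter((d) => this.filter.indexOf(d.kind) !== -1);
  }

  // Simulates the user's choice; a real page would not be able to call this.
  userSelects(device: DeviceInfo): void {
    if (this.filter.indexOf(device.kind) === -1) {
      throw new Error(`"${device.label}" is not allowed by this well's filter`);
    }
    if (this.onselect) {
      this.onselect({ source: device });
    }
  }
}

const frontCamera: DeviceInfo = { kind: "camera", label: "Front camera" };
const builtInMic: DeviceInfo = { kind: "microphone", label: "Built-in mic" };

const well = new DeviceWell(["camera"]);
const selected: string[] = [];
well.onselect = (stream) => selected.push(stream.source.label);

const listed = well.candidates([frontCamera, builtInMic]).map((d) => d.label);
well.userSelects(frontCamera);
console.log(listed, selected);
```

Nothing reaches the page's callback until a selection happens, and the 
filter keeps the microphone out of a camera-only well.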

I don't think it makes sense to reuse <input> for this, since we wouldn't 
want any of this to interact with form submission.

I guess this means an addition to HTML. I'll come up with something.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 11 December 2009 16:40:39 UTC