Re: Mozilla/Cisco API Proposal Revisions

On Tue, 19 Jul 2011, Anant Narayanan wrote:
> 
> Represent each camera as a separate MediaStreamTrack. User permission is 
> asked only once, and if granted, an app is able to freely switch between 
> multiple cameras by simply enabling/disabling the tracks as appropriate. 
> The exact way in which the app determines the function of each camera 
> (is it front-facing?) via the track object is TBD (suggestions are 
> welcome).

I don't think this works, because there's a potentially infinite number of 
(virtual) cameras that the user might want to pick. For example, the local 
desktop, or an explicit video file.


> Make the first argument to getUserMedia a JSON object. Options that will 
> definitely be present are "audio" and "video" boolean properties. All 
> remaining options are up for debate (and ties into the hints discussion 
> that follows).

I still don't think a JS object (I assume you mean a JS object and not a 
JSON object, which is actually a string) really works, but I'm open to 
proposals on that front -- can you describe what you think the JS object 
semantics should be, exactly?
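
For concreteness, here is my guess at the shape being proposed (a
sketch, assuming the success and error callbacks from the current
draft; the first argument is the part under discussion):

  navigator.getUserMedia({ audio: true, video: true },
    function (stream) {
      // success: the user granted access; stream is a LocalMediaStream
    },
    function (error) {
      // failure: permission denied, no matching device, etc.
    });

Even for that much we would need to define, e.g., whether unrecognised
members are ignored, and whether the values are limited to booleans or
can carry the hints you mention.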


> The initial proposal suggested that each MediaStreamTrack have its own 
> 'type' attribute that included codec level information on what the track 
> represented. Ian disagreed and wants web applications to be unaware of 
> such details, and thus a single 'kind' attribute which could be "audio" 
> or "video" suffices.

It's not so much that I think Web apps should be unaware, as that there 
might not _be_ a type. For example, a media stream taking video 
straight from a camera to a local video monitor might never be encoded in 
any sense; it might just be a direct uncompressed video feed in a 
hardware-proprietary format that goes straight from the camera to the 
video hardware. Another example would be a local video camera that is to 
be sent over the wire to another UA -- the codec is something that will be 
decided only after the local UA has negotiated a codec with the remote 
UA. And if there are two other UAs, there might be two different codecs 
-- which codec would you pretend the local media stream uses?


> Rename the 'type' attribute to 'kind', but instead of the only values 
> being "video"/"audio", include more information such as codec type.

I still don't think this makes any sense at this level. This should only 
be exposed once there _is_ a codec (for example after using a "record" 
feature or sending over the network).
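
For what it's worth, under the current model the coarse kind is all an
app can branch on. A minimal sketch, assuming the stream's tracks list
and the enabled toggle from the camera-switching proposal above:

  var tracks = stream.tracks;
  for (var i = 0; i < tracks.length; i += 1) {
    if (tracks[i].kind == "video")
      tracks[i].enabled = false; // e.g. turn off all video tracks
  }

Nothing in that loop needs to know, or could meaningfully be told, what
codec the video is in.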


> Rename onTypeChanged to onKindChanged, and create a new track every time 
> a change occurs.

Since the kind cannot change in the current model, I haven't added this.


> Make MediaStreamTrack interoperable with the corresponding track specs 
> from the WHATWG, but add a new DTMFTrack type (subclassed from 
> MediaStreamTrack) that would represent the DTMF signal.

Can you elaborate on how you envisage this working with RFC 4733? I don't 
really understand what the SDP flow for the DTMF stream would be.


> [...] if we're concerned about compatibility with existing web APIs, 
> pointed out that XHR has an explicit open() method too.

Note that XHR's open() is generally regarded as a misfeature. Most APIs 
do not use this model: new EventSource() and new WebSocket() both just 
open the connection; new Image(), new Audio(), and the corresponding 
ways to create <video> and <script> elements all just create objects 
that react as soon as you set their src="" attribute; elements that 
_do_ something, e.g. <marquee>, just do it as soon as they're created; 
and so on. The platform just gets on with it and doesn't wait to be 
told to do things.

(XHR is different because you have to give it a payload to start, and 
the original design of XHR was based on ActiveX where there was no way 
to give a constructor arguments.)
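
To make the contrast concrete:

  // XHR: the object sits inert until told, twice, to act
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/data');
  xhr.send();

  // The rest of the platform: construction (or setting src="") is enough
  var source = new EventSource('/events'); // connection opens immediately
  var img = new Image();
  img.src = 'photo.png'; // fetch starts as soon as src is set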


> Given this, I would highly recommend exposing an explicit connect() 
> method. Ian, do you have specific examples for why this approach is more 
> error-prone?

Doing something is always more error-prone than not doing something, 
pretty much by definition. I just don't understand why we would have 
anything to do here. Things should Just Work, especially when having them 
just work is as easy as in this case.


> We want to support the 1:many use-case, and even though the webapp could 
> technically do it on its own by creating multiple PeerConnection 
> objects; we propose providing an abstraction over it (such as the 
> PeerConnection factory) that makes this easy. It has been established 
> that ICE inherently cannot support more than 2 endpoints, but this is 
> not necessary to enable multi-user media streams.

Can you elaborate on how you see this kind of API working?

Is there any reason the browser should support this natively rather than 
having a library to do it? It seems to me that the difficulty of making a 
1:many library on top of PeerConnection is so minimal that there really 
isn't much gained from putting the responsibility on the UA. After all, 
the real complexity here isn't on the client side, it's on the server 
side, managing all the signalling channels.
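
For example, the client side of a 1:many layer is hardly more than a
loop. A rough sketch, assuming the draft's PeerConnection(configuration,
signalingCallback) constructor, addStream(), and
processSignalingMessage(), with sendToPeer() standing in for the app's
signalling channel:

  function connectToPeers(peerIds, configuration, localStream) {
    var connections = {};
    peerIds.forEach(function (id) {
      var pc = new PeerConnection(configuration, function (message) {
        sendToPeer(id, message); // app-defined signalling channel
      });
      pc.addStream(localStream); // offer the local stream to this peer
      connections[id] = pc;
    });
    return connections;
  }

  // Incoming signalling messages are routed by peer id:
  //   connections[id].processSignalingMessage(message);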


> Matthew gave us an overview of what it meant to model PeerConnections 
> after either flows or sessions. Notably, he points out that there is 
> value in reusing session setup, thus, opening a new session should have 
> the same API as that of opening a flow that reuses an existing session.

Isn't that already how the PeerConnection API works? I don't understand 
what this means that's different from what we have now.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
