(should be) Re: Initial notes on MS proposal

[Quoted material is from Eric Rescorla <ekr@rtfm.com>]

I've taken a handful of quotes from the message posted by Eric Rescorla and tried to respond to them. A complete response would obviously be much longer, and I don't believe it is necessary at this time. I have also taken a lot of liberty with "we" here: these views are mine, and are certainly shared by at least one other person who contributed to the "Microsoft Proposal", but they may not reflect any consensus among the authors of that proposal, much less the corporation for which we work.

---

"Like any all-new design, this API has the significant advantage (which the authors don't mention) of architectural cleanliness. The existing API is a compromise between a number of different architectural notions and like any hybrid proposals has points of ugliness where those proposals come into contact with each other (especially in the area of SDP.) However, when we actually look at functionality rather than elegance, the advantages of an all-new design---not only one which is largely not based on preexisting technologies but one which involves discarding most of the existing work on WebRTC itself---start to look fairly thin."

We agree. The existing API is a compromise because it is a hybrid of several things, and every single one of them (SDP O/A, ICE, RTP, DTLS/SRTP) has now been twisted beyond recognition: SDP Offer/Answer now has a way for the endpoints to modify offers and answers before processing, ICE now trickles candidates, RTP now does multiplexing, and DTLS/SRTP now carries identity assertions.

Given that none of them remains compatible with its origin, why *not* start with a clean(er) slate? Or at least go through a serious cleanup and get rid of what we all know is ugly in the current specification.

---

"The interoperability argument is similarly weakly supported. Given that JSEP is based on existing VoIP technologies, it seems likely that it is easier to make it interoperate with existing endpoints since it's not first necessary to implement those technologies (principally
SDP) in JavaScript before you can even try to interoperate. The idea here seems to be that it will be easier to accommodate existing noncompliant endpoints if you can adapt your Web application on the fly, but given the significant entry barrier to interoperating at all, this seems like an argument that needs rather more support than MS has currently offered."

Saying that JSEP is "based on" existing VoIP technologies is about as true as saying that dogs are "based on" plants. The JSEP proposal, like the Microsoft proposal, has a strong heritage in existing VoIP standards (STUN, RTP, RTCP), but it has made enough changes that interoperability between browsers themselves, much less between browsers and legacy VoIP equipment, is now seriously in doubt. The discussion currently in progress on the IETF mailing list about adding trickle candidate support to ICE is an example of this.

When a browser generates an SDP offer and then the developer modifies that SDP and calls "setLocalDescription", how will the developer know *what* SDP is permitted by the particular browser context they are running in? Not just "which codecs are supported" but what various permutations of legal SDP from which particular RFCs will and will not be allowed as modifications? And if it doesn't work, how is that signaled? Are we going to pop up an "SDP line 5 syntax error after semicolon" alert for the user?
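To make the concern concrete, here is a minimal sketch of the kind of editing an application might attempt between creating an offer and calling "setLocalDescription": stripping one codec from an SDP blob. The function name removeCodec and the sample SDP are made up for illustration, and the sketch assumes payload type numbers mean the same thing in every m= section, which real SDP does not guarantee.

```typescript
// Illustrative only: remove one codec, by name, from an SDP blob.
// Assumption: payload type numbers are treated as global to the whole blob.
function removeCodec(sdp: string, codecName: string): string {
  const lines = sdp.split(/\r\n|\n/).filter((l) => l.length > 0);

  // Find the payload types whose a=rtpmap line names the codec.
  const payloadTypes = new Set<string>();
  for (const line of lines) {
    const m = /^a=rtpmap:(\d+) ([^/]+)\//.exec(line);
    if (m && m[2].toLowerCase() === codecName.toLowerCase()) {
      payloadTypes.add(m[1]);
    }
  }

  const out: string[] = [];
  for (const line of lines) {
    // Drop attribute lines that belong to a removed payload type.
    const attr = /^a=(?:rtpmap|fmtp|rtcp-fb):(\d+)\b/.exec(line);
    if (attr && payloadTypes.has(attr[1])) continue;

    if (line.indexOf("m=") === 0) {
      // m=<media> <port> <proto> <fmt> ...
      // Note: if every format is removed, the resulting m= line is invalid.
      const parts = line.split(" ");
      const kept = parts.slice(3).filter((pt) => !payloadTypes.has(pt));
      out.push(parts.slice(0, 3).concat(kept).join(" "));
      continue;
    }
    out.push(line);
  }
  return out.join("\r\n") + "\r\n";
}

// Hypothetical offer fragment, not captured from any real browser.
const sample = [
  "v=0",
  "o=- 1 1 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 9 RTP/SAVPF 0 8 111",
  "a=rtpmap:0 PCMU/8000",
  "a=rtpmap:8 PCMA/8000",
  "a=rtpmap:111 opus/48000/2",
  "a=fmtp:111 minptime=10",
].join("\r\n") + "\r\n";

const munged = removeCodec(sample, "PCMA");
```

Even this small edit has to know about rtpmap, fmtp, rtcp-fb and the layout of the m= line, and nothing in it tells the developer whether a given browser will accept the result.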

And that's just for compatibility within a single browser. Then, when we take that SDP to another browser, after modifying it, will *that* be accepted at the far end? And will the changes we need to make to the SDP offer/answer mechanism to enable things like "setLocalDescription" leave this compatible with legacy SIP endpoints or not?

None of this is answered by the current state of the WEBRTC specification.

We have proposed a set of concrete changes which answer these questions more clearly. It is just that, a proposal. We hope that the changes proposed lead to a better WEBRTC for everyone, and we know that this will come through the synthesis of ideas, not the wholesale substitution of our proposal for the current specification.

---

"After a lot of debate, the WG ultimately rejected both of these and settled on a protocol called JavaScript Session Establishment Protocol (JSEP), which is probably best described as a mid-level API. .... The idea is supposed to be that it's simple to write a basic application (indeed, a large number of such simple demonstration apps have been written) but that it's also possible to exercise advanced features by manipulating the various data structures emitted by the browser. This is obviously something of a compromise between the first two classes of proposals."

We believe that, like many such committee-designed compromises, the current specification has reached the point where the benefits of the hybrid are outweighed by the disadvantages, and that it is time for some rethinking in order to create what will ultimately be a cleaner architecture and specification.

It is unfortunate that we weren't able to put together a stronger proposal at the time of the initial debate within the WG, but that doesn't make the facts of the current situation any different.

---

"Probably the strongest point that the MS authors make is that if the API doesn't explicitly support doing something, the situation is kind of gross...
What this is about is that in JSEP you call CreateOffer() on a PeerConnection in order to get an SDP offer. This doesn't actually change the PeerConnection state to accommodate the new offer; instead, you call SetLocalDescription() to install the offer. This gives the Web application the opportunity to apply its own preferences by editing the offer. For instance, it might delete a line containing a codec that it didn't want to use. Obviously, this requires a lot of knowledge of SDP in the application, which is irritating to say the least, for the reasons in the quote above."

As I said above, it requires a lot of knowledge not just of SDP but of the specific dialect of SDP that is parseable *by this specific browser*. The history of SDP compatibility between vendors is ugly, and it is unlikely that we would do any better with WEBRTC.

---

"Note that it's not like this complexity doesn't exist in JSEP, it's just been pushed into the browser so that the user doesn't have to see it. As discussed below, Microsoft's argument is that this simplicity in the JavaScript comes at a price in terms of flexibility and robustness, and that libraries will be developed (think jQuery) to give the average Web programmer a simple experience, so that they won't have to accept a lot of complexity themselves. However, since those libraries don't exist, it seems kind of unclear how well that's going to work."

This is an accurate summary of our argument: the supposed simplicity in the current specification comes at a significant price in terms of flexibility and robustness, and that ultimately leads to interoperability issues.

Not only do we believe that libraries will be developed to simplify work for developers of less complex applications, we believe that it is in fact possible to implement JSEP as a JavaScript library on top of our proposed API. This means that if developers really prefer the many-humped camel that is JSEP, they would still be able to program to that API... but if they want more "architectural cleanliness" they can program directly to what we hope is a much better WEBRTC API as a result of our contributions.
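As a rough illustration of the layering idea only, the sketch below puts an offer/answer-style facade in script on top of a lower-level media interface. Everything here is hypothetical: LowLevelMedia and OfferAnswerFacade are invented names, not the API of our proposal, of JSEP, or of any browser, and "offers" are reduced to plain lists of codec names rather than SDP.

```typescript
// HYPOTHETICAL low-level interface, invented for this sketch.
interface LowLevelMedia {
  listCodecs(): string[];      // codecs this endpoint can use
  start(codec: string): void;  // begin media with the chosen codec
}

// HYPOTHETICAL offer/answer facade written entirely in script.
class OfferAnswerFacade {
  private media: LowLevelMedia;
  private pending: string[] | null = null;

  constructor(media: LowLevelMedia) {
    this.media = media;
  }

  // Analogue of createOffer: describes capabilities, changes no state.
  createOffer(): string[] {
    return this.media.listCodecs().slice();
  }

  // Analogue of setLocalDescription: installs a (possibly edited) offer.
  // An edit the endpoint cannot honor fails with a specific error.
  setLocalDescription(offer: string[]): void {
    const supported = this.media.listCodecs();
    for (const c of offer) {
      if (supported.indexOf(c) < 0) {
        throw new Error("unsupported codec in offer: " + c);
      }
    }
    this.pending = offer.slice();
  }

  // Analogue of applying a remote answer: start the first common codec.
  setRemoteDescription(answer: string[]): string {
    if (this.pending === null) throw new Error("no local description set");
    for (const c of this.pending) {
      if (answer.indexOf(c) >= 0) {
        this.media.start(c);
        return c;
      }
    }
    throw new Error("no codec in common");
  }
}

// Usage with a mock low-level object standing in for the browser.
const started: string[] = [];
const mock: LowLevelMedia = {
  listCodecs: () => ["opus", "PCMU"],
  start: (c) => { started.push(c); },
};
const facade = new OfferAnswerFacade(mock);
const offer = facade.createOffer().filter((c) => c !== "opus");
facade.setLocalDescription(offer);
const chosen = facade.setRemoteDescription(["PCMU", "PCMA"]);
```

The point of the sketch is only the direction of the layering: the offer/answer state machine lives in a library, and the object underneath it exposes capabilities and explicit operations instead of a text blob.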

Matthew Kaufman

Received on Monday, 27 August 2012 20:41:04 UTC