RE: OfferToReceive in CreateAnswer from Matthew Kaufman on 2012-10-26 (public-webrtc@w3.org from October 2012)

From: Matthew Kaufman <matthew.kaufman@skype.net>
Date: Fri, 26 Oct 2012 14:57:42 +0000
To: Harald Alvestrand <harald@alvestrand.no>, "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <AE1A6B5FD507DC4FB3C5166F3A05A484160FAC00@tk5ex14mbxc272.redmond.corp.microsoft.>
> From: Harald Alvestrand [mailto:harald@alvestrand.no]
>
> Wearing my implementor's hat, reporting on a discussion:
> 
> The text in section 11.1

Section 11.1 of what?

> about OfferToReceiveAudio and
> OfferToReceiveVideo talks about offers, but doesn't say what should be
> done with answers. If the language was interpreted exactly as it stands, the
> result of doing a CreateAnswer without adding any streams would be to
> create an SDP that rejected the Audio and Video m-lines - which doesn't
> make much sense.

Why not? If you don't add MediaStreams, there's no way to know what parts of the offer may be accepted. Rejecting those offered streams seems like the *only* thing that makes sense.

> 
> A reasonable interpretation should be this (all refer to the responding
> RTCPeerConnection):
> 
> - When no OfferToReceive* is present, and no streams are added, all m=
> lines from sender are accepted. (Query: should they be accepted in
> a=recvonly mode?)

This is wrong for so many reasons.

The spec itself says "... Like createOffer, the returned blob contains descriptions of the local MediaStreams attached to this RTCPeerConnection, the codec/RTP/RTCP options negotiated for this session, ..."

So if there's no "local MediaStreams attached to this RTCPeerConnection" then I'm afraid the answer is clear... there's no way to have descriptions of them in the answer, so they can't be described in the answer. Thus there's no way to accept those m= sections and no way to know what attributes you would set anyway.

And the spec itself says "As an answer, the generated SDP will contain a specific configuration that, along with the offer, specifies how the media plane should be established. The generation of the SDP must follow the appropriate process for generating an answer or provisional answer."

Without the MediaStreams attached, there's no way to know "how the media plane should be established"... you don't know what codecs are supported by those streams, etc.

And "The appropriate process for generating an answer or provisional answer" is of course specified (in part) in RFC3264, which contains statements such as "For streams marked as sendrecv in the answer, the "m=" line MUST contain at least one codec the answerer is willing to both send and receive, from amongst those listed in the offer."

So which codec would you select, if any, given that you don't know what MediaStream will be attached?

As for "should they be accepted in a=recvonly mode?", RFC3264 says that would be a reasonable option *if* the SDP offer contains sendonly or sendrecv. But what do you do if the offer is recvonly?
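
To make the direction question concrete, here is a minimal TypeScript sketch of the RFC 3264 (section 6.1) rules for which direction an answer may carry, given the direction in the offer. The type and function names are hypothetical, chosen only for illustration; they are not part of the WebRTC API.

```typescript
// Hypothetical names for illustration; not part of any WebRTC API.
type Direction = "sendrecv" | "sendonly" | "recvonly" | "inactive";

// Directions an answerer may choose for a stream, given the offered direction
// (RFC 3264 section 6.1). An offer with no direction attribute counts as sendrecv.
function allowedAnswerDirections(offered: Direction): Direction[] {
  switch (offered) {
    case "sendonly":
      return ["recvonly", "inactive"];
    case "recvonly":
      return ["sendonly", "inactive"];
    case "inactive":
      return ["inactive"];
    case "sendrecv":
      return ["sendrecv", "sendonly", "recvonly", "inactive"];
  }
}

// A recvonly offer cannot be answered with recvonly.
console.log(allowedAnswerDirections("recvonly").indexOf("recvonly") !== -1); // false
```

Under these rules, a recvonly offer leaves only sendonly or inactive for the answer, so "accept everything as recvonly" has no valid outcome in that case.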

And how would you comply with "Once the answerer has sent the answer, it MUST be prepared to receive media for any recvonly streams described by that answer" if you haven't attached MediaStreams?

> 
> - When OfferToReceive* is mandatory false, and no streams are added, the
> corresponding m= line is rejected (by setting the port number to 0).
>

This at least makes sense.
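
For reference, rejecting a media section in an answer comes down to writing port 0 on its m= line. A minimal TypeScript sketch, assuming a single well-formed m= line as input; the helper name rejectMLine is hypothetical and not part of any API:

```typescript
// Hypothetical helper: reject one media section of an answer by zeroing its port.
// An m= line has the form: m=<media> <port> <proto> <fmt> ...
function rejectMLine(mLine: string): string {
  const parts = mLine.trim().split(/\s+/);
  if (parts.length < 4 || parts[0].slice(0, 2) !== "m=") {
    throw new Error("not an m= line: " + mLine);
  }
  parts[1] = "0"; // RFC 3264: port zero marks the stream as rejected
  return parts.join(" ");
}

console.log(rejectMLine("m=video 49170 RTP/AVP 96 97")); // m=video 0 RTP/AVP 96 97
```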
 
> - I don't think we can add m= lines in an answer, so when the incoming offer
> has no video line, and the OfferToReceiveVideo is true, the answerer fires
> negotiationneeded immediately after returning to the no-outstanding-offer
> state, so that the answerer can offer an upgrade to add video.
> 
> Does this make sense? If so, can it be added?

No, you can't add m= lines in an answer as long as you want to follow RFC3264 ("For each "m=" line in the offer, there MUST be a corresponding "m=" line in the answer. The answer MUST contain exactly the same number of "m=" lines as the offer. This allows for streams to be matched up based on their order.").
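
The m= line count rule is easy to check mechanically. A rough TypeScript sketch, assuming the offer and answer are available as SDP strings; the function names are hypothetical and the SDP fragments are made up for the example:

```typescript
// Hypothetical helpers for illustration only.
// Count media sections by counting lines that begin with "m=".
function countMLines(sdp: string): number {
  return sdp.split(/\r?\n/).filter((line) => line.slice(0, 2) === "m=").length;
}

// RFC 3264: the answer must contain exactly as many m= lines as the offer.
function answerMatchesOffer(offerSdp: string, answerSdp: string): boolean {
  return countMLines(offerSdp) === countMLines(answerSdp);
}

// Made-up fragments: an audio-only offer and an answer that tries to add video.
const audioOnlyOffer = "v=0\r\nm=audio 5004 RTP/AVP 0\r\n";
const answerAddingVideo = "v=0\r\nm=audio 5006 RTP/AVP 0\r\nm=video 5008 RTP/AVP 96\r\n";

console.log(answerMatchesOffer(audioOnlyOffer, answerAddingVideo)); // false
```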

Of course we've already apparently decided that the API isn't actually following RFC3264 by enforcing the state machine transitions it calls out, so I guess you could do whatever you want.

But given that it isn't trying to enforce state transitions elsewhere, why should it fire "negotiationneeded" immediately? If the API isn't going to force the appropriate state transitions because they can be implemented by the JavaScript developer who is trying to interoperate with things that actually do SDP offer/answer, why would it fire an event to tell the JavaScript developer something they ought to already know, which is that the incoming offer has no video line and they tried to answer saying they wanted to receive video? (And if the offer doesn't say they can do video, what's going to happen when you try the negotiation again, anyway?)

I find the timing of the upcoming W3C meeting appropriate, as Frankenstein's monster is a great analogy for what this underspecified API is becoming.

Matthew Kaufman
Received on Friday, 26 October 2012 14:59:09 UTC