Re: [rtcweb] Additional requirement - audio-only communication

From: Justin Uberti <juberti@google.com> · Date: Thu, 25 Aug 2011 18:33:30 -0400

On Thu, Aug 25, 2011 at 1:04 PM, Matthew Kaufman
<matthew.kaufman@skype.net> wrote:

> On 8/25/2011 9:24 AM, Randell Jesup wrote:
>
>> On 8/25/2011 11:55 AM, Matthew Kaufman wrote:
>>
>>  On 8/24/2011 4:36 AM, Harald Alvestrand wrote:
>>>
>>
>> Re: negotiate offer/answer separately in each direction
>>
>>
>>
>>>> I think that:
>>>> a) this doesn't make sense - it's a completely new SDP/RTP practice,
>>>> and we should not depart from established practice without a good
>>>> reason; it also flies against the "keep the number of RTP sessions as
>>>> low as we can" conclusion that came out of all the discussions about
>>>> ICE.
>>>> b) it's not consistent with section 4.1.2 step 7.
>>>>
>>>> I think step 16 of section 4:
>>>> "If connection's ICE started flag is still false, start the
>>>> PeerConnection ICE Agent and send the initial offer. The initial offer
>>>> must include a media description for the PeerConnection data UDP media
>>>> stream, marked as "sendrecv", and for all the streams in localStreams
>>>> (marked as "sendonly")."
>>>>
>>>> is neither correct nor complete.
>>>>
>>>
>>> I agree that "this doesn't make sense" and it is just yet another reason
>>> that I think SDP offer-answer is entirely inappropriate for WEBRTC.
>>>
>>> The web user agent should do what the web site's HTML and cooperating
>>> Javascript tell it to do. It should not be engaged in direct negotiation
>>> with the far end such that the outcome is either indeterminate or even
>>> unexpected, except where direct negotiation is explicitly required to
>>> meet a security requirement (the initial ICE handshake to determine that
>>> it is permitted to send data to that endpoint).
>>>
>>> Note that any perceived gains by doing this negotiation (like "what if
>>> my browser is on a slow connection and only wants to receive audio") are
>>> immediately negated the moment the site changes the SDP enroute to add
>>> "wants HD-resolution video" for you.
>>>
>>
>> Ok, so that's a bad web-app - don't use it.
>>
>>
>> Are you really suggesting "send video to someone who doesn't want it
>> anyways"?
>>
>
> No, I don't think that's a good idea. But one of the arguments I've heard
> *for* doing O-A between the two ends is that it somehow ensures that the
> sender won't send things that the receiver isn't prepared to receive.
>
> I was simply pointing out that this isn't true at all. Because the O-A can
> easily be modified in-transit, either end can be trivially convinced to do
> things it otherwise shouldn't be doing.
>
>
>
>>
>> "perceived gains" - sending me video at (say) 500K or more plus audio at
>> <50K, when I'm on a 128K link, will kill my connection until I can somehow
>> get the other side to back off.  Horrid user experience.
>>
>
> Yes. And it will be possible for a "bad web site" to do this whether we use
> O-A or something else.
>
>
>> Remember people without broadband or with limited broadband will be using
>> this.  What if I have 768 down, 128 up - but Johnny in his bedroom is
>> watching YouTube videos or downloading torrents, etc?
>>
>
> In theory the sender will back off once they get reports of massive loss,
> assuming we do congestion control for the media. If not, then you're both
> out of luck. No different really than anything else though... I can easily
> build a web site that lets you connect over HTTP and then runs a modified
> TCP that doesn't slow down when there's loss.
>
>
>
>
>>
>>> In addition, because the spec is currently written with offer-answer, a
>>> wide array of use cases that would be possible if capabilities were
>>> instead exposed via Javascript become impossible. As an example, it
>>> should be possible for me as a web site developer to create a page that
>>> can determine, without prompting you to use your camera, whether or not
>>> a camera is available and if so what codecs it supports. That way I can
>>> put "I see you have a high-resolution camera and can encode H.264
>>> video... click here to call a live agent who can help you find the exact
>>> replacement part" for users who have that, and not if they don't have a
>>> codec that works with my call center or a camera with sufficient
>>> resolution to examine the parts I sell.
>>>
>>
>> In what way is that use-case blocked by offer-answer?  Access to the video
>> needs user confirmation; access to capabilities info shouldn't.
>>
>
> I'm reading http://dev.w3.org/2011/webrtc/editor/webrtc.html
>
> I cannot see how to get it to generate an SDP offer before I try to open a
> media connection.
>
> I also cannot see how one could possibly know *what* SDP to offer until you
> call "getUserMedia", which prompts the user. (As an example, I have two
> cameras attached to this computer over USB. One of them has an on-board
> H.264 encoder. The other does not. If my browser can do pass-through of the
> encoded H.264 but doesn't have its own H.264 encoder, what offer do I
> generate before the camera is selected? SDP doesn't have a good way to
> encode "maybe".)
>
> What we need is a set of Javascript APIs that let us enumerate the
> available audio input devices and encoders, video input devices and
> encoders, audio output devices and decoders, video output devices and
> decoders. I hate to say it but even H.245 is probably a better model for how
> to collect the capabilities than SDP O-A.
>
>
>> And so the page can generate an offer with audio and video, or just audio
>> if they have no camera.  A user declining to send video doesn't necessarily
>> mean you don't negotiate it - they advertise receive-only for video (if the
>> web app wants to).
>>
>>
> I don't understand what you mean by "(if the web app wants to)"... the
> current proposed specification doesn't have a way for the web app to control
> whether the SDP offer is receive-only for video or not.
>
>
>
>> You state above there are major possibilities blocked by offer-answer (I
>> can think of some minor issues that with O-A require a second O-A pass) -
>> can you detail some use-cases so we can see the benefit and not just the
>> assertion?
>>
>
> It is certainly possible that a repeated set of fake offer-answer exchanges
> can be used by either Javascript or the server to determine what the
> capabilities are, and that a set of rich Javascript-exposed capabilities may
> be used to generate SDP offers and answers, so in that sense they are
> mappable from one to another.
>
> But I would argue that turning capabilities into SDP is easier than the
> reverse. This is really a question of whether we are trying to turn the
> browser into a *platform for building applications* (in which case we should
> be exposing, as an operating system does, APIs for determining what is
> possible and APIs for control) or turning the browser into *a phone*, in
> which case sure, SIP and SDP are both fine choices.
>

I think it makes sense for the browser to emit capabilities, which could
then be used by the web app to generate an SDP offer or answer. This provides
the web app with more control over the functionality; the browser just
reacts to what the web app asks it to do, rather than trying to make
decisions (such as how the answer should be generated from the offer) that
may not be possible without context.

The original problem that started this email is one specific example - if
the callee application wants to only receive audio, the application can
generate an audio-only SDP based on the offer, the browser capabilities, and
the desired app behavior - without any new APIs in the browser.
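
As a rough illustration of the audio-only case, here is a minimal sketch in
Python of how an application (in page script or on the server) might start
from a received offer and refuse everything except audio. The function name
reject_media_except and the sample offer are hypothetical and not part of any
proposed API. The one protocol fact it relies on is that, under RFC 3264, an
answer keeps the offer's m= lines in order and rejects a stream by setting
its port to 0. A real answer would also need the answerer's own o=/c= lines,
ports and codec selection, which this sketch leaves out.

```python
def reject_media_except(sdp, keep=("audio",)):
    """Return a copy of `sdp` in which every m= section whose media type is
    not in `keep` has its port set to 0 (the RFC 3264 way to reject it).

    Hypothetical helper for illustration only; it does not produce a
    complete, valid answer on its own.
    """
    out = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            # Media line layout: m=<media> <port> <proto> <fmt> ...
            fields = line[2:].split(" ")
            if len(fields) >= 2 and fields[0] not in keep:
                fields[1] = "0"
            line = "m=" + " ".join(fields)
        out.append(line)
    # SDP lines are terminated with CRLF.
    return "\r\n".join(out) + "\r\n"


# Made-up offer with one audio and one video stream.
offer = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.1",
    "s=-",
    "c=IN IP4 192.0.2.1",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
    "m=video 51372 RTP/AVP 31",
]) + "\r\n"

answer_skeleton = reject_media_except(offer)
```

The point of the sketch is only that the decision (audio yes, video no) lives
in application code, so the browser does not have to guess it.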

>
> I think there's a whole lot of potential applications beyond what we're
> currently thinking of if we provide a platform and not just a phone.
>
> I've already outlined one case (offering the user the ability to place a
> video call *if* there is an acceptable camera and encoder on the system).
>
> Another obvious one is where there's a dozen people already in a video call
> and one more wishes to join... if the server knows what capabilities exist,
> the server can tell the new joiner "your browser can't support video in a
> format that the other users need" or can tell the existing user browsers to
> switch to a different codec that is now compatible with everyone or
> whatever, rather than having to re-run offer-answer with every party.
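
The conference case above can be sketched in a few lines, assuming the server
already holds each browser's codec list (how it gets that list is exactly the
open API question). This is a Python sketch; the names common_codecs and
plan_join and the participant data are hypothetical.

```python
def common_codecs(capabilities):
    """Return the codecs supported by every participant.

    `capabilities` maps a participant name to a set of codec names.
    """
    sets = [set(codecs) for codecs in capabilities.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


def plan_join(existing, joiner_name, joiner_codecs):
    """Decide what to tell a new joiner, without re-running offer/answer
    with every party. Returns (can_join, usable_codecs)."""
    everyone = dict(existing)
    everyone[joiner_name] = set(joiner_codecs)
    usable = common_codecs(everyone)
    return (len(usable) > 0, usable)


# Made-up capability data for a call already in progress.
in_call = {
    "alice": {"VP8", "H.264"},
    "bob": {"H.264"},
}

ok, codecs = plan_join(in_call, "carol", {"H.264", "H.263"})
refused, none_left = plan_join(in_call, "dave", {"VP8"})
```

If the result is empty the server can tell the joiner that video is not
available; otherwise it can ask the existing browsers to switch to one of the
codecs in the result.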

>
>
>> Then we could balance that against the advantage of O-A being well-speced
>> and known and implemented.
>>
>
> But it *isn't* well-speced for use cases other than one-to-one calling,
> really. It works poorly for recording, it works poorly for large multi-party
> conferences, etc.
>
> And on top of that, it is missing important attributes that we'll want to
> control (like whether the Opus codec is being forced to "music" mode or
> not).
>
>
>> (And the issues surrounding how well and how interoperably web-app
>> developers can implement their own capability negotiations/etc)
>>
>>
>>
> For interoperability, the browser (in Javascript) or the server (in any
> language you wish) can generate SDP, if that's how they wish to
> interoperate.
>
> Turning capability and control APIs into SDP is exactly the same problem as
> turning operating system APIs into SDP... the browser should be the new
> operating system, not just a hardcoded phone.
>
> Matthew Kaufman