Re: Screensharing: Bootstrapping Collaboration between Capturer and Capturee from Jan-Ivar Bruaroey on 2021-06-17 (public-webrtc@w3.org from June 2021)

From: Jan-Ivar Bruaroey <jib@mozilla.com>
Date: Wed, 16 Jun 2021 22:22:11 -0400
To: Harald Alvestrand <harald@alvestrand.no>
Cc: T H Panton <tim@pi.pe>, Elad Alon <eladalon@google.com>, Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>, Youenn Fablet <youenn@apple.com>, WebRTC WG <public-webrtc@w3.org>
Message-ID: <CABr+gEikGCoBU-PX4P4n8WOMSdt9R4CboGrMKBWt0d44qV-qzw@mail.gmail.com>

> Embedding the specific UA controls in the browser (proposal A and B) is a
> layering violation.

@Harald (I think you mean B?) In any case, the explainer
<https://github.com/w3c/mediacapture-screen-share/blob/gh-pages/explainer.md#what-is-it-whats-it-for>
seems quite clear that the primary use case of getDisplayMedia is
presentations in video conferences. That's why browsers push captured
windows to the front
<https://github.com/w3c/mediacapture-screen-share/issues/138>, among other
things. I'm not taking a position on Youenn's proposal, only noting that
having user agents involved at a high level here does not seem out of
bounds: Two sites are involved, and user agents stepping in to mediate to
protect users seems fine and appropriate to me, if they wish to do that.

> Considering both exist in the same browser instance, that is a
> mind-bogglingly unnecessary detour.

@Lennart Point taken. Most users run just one browser (we're all the
anomaly). So it's useful to scope the problem down to capturer and capturee
in the *same user agent*.

This may not matter in the id-solution, but back to Sergio's concern:

> My concern is that it will provide no value to Jitsi but will help create
> monopolies to VC+content providers.

@Elad An id-only solution is no good, for this reason. At the same time, an
id-solution is probably inevitable: even a {slideNumber} constraint could
be used to communicate a handshake.

Thus, the solution needs to mandate a baseline set of presentation controls
that work without any id, and expose the id as part of registering for
that. That's what I'd like to propose:

   1. Target calls a method, providing how many slides it has, and whether
   to UA-guard against capture past (cross-origin) navigation
   2. Capturer gets {min, max} = track.getCapabilities().slideNumber; and
   can track.applyConstraints({slideNumber: min});
   3. Capturer can also get a track.getSettings().presentationId.
   4. Some event if the target navigates or calls the method again.
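
The four steps above can be sketched with a small in-memory mock. This is
only an illustration under stated assumptions, not a proposed API surface:
slideNumber and presentationId come from the list above, while
MockCapturedTrack, PresentationRegistration, guardCrossOriginNavigation,
onPresentationChange and reRegister are hypothetical names made up for the
sketch. It also assumes slides are numbered from 1, and it rejects with a
RangeError where a real track would presumably use OverconstrainedError.

```typescript
// Hypothetical sketch: none of these types exist in any shipped browser API.
interface SlideRange { min: number; max: number; }

// Step 1: what the target (capturee) would hand to the UA.
interface PresentationRegistration {
  slideCount: number;
  guardCrossOriginNavigation: boolean; // assumed name for the UA-guard flag
}

class MockCapturedTrack {
  private current = 1; // assumption: slides are 1-based
  private listeners: Array<() => void> = [];

  constructor(private reg: PresentationRegistration,
              private readonly id: string) {}

  // Step 2: capturer reads the valid slide range.
  getCapabilities(): { slideNumber: SlideRange } {
    return { slideNumber: { min: 1, max: this.reg.slideCount } };
  }

  // Step 2: capturer asks for a specific slide.
  applyConstraints(c: { slideNumber?: number }): Promise<void> {
    const n = c.slideNumber;
    if (n === undefined) return Promise.resolve();
    const { min, max } = this.getCapabilities().slideNumber;
    if (!Number.isInteger(n) || n < min || n > max) {
      return Promise.reject(new RangeError(`slideNumber ${n} out of range`));
    }
    this.current = n;
    return Promise.resolve();
  }

  // Step 3: capturer can read the id exposed by registration.
  getSettings(): { slideNumber: number; presentationId: string } {
    return { slideNumber: this.current, presentationId: this.id };
  }

  // Step 4: capturer subscribes to changes on the target side.
  onPresentationChange(cb: () => void): void {
    this.listeners.push(cb);
  }

  // Simulates the target calling the registration method again.
  reRegister(reg: PresentationRegistration): void {
    this.reg = reg;
    this.current = Math.min(this.current, reg.slideCount);
    for (const cb of this.listeners) cb();
  }
}
```

A capturer built against something like this would never need to know which
presentation app it is capturing: it reads {min, max}, applies a
slideNumber, and reacts to the change event.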

I agree we cannot force apps to work together, but if we make it
embarrassingly easy, then hopefully twitter shaming can do the rest.

But seriously, I think vc apps see the market benefit of letting presenters
present with whatever tools they have, just like they recognize the benefit
of participants joining from browsers on whatever device they're on (or
dialing in from phones! sheesh)

I think presentation apps see the market benefit of working in whatever
meeting their user is in.

IOW you can't capture an audience that is already captured in a meeting
they didn't start.

If it turns out I was being naive here, it'll be on the internet forever.
No biggie. ;)

Disclaimer: this all still needs to be under site-isolation and capture
opt-in.

On Wed, Jun 16, 2021 at 3:42 PM Harald Alvestrand <harald@alvestrand.no>
wrote:

> Setting up cross-network communication backs us into the "URI that can be
> used to connect webrtc" topic.
>
> We had strong warnings against minting those tokens in the security docs
> for general A/V media session usage; it seems to make much more sense (or
> rather: forbidding it seems to make much less sense) when we're dealing
> with process-to-process comms in a distributed application.
>
> But I think it is a different application space than where Capture-Handle
> is trying to go.
>
>
> On 6/16/21 12:30 PM, T H Panton wrote:
>
> Ha, I see the problem now, I’ve got comfortable in a flat-unique
> token-namespace and forgotten how ugly the outside world is!
>
> I’m afraid this tells me that the idea of just having a token doesn’t work
> - we need to define a namespace to assign meaning to it.
> I have a couple of possibles in mind. The simplest being MDNS - you look
> up the token in MDNS and retrieve an SRV which tells you how to connect.
>
> Adding an origin only helps if you have a very specific solution in mind.
> i.e. server mediated remote control of a well known web app from the same
> vendor that happens to be on different origins.
> Which does not allow you to control a local native app (e.g. keynote)
> which seems to me very necessary. It also doesn’t allow a native capture
> app to control a web page (e.g. slideshare), which is also desirable.
> Adding an origin also provides a hook for uncompetitive practice.
>
> If we are going to limit communication to ’same browser instance’ then we
> would be better off with a port we can invoke postMessage() on.
>
> Generally I’m in favour of “as simple as possible” - however Einstein did
> add “but no simpler.”
> It seems to me the token/origin solution is too simple - it solves a small
> part of the problem and leaves a slightly smaller new problem.
>
> Tim.
>
>
> On 15 Jun 2021, at 22:23, Elad Alon <eladalon@google.com> wrote:
>
> Consider the alternative.
>
> Different apps have different ID spaces. 0x1234567890abcdef might be a
> valid ID on multiple services. Sometimes even for the same overarching
> company. When a captured app claims to be 0x1234567890abcdef, is that a
> Vimeo video, a Google Doc, a Google Slides deck, a Microsoft Word session,
> a CodePen...? And the list goes on.
>
> What's left for the app to do? I see two mutually-*non*-exclusive options:
> 1. Prefix the session ID with an identifier. E.g.
> MsWord:0x1234567890abcdef. This is vulnerable to either spoofing or
> unintended clashes, though. "Works sometimes." Better marry that with #2.
> 2. Verify 0x1234567890abcdef on some shared cloud infrastructure.
> "0x1234567890abcdef, you say? Let's have some remote challenge to see if
> you're who you really claim you are." This will take at least an RTT,
> though...
>
> You escape this conundrum if you allow UA-mediated origin-exposure. (
> *Optional* origin exposure, btw, which means that you can send out an
> opaque token if you need to.)
>
> On Tue, Jun 15, 2021 at 9:49 PM Tim Panton <tim@pi.pe> wrote:
>
>>
>>
>> > On 15 Jun 2021, at 20:37, Harald Alvestrand <harald@alvestrand.no>
>> > wrote:
>> >
>> > The point of the origin is that the UA vouches for its authenticity.
>> >
>> > The token is just a string. As long as you can pass a string, the app
>> > can choose to pass anything: numbers, tokens, or jsonified objects. App's
>> > business, not UA business.
>>
>>
>> I still don’t understand why the origin is relevant - apart from enabling
>> a page to engage in uncompetitive behaviour.
>>
>> The token is only exchanged when a user has authorised a capture and both
>> sides agree that the captured page is suitable for remote control.
>> So why would either side care what the origin is?
>>
>> T.
>
>
>

-- 
.: Jan-Ivar :.
Received on Thursday, 17 June 2021 02:23:02 UTC