Re: Screensharing: Bootstrapping Collaboration between Capturer and Capturee

Thinking a bit outside the box: there's an existing way to communicate
"state" to a web page, and that's through URL fragments and URL query
parameters. Could we just support those instead of something as specific as
slide numbers?
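For reference, reading such state on the captured side needs nothing new. This is a minimal sketch. It assumes a hypothetical `slide` parameter carried in the fragment, though each site would pick its own convention. How a capturer would be allowed to change the captured tab's URL is the part that would need specifying, so that is not shown:

```typescript
// Hypothetical convention: the captured page reads "#slide=N" from its URL.
// Only URL and URLSearchParams are standard; the parameter name is illustrative.

// Capturer side: build the URL it would ask the UA to navigate the captured tab to.
function withSlideFragment(pageUrl: string, slide: number): string {
  const url = new URL(pageUrl);
  // Preserve any other fragment parameters and overwrite only "slide".
  const params = new URLSearchParams(url.hash.slice(1));
  params.set("slide", String(slide));
  url.hash = params.toString();
  return url.toString();
}

// Captured side: parse the fragment, ignoring anything malformed.
function readSlideFromFragment(pageUrl: string): number | undefined {
  const params = new URLSearchParams(new URL(pageUrl).hash.slice(1));
  const raw = params.get("slide");
  if (raw === null) return undefined;
  const n = Number.parseInt(raw, 10);
  return Number.isInteger(n) && n > 0 ? n : undefined;
}
```

The fragment has a practical advantage over the query string. Changing only the fragment does not reload the page, and the page sees a `hashchange` event instead.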


On Thu, Jun 17, 2021, 12:27 PM Jan-Ivar Bruaroey <> wrote:

> > Embedding the specific UA controls in the browser (proposal A and B) is
> a layering violation.
> @Harald (I think you mean B?) In any case, the explainer
> <>
> seems quite clear that the primary use case of getDisplayMedia is
> presentations in video conferences. That's why browsers push captured
> windows to the front
> <>, among
> other things. I'm not taking a position on Youenn's proposal, only noting
> that having user agents involved at a high level here does not seem out of
> bounds: Two sites are involved, and user agents stepping in to mediate to
> protect users seems fine and appropriate to me, if they wish to do that.
> > Considering both exist in the same browser instance, that is a
> mind-bogglingly unnecessary detour.
> @Lennart Point taken. Most users run just one browser (we're all the
> anomaly). So it's useful to scope the problem down to capturer and capturee
> in the *same user agent*.
> This may not matter in the id-solution, but back to Sergio's concern:
> > My concern is that it will provide no value to Jitsi but will help
> create monopolies to VC+content providers.
> @Elad An id-only solution is no good, for this reason. At the same time, an
> id-solution is probably inevitable: even a {slideNumber} constraint could
> be used to communicate a handshake.
> Thus, the solution needs to mandate a baseline set of presentation
> controls that work without any id, and expose the id as part of registering
> for that. That's what I'd like to propose:
>    1. Target calls a method, providing how many slides it has, and
>    whether to UA-guard against capture past (cross-origin) navigation
>    2. Capturer gets {min, max} = track.getCapabilities().slideNumber; and
>    can track.applyConstraints({slideNumber: min});
>    3. Capturer can also get a track.getSettings().presentationId.
>    4. Some event if the target navigates or calls the method again.
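The four steps above could be modeled roughly as follows. This sketch rests on several assumptions. `slideNumber` and `presentationId` are the names floated in the proposal, not shipped API. The target-side method (called `register` here) and the change callback are hypothetical. A mock object stands in for both the UA and the `MediaStreamTrack`, so the flow can run anywhere:

```typescript
// Hypothetical: "slideNumber" and "presentationId" are names from the proposal;
// register() and onPresentationChange() are invented for this sketch.

interface SlideRange {
  min: number;
  max: number;
}

interface PresentationRegistration {
  slideCount: number;              // step 1: how many slides the target has
  guardAgainstNavigation: boolean; // step 1: stored only; this mock does not enforce it
  presentationId: string;          // step 3: exposed only as part of registering
}

// Stand-in for the UA plus the capturer's MediaStreamTrack.
class MockCapturedTrack {
  private registration?: PresentationRegistration;
  private currentSlide?: number;
  private listeners: Array<() => void> = [];

  // Step 1 (target side). Registering (again) resets to slide 1 and notifies (step 4).
  register(reg: PresentationRegistration): void {
    this.registration = reg;
    this.currentSlide = 1;
    for (const fn of this.listeners) fn();
  }

  // Step 4 (capturer side); a real UA would also fire this on target navigation.
  onPresentationChange(fn: () => void): void {
    this.listeners.push(fn);
  }

  // Step 2: the slide range appears as a capability once the target opts in.
  getCapabilities(): { slideNumber?: SlideRange } {
    if (this.registration === undefined) return {};
    return { slideNumber: { min: 1, max: this.registration.slideCount } };
  }

  // Steps 2 and 3: current slide and presentation id exposed as settings.
  getSettings(): { slideNumber?: number; presentationId?: string } {
    return {
      slideNumber: this.currentSlide,
      presentationId: this.registration?.presentationId,
    };
  }

  // Step 2: move the target to a slide, rejecting values outside the range.
  async applyConstraints(c: { slideNumber?: number }): Promise<void> {
    if (c.slideNumber === undefined) return;
    const range = this.getCapabilities().slideNumber;
    if (range === undefined || c.slideNumber < range.min || c.slideNumber > range.max) {
      throw new RangeError(`slideNumber ${c.slideNumber} is out of range`);
    }
    this.currentSlide = c.slideNumber;
  }
}

// Baseline capturer control: it never needs the presentationId.
async function goToFirstSlide(track: MockCapturedTrack): Promise<boolean> {
  const range = track.getCapabilities().slideNumber;
  if (range === undefined) return false; // target never opted in
  await track.applyConstraints({ slideNumber: range.min });
  return true;
}
```

A real UA would more likely reject an out-of-range value with `OverconstrainedError`. The mock uses `RangeError` so that it stays self-contained. The design point is that `goToFirstSlide` never reads `presentationId`, which keeps the baseline controls usable by any capturer.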
> I agree we cannot force apps to work together, but if we make it
> embarrassingly easy, then hopefully Twitter shaming can do the rest.
> But seriously, I think VC apps see the market benefit of letting
> presenters present with whatever tools they have, just like they recognize
> the benefit of participants joining from browsers on whatever device
> they're on (or dialing in from phones! sheesh).
> I think presentation apps see the market benefit of working in whatever
> meeting their user is in.
> IOW you can't capture an audience that is already captured in a meeting
> they didn't start.
> If it turns out I was being naive here, it'll be on the internet forever.
> No biggie. ;)
> Disclaimer: this all still needs to be under site-isolation and capture
> opt-in.
> On Wed, Jun 16, 2021 at 3:42 PM Harald Alvestrand <>
> wrote:
>> Setting up cross-network communication backs us into the "URI that can be
>> used to connect webrtc" topic.
>> We had strong warnings against minting those tokens in the security docs
>> for general A/V media session usage; it seems to make much more sense (or
>> rather: forbidding it seems to make much less sense) when we're dealing
>> with process-to-process comms in a distributed application.
>> But I think it is a different application space than where Capture-Handle
>> is trying to go.
>> On 6/16/21 12:30 PM, T H Panton wrote:
>> Ha, I see the problem now. I’ve grown comfortable in a flat, unique
>> token namespace and forgotten how ugly the outside world is!
>> I’m afraid this tells me that the idea of just having a token doesn’t
>> work - we need to define a namespace to assign meaning to it.
>> I have a couple of possibilities in mind. The simplest is mDNS: you look
>> up the token in mDNS and retrieve an SRV record, which tells you how to connect.
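Here is a minimal sketch of that lookup. It assumes a hypothetical `_capture-control._tcp` service type and a token that is a valid DNS label. The resolver is injected because web pages have no DNS API, so in practice the UA or a native helper would perform the mDNS query:

```typescript
// Hypothetical: the "_capture-control._tcp" service type is invented here;
// nothing registers it today.

interface SrvRecord {
  priority: number;
  weight: number;
  port: number;
  target: string;
}

// Injected lookup (e.g. backed by an mDNS responder outside the page).
type SrvResolver = (name: string) => SrvRecord[];

function serviceNameForToken(token: string): string {
  // Restrict the token to one DNS label: letters, digits, hyphens, 1..63 chars.
  if (!/^[A-Za-z0-9-]{1,63}$/.test(token)) {
    throw new Error(`token is not a single DNS label: ${token}`);
  }
  return `${token}._capture-control._tcp.local`;
}

function resolveControlEndpoint(
  token: string,
  resolve: SrvResolver,
): { host: string; port: number } | undefined {
  // Lowest priority wins; ties go to the highest weight.
  const sorted = [...resolve(serviceNameForToken(token))].sort(
    (a, b) => a.priority - b.priority || b.weight - a.weight,
  );
  const best = sorted[0];
  if (best === undefined) return undefined;
  return { host: best.target, port: best.port };
}
```

For brevity the tie-break is deterministic, rather than RFC 2782's weighted random selection among records of equal priority. Because the endpoint can be any host and port, this also covers the native-app cases raised below.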
>> Adding an origin only helps if you have a very specific solution in mind,
>> i.e., server-mediated remote control of a well-known web app from the same
>> vendor that happens to be on different origins.
>> That does not allow you to control a local native app (e.g. Keynote),
>> which seems to me very necessary. It also doesn’t allow a native capture
>> app to control a web page (e.g. SlideShare), which is also desirable.
>> Adding an origin also provides a hook for uncompetitive practice.
>> If we are going to limit communication to ‘same browser instance’, then we
>> would be better off with a port we can invoke postMessage() on.
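If the UA handed each side one end of a `MessageChannel`, the protocol on top could be as small as the following. The message shapes and function names are hypothetical. Only `MessagePort` and its `onmessage` handler are standard:

```typescript
// Hypothetical control protocol over a MessagePort that the UA would hand out.

type ControlMessage =
  | { type: "goto"; slide: number }
  | { type: "next" }
  | { type: "previous" };

interface DeckState {
  slide: number;      // 1-based current slide
  slideCount: number; // assumed >= 1
}

// Pure state transition, so the protocol can be tested without a real port.
function applyControl(state: DeckState, msg: ControlMessage): DeckState {
  const clamp = (n: number) => Math.min(Math.max(n, 1), state.slideCount);
  switch (msg.type) {
    case "goto":
      return { ...state, slide: clamp(msg.slide) };
    case "next":
      return { ...state, slide: clamp(state.slide + 1) };
    case "previous":
      return { ...state, slide: clamp(state.slide - 1) };
  }
}

// Capturee side: apply each command received on the port and re-render.
function serveControls(
  port: MessagePort,
  initial: DeckState,
  render: (s: DeckState) => void,
): void {
  let state = initial;
  // Assigning onmessage also starts the port's message queue.
  port.onmessage = (event: MessageEvent<ControlMessage>) => {
    state = applyControl(state, event.data);
    render(state);
  };
}
```

The capturer would then drive the deck with calls such as `port.postMessage({ type: "next" })`. Keeping the transition in a pure function means the capturee, not the capturer, decides how out-of-range requests are handled.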
>> Generally I’m in favour of “as simple as possible”; however, Einstein did
>> add “but no simpler.”
>> It seems to me the token/origin solution is too simple: it solves a small
>> part of the problem and leaves a slightly smaller new problem.
>> Tim.
>> On 15 Jun 2021, at 22:23, Elad Alon <> wrote:
>> Consider the alternative.
>> Different apps have different ID spaces. 0x1234567890abcdef might be a
>> valid ID on multiple services. Sometimes even for the same overarching
>> company. When a captured app claims to be 0x1234567890abcdef, is that a
>> Vimeo video, a Google Doc, a Google Slides deck, a Microsoft Word session,
>> a CodePen...? And the list goes on.
>> What's left for the app to do? I see two mutually-*non*-exclusive
>> options:
>> 1. Prefix the session ID with an identifier. E.g.
>> MsWord:0x1234567890abcdef. This is vulnerable to either spoofing or
>> unintended clashes, though. "Works sometimes." Better marry that with #2.
>> 2. Verify 0x1234567890abcdef on some shared cloud infrastructure.
>> "0x1234567890abcdef, you say? Let's have some remote challenge to see if
>> you're who you really claim you are." This will take at least an RTT,
>> though...
>> You escape this conundrum if you allow UA-mediated origin-exposure. (
>> *Optional* origin exposure, btw, which means that you can send out an
>> opaque token if you need to.)
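To make the namespace argument concrete, suppose the UA supplies the origin next to an app-chosen handle. Then the same `0x1234567890abcdef` from two services no longer collides, and a self-asserted `MsWord:` prefix from an unexpected origin is simply ignored. This is an illustrative sketch. The `{ origin, handle }` shape is an assumption loosely modeled on the Capture-Handle discussion, and `HandleRouter` is hypothetical:

```typescript
// Illustrative only: the origin is assumed to be supplied by the UA, while the
// handle is whatever opaque string the captured app chose to expose.

interface ObservedHandle {
  origin: string | null; // null when the captured page chose not to expose it
  handle: string;        // opaque, app-defined
}

type Integration = (handle: string) => string;

class HandleRouter {
  private byOrigin = new Map<string, Integration>();

  register(origin: string, integration: Integration): void {
    // Normalize, so "https://Slides.Example/" and "https://slides.example" match.
    this.byOrigin.set(new URL(origin).origin, integration);
  }

  route(observed: ObservedHandle): string | undefined {
    // Without an exposed origin, the app must verify the handle out of band.
    if (observed.origin === null) return undefined;
    const integration = this.byOrigin.get(new URL(observed.origin).origin);
    return integration?.(observed.handle);
  }
}
```

The key is the pair (UA-vouched origin, handle), not the handle alone. This is why no round trip to shared cloud infrastructure is needed to disambiguate it.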
>> On Tue, Jun 15, 2021 at 9:49 PM Tim Panton <> wrote:
>>> > On 15 Jun 2021, at 20:37, Harald Alvestrand <>
>>> wrote:
>>> >
>>> > The point of the origin is that the UA vouches for its authenticity.
>>> >
>>> > The token is just a string. As long as you can pass a string, the app
>>> can choose to pass anything: numbers, tokens, or jsonified objects. App's
>>> business, not UA business.
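Because the token is just a string, an app can pack structured data into it. The app must then validate that data on receipt, like any untrusted input. Here is a sketch with a hypothetical `DeckHandle` payload; the field names are illustrative:

```typescript
// Hypothetical payload an app might serialize into the opaque token string.
interface DeckHandle {
  v: 1;              // payload version, so the format can evolve
  deckId: string;
  slideCount: number;
}

function encodeHandle(h: DeckHandle): string {
  return JSON.stringify(h);
}

// Receiving side: anything that is not a well-formed DeckHandle is rejected.
function decodeHandle(token: string): DeckHandle | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(token);
  } catch {
    return undefined; // not JSON at all, e.g. a plain opaque token
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const { v, deckId, slideCount } = parsed as Record<string, unknown>;
  if (
    v !== 1 ||
    typeof deckId !== "string" ||
    typeof slideCount !== "number" ||
    !Number.isInteger(slideCount) ||
    slideCount < 1
  ) {
    return undefined;
  }
  return { v: 1, deckId, slideCount };
}
```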
>>> I still don’t understand why the origin is relevant - apart from
>>> enabling a page to engage in uncompetitive behaviour.
>>> The token is only exchanged when a user has authorised a capture and
>>> both sides agree that the captured page is suitable for remote control.
>>> So why would either side care what the origin is?
>>> T.
> --
> .: Jan-Ivar :.

Received on Friday, 18 June 2021 23:23:42 UTC