Re: Screensharing: Bootstrapping Collaboration between Capturer and Capturee from Elad Alon on 2021-06-10 (public-webrtc@w3.org from June 2021)

From: Elad Alon <eladalon@google.com>
Date: Thu, 10 Jun 2021 20:25:21 +0200
To: Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
Cc: Youenn Fablet <youenn@apple.com>, WebRTC WG <public-webrtc@w3.org>
Message-ID: <CAMO6jDMsoW_zdj3Qks-U44-Ju0G8gzTpPCLrLeioARF5DMihhg@mail.gmail.com>

Hi Sergio,

*Concerning Capture Handle:*
The space of web-based VC applications is constantly growing, as is that of
Web-based productivity suites. What's preventing Jitsi from using this
feature to set up collaborations with applications like Google Docs, MS
Office, or any other productivity suite? What's preventing Jitsi from
producing their own productivity suite and integrating with it using
Capture Handle?

No feature is equally useful to all players on the Web. Screen-capture is
paramount to VC applications, but of little use to airline booking systems.
Even if Capture Handle is more easily employed by entities who sit on both
sides of the seesaw (capturer+capturee), that is not a valid argument
against it. (a) The feature is still very useful for those who only sit on
one side, so long as they collaborate with the other side. (b) The
use-cases for those who sit on both sides are legitimate enough a need for
us to support them.

*Concerning the Actions
<https://w3c.github.io/mediasession/#actions-model>-like alternative:*
I think that's an interesting approach that could be explored in tandem. It
provides a complementary set of capabilities for use-cases that benefit
from the rails it offers, while other applications need the freedom of
Capture Handle.

   1. The Actions-like approach limits collaboration to a closed set of
   predefined actions. The Capture-Handle-based approach bootstraps arbitrary
   APIs.
   2. The Actions-like approach would not easily support non-trivial
   authentication. What if the captured application has privileged actions to
   expose? Would it expose them to any capturer? Or any capturer from a given
   origin? Many applications will need more fine-grained access-control
   (specific account, not just specific domain). Think of remotely editing
   content on the captured document. With Capture Handle, apps are free to use
   any authentication method, because we only bootstrap their communication;
   we don't serve as their telco.
   3. The Actions-based approach does not help detect self-capture and its
   awful result - the hall-of-mirrors. Capture Handle does.
   4. The Actions-based approach does not support discovery of captured
   origin, which is useful for analytics - VC applications want to know what
   their users share, so as to approach the right partners for collaboration
   using Capture Handle. That is a legitimate use-case even if the analytics
   gathered are incomplete due to lack of opt-in from some sites.
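
As a rough illustration of points 3 and 4, here is a minimal TypeScript sketch of the opt-in logic. It models the behavior with plain functions rather than calling any browser API. The field names (`handle`, `exposeOrigin`, `permittedOrigins`) loosely follow the spec-draft, but the helper functions `observedHandle` and `isSelfCapture`, the example origins, and the exact matching rules are assumptions made for illustration only.

```typescript
// What a capturer might read about the captured tab (assumed shape).
interface CaptureHandle {
  origin?: string; // present only if the capturee opted to expose it
  handle: string;  // opaque string chosen by the capturee
}

// What a capturee might configure (assumed shape).
interface CaptureHandleConfig {
  exposeOrigin: boolean;
  handle: string;
  permittedOrigins: string[]; // "*" is treated here as "any capturer"
}

// Hypothetical model of what a capturer at `capturerOrigin` would observe.
// Returns null when the capturee has not opted in for that capturer.
function observedHandle(
  config: CaptureHandleConfig,
  captureeOrigin: string,
  capturerOrigin: string
): CaptureHandle | null {
  const permitted =
    config.permittedOrigins.indexOf("*") !== -1 ||
    config.permittedOrigins.indexOf(capturerOrigin) !== -1;
  if (!permitted) {
    return null;
  }
  return config.exposeOrigin
    ? { origin: captureeOrigin, handle: config.handle }
    : { handle: config.handle };
}

// Hypothetical self-capture check: the capturer compares the observed handle
// with the handle it set for its own tab, to avoid the hall-of-mirrors.
function isSelfCapture(ownHandle: string, observed: CaptureHandle | null): boolean {
  return observed !== null && observed.handle === ownHandle;
}
```

In this model, a capturer that is not listed in `permittedOrigins` observes nothing, which is why analytics gathered this way stay incomplete without opt-in from the captured site.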

To conclude this email (*not* the conversation 😜):
I hope to convince you that Capture Handle is very useful to Jitsi. If it's
a bit more useful to X than to Jitsi, I don't think that's a valid
complaint.
I would be happy to collaborate in the future on complementary solutions
that offer something similar, but with sufficient added value ("rails");
the Actions-based approach you and Youenn have brought up sounds like that
to me.

Thanks,
Elad

On Thu, Jun 10, 2021 at 3:08 PM Sergio Garcia Murillo <
sergio.garcia.murillo@gmail.com> wrote:

> Hi Youenn,
>
> I was thinking of exactly that for defining new "presentation" actions,
> although I am not sure whether it would narrow the use cases too much.
>
> Another interesting use case could be about letting the capturee provide
> the best contentHint for the media stream as the capturer won't be able to
> have much info about the contents.
>
> The navigation information is more sensitive, and I don't see a way to
> make it usable without allowing specific domains.
>
> Best regards
> Sergio
>
> El jue, 10 jun 2021 a las 14:16, Youenn Fablet (<youenn@apple.com>)
> escribió:
>
>> Interesting idea Sergio.
>> I wonder whether https://w3c.github.io/mediasession/#actions-model could
>> be a source of inspiration here.
>> Already defined actions include actions like toggling camera/microphone
>> or play/pause. They are triggered by the user interacting with the UA's UI,
>> with the web application registering for those actions.
>>
>> I wonder whether next slide/previous slide could be defined as actions.
>> Capturee would register to those actions.
>> Capturer would somehow trigger those actions based on a bootstrap
>> mechanism tied to getDisplayMedia.
>> Or UA-specific UI would allow triggering them in case the UA is smart enough
>> to understand what is happening.
>>
>> On 10 Jun 2021, at 09:11, Sergio Garcia Murillo <
>> sergio.garcia.murillo@gmail.com> wrote:
>>
>> Hi Elad,
>>
>> I find this API really interesting and I can understand the value for
>> Google and other service providers. However, it is unclear what the
>> benefit is for the rest of the community. Let me explain my concerns.
>>
>> Given that the methods are opt-in, I foresee that only the web sites
>> interested in being captured will ever use this API, and given that the
>> web sites can set the domains that will be allowed to receive that
>> information, it is not unreasonable to think that they will only allow
>> the same company's VC products.
>>
>> So my worry is that we will end up having an API that will only be
>> enabled in Google Docs to expose information to Google Meet, and
>> in Microsoft 365 to expose information to Microsoft Teams, and they will be
>> able to provide a much better presentation experience than the rest of the VC
>> services. I am not saying that these are Google's or Microsoft's intentions,
>> but it is a more than feasible possibility.
>>
>> I understand the value of an API like that, but I think it should be a
>> benefit for all, not just for the ones that control both the content and
>> the conferencing services. I really hope that the API can be modified so
>> this can happen.
>>
>> Best regards
>> Sergio
>>
>>
>> El mar, 8 jun 2021 a las 16:23, Elad Alon (<eladalon@google.com>)
>> escribió:
>>
>>> Hello all,
>>>
>>> An existing issue with screensharing is that the capturing app cannot
>>> easily discover which application is being captured, even if the captured
>>> application wishes to expose this information. Are you tab-sharing a
>>> Wikipedia page or a presentation? If a presentation - what is its session
>>> ID? The capturing application does not know. And what a shame that is! For
>>> if the capturing application knew what it was capturing, it could establish
>>> out-of-band communication with the captured application and request the
>>> next slide, next article or anything, really, without forcing the user to
>>> switch tabs back and forth.
>>>
>>> *Capture Handle* is a feature that solves that problem. I've started a Discourse
>>> thread
>>> <https://discourse.wicg.io/t/proposal-capture-handle-bootstrap-app-collaboration-when-screensharing/5354/> on
>>> the WICG. There's also an explainer
>>> <https://docs.google.com/document/d/1oSDmBPYVlxFJxb7ZB_rV6yaAaYIBFDphbkx5bXLnzFg/edit?usp=sharing> and
>>> a spec-draft <https://eladalon1983.github.io/capture-handle/>.
>>>
>>> This feature is available for *Origin Trial in Chrome beginning with
>>> m92*.
>>>
>>> Please send me feedback in whichever way you find most convenient.
>>>
>>> Thanks,
>>> Elad Alon
>>>
>>
>>
Received on Thursday, 10 June 2021 18:27:01 UTC