Re: [mediacapture-handle] [Identity][Enhancement] Expose contentHint (#35)

[Reordered some of the responses in the interest of readability; the first one hopefully makes it clear why.]

> From the issue's description, I am not sure what exactly you are trying to solve, can you clarify this?

Yes, I would love to clarify:
* Networks are imperfect, so encoded video has to make sacrifices.
* Sometimes it is better to sacrifice frame-rate; sometimes resolution. It depends on the encoded content.
* Capturing applications can make better decisions if they know what type of content they're capturing.
* By amending Capture Handle with the proposal in the current issue, a captured application can help the capturing application make better decisions.
* The user agent could sometimes help, but not always, because auto-detection of the captured content is imperfect; see below.
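
To make the frame-rate vs. resolution trade-off concrete, here is a minimal sketch of how a capturing application might turn a known content type into an encoder preference. The function name `degradationFor` is hypothetical, and the mapping is my assumption about a sensible policy, not something any spec mandates. The hint strings are the video values of `MediaStreamTrack.contentHint`, and the returned strings are WebRTC degradation-preference values.

```typescript
// Which quality dimension the encoder should protect when bandwidth drops.
type DegradationPreference =
  | "maintain-framerate"
  | "maintain-resolution"
  | "balanced";

// Hypothetical helper: map a content hint to a degradation preference.
// The mapping itself is an assumed policy for illustration.
function degradationFor(contentHint: string): DegradationPreference {
  switch (contentHint) {
    case "motion":
      // Video-like content: smooth motion matters more than sharpness.
      return "maintain-framerate";
    case "detail":
    case "text":
      // Slides, documents, code: legibility matters more than smoothness.
      return "maintain-resolution";
    default:
      // Unknown or empty hint: no strong preference either way.
      return "balanced";
  }
}
```

Without knowing the content type, the capturer is stuck in the `default` branch, which is exactly the situation this issue is trying to improve.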

> IIRC, this was discussed at an interim and there were feedback questioning the actual usefulness
> I do not remember the conclusion of this discussion though.

I don't remember anyone proving that this is not useful.
* We have multiple teams inside of Google who are interested in using this.
* Microsoft has expressed interest in this. (See @aboba's [comment](https://github.com/w3c/mediacapture-handle/issues/35#issuecomment-1075836912).)

If someone thinks this is NOT useful, the onus is on them to prove as much.

I can tell you that inside Google, some have questioned why auto-detection could not be used instead. My answer is that auto-detection is imperfect and can misfire (more below). The correct algorithm for a capturer-encoder should be:
1. If a suggestedContentHint has been set, use it. (A capturer could disregard it when the captured app is untrusted, but I'd just use it myself; a malicious app that sets a bad hint would only sabotage itself.)
2. Otherwise, no suggestedContentHint has been set, so let the UA use auto-detection.
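
The two steps above can be sketched as follows. This is only an illustration under stated assumptions: `suggestedContentHint` is the field proposed in this issue and is not part of any shipped API, and the `CaptureHandleLike` shape and the `chooseContentHint` name are hypothetical. The empty string is the existing `contentHint` value meaning "no hint", which leaves the decision to the UA.

```typescript
// Video values of MediaStreamTrack.contentHint; "" means "no hint set".
type VideoContentHint = "" | "motion" | "detail" | "text";

// Hypothetical shape of a capture handle extended with the proposed field.
interface CaptureHandleLike {
  origin?: string;
  handle?: string;
  suggestedContentHint?: string; // proposed in this issue, not shipped
}

const KNOWN_HINTS: readonly string[] = ["motion", "detail", "text"];

// Step 1: if the captured app suggested a recognized hint, use it.
// Step 2: otherwise return "", so the UA falls back to auto-detection.
function chooseContentHint(
  captureHandle: CaptureHandleLike | null
): VideoContentHint {
  const suggested = captureHandle?.suggestedContentHint;
  if (suggested !== undefined && KNOWN_HINTS.indexOf(suggested) !== -1) {
    return suggested as VideoContentHint;
  }
  return "";
}
```

A capturer would then assign the result to the captured video track, along the lines of `track.contentHint = chooseContentHint(track.getCaptureHandle())`, assuming the track exposes a capture handle.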

> User Agent is smart enough to optimise things

Optimizations can misfire. Consider:
1. Mixed content - text and video. Can the UA decide which is more important?
2. Transitions - can the UA guess what content is coming next? Can it understand that [this frame](https://youtu.be/Z7-QdoofMq8?t=231) will soon be replaced by more video?

> Also, I am wondering whether this API shape is future proof.
> For instance, you might require different content hints if starting to crop capture.

I aim to make incremental progress. If you can propose a larger increment, I am happy to adopt it. Barring that, let's proceed with the best we can think of.

> Given the main goal of capture handle is to allow the creation of a server-based communication channel between capturer and capturee

[Citation needed.](https://en.wikipedia.org/wiki/Citation_needed)

> it seems best to simply use this channel to convey that information.

1. Why incur the delay?
2. Why force tight coupling between capturer and capturee? With my proposal, Meet/Teams/Jitsi can all work equally well with Docs/Office/Wikipedia. Is that not a Good Thing™?



-- 
GitHub Notification of comment by eladalon1983
Please view or discuss this issue at https://github.com/w3c/mediacapture-handle/issues/35#issuecomment-1230790287 using your GitHub account


Received on Monday, 29 August 2022 19:50:22 UTC