[mediacapture-main] Why does `navigator.mediaDevices.enumerateDevices()` require that `Document` must have active keyboard focus? (#905)

juj has just created a new issue for https://github.com/w3c/mediacapture-main:

== Why does `navigator.mediaDevices.enumerateDevices()` require that `Document` must have active keyboard focus? ==
At Unity we are implementing support for Unity web exported content on mobile devices, and one part of that work is giving Unity projects access to the webcam and to audio capture devices.

Recently we have received reports that on Firefox the Unity page load does not progress in the background; users must keep the Unity WebGL game tab active in the foreground for loading to proceed. This has been reported only on Firefox.

Investigating further, we found that the problem is due to the `navigator.mediaDevices.enumerateDevices()` check, which Unity performs at the page loading stage. This check populates the initial webcam and microphone information for the Unity C# project code to access, and the main C# content starts only after it completes.

The reason for gating Unity content loading on device enumeration is that once loading has finished, Unity C# code may initialize 3D scene data based on the availability of a webcam or a microphone.

Actually [starting webcam/microphone access in Unity is an asynchronous operation](https://docs.unity3d.com/ScriptReference/Application.RequestUserAuthorization.html).

However, simply [querying the set of available devices is designed to be a synchronous operation](https://docs.unity3d.com/ScriptReference/WebCamTexture-devices.html). A Unity C# project can potentially access the webcam info immediately at project startup, which is why we gate the actual content startup on an `enumerateDevices()` step.

The web spec states that the `enumerateDevices()` operation does not require a user permission check; only starting a device and getting detailed device info do, which suits our needs.

However, for some odd reason, it has been specced that the Promise returned by `navigator.mediaDevices.enumerateDevices()` should remain pending until the `Document` object of the calling JS scripting context has acquired **keyboard focus**. See [[1]](https://w3c.github.io/mediacapture-main/#device-enumeration-can-proceed) and [[2]](https://html.spec.whatwg.org/multipage/interaction.html#focus).
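To make the focus condition concrete, here is a minimal sketch of how a page could wait for focus itself before calling `enumerateDevices()`. The helper name `whenDocumentFocused` is hypothetical, and using `document.hasFocus()` plus the window `focus` event is only an approximation of the spec's check, not necessarily what any browser does internally. The document and window are passed in as parameters so the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper: resolves once the given document has focus.
// Approximates the spec condition with document.hasFocus() and the
// window "focus" event; this is an assumption, not the spec algorithm.
function whenDocumentFocused(doc, win) {
  if (doc.hasFocus()) {
    // Already focused: nothing to wait for.
    return Promise.resolve();
  }
  return new Promise((resolve) => {
    // Resolve the first time the window gains focus, then stop listening.
    win.addEventListener("focus", () => resolve(), { once: true });
  });
}
```

In a browser this would be used as `whenDocumentFocused(document, window).then(() => navigator.mediaDevices.enumerateDevices())`. It does not remove the wait for a background tab; it only makes the wait explicit in page code.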

What this means is that JS page content which wishes to simply enumerate devices, without necessarily intending to activate any of them, will not be able to make progress if the page is in the background. As a result, we find ourselves implementing clunky watchdog timers to detect when the enumeration has hung and waiting for it to resolve would be a waste of time.

Such behavior is not ideal, though, since realizing that the enumeration result will likely "never" come takes some time as well, and precious startup loading time will have been wasted.
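For illustration, a watchdog of the kind described above could look roughly like the following sketch. The function name `enumerateDevicesWithTimeout`, the choice of `null` as the timeout result, and the injectable `enumerate` parameter are all assumptions made for this example; this is not Unity's actual loader code.

```javascript
// Hypothetical helper: race device enumeration against a watchdog timer.
// Resolves with the device list, or with null if the timer fires first.
// `enumerate` is injectable so the sketch can run outside a browser;
// by default it calls navigator.mediaDevices.enumerateDevices().
function enumerateDevicesWithTimeout(timeoutMs, enumerate) {
  enumerate = enumerate || (() => navigator.mediaDevices.enumerateDevices());
  let timer;
  const watchdog = new Promise((resolve) => {
    timer = setTimeout(() => resolve(null), timeoutMs);
  });
  // Whichever settles first wins; the timer is always cleaned up afterwards.
  return Promise.race([enumerate(), watchdog]).finally(() => clearTimeout(timer));
}
```

The cost mentioned above is visible here: in a background tab the page always pays the full `timeoutMs` before it can conclude that no result is coming.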

May I ask what the rationale was for requiring the `Document` to have keyboard focus before the device enumeration Promise is allowed to resolve? For what it's worth, it seems that Firefox, Chrome and Safari implement this check differently, and only Firefox actually requires it to be true (see [[3]](https://bugzilla.mozilla.org/show_bug.cgi?id=1732410) and [[4]](https://bugzilla.mozilla.org/show_bug.cgi?id=1397978)).

Would it be possible to consider removing that requirement? To my understanding, the requirement does not serve a security-related benefit, since the information that is returned is already non-identifying (only after acquiring permission for a device using the Permissions API does one get detailed HW info). Or am I misguided here?

Or, if removing the requirement is not possible to even consider, would it be possible to perform an enumeration query that immediately **rejects** the Promise if "now is not the time to allow doing this type of query"? Then these clumsy watchdog timers would not be needed, and the JS page load would be able to proceed quickly.

That way JS pages would not be left hanging, and they could decide to do something else with the precious loading time.
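As a sketch of how a page might consume such a rejecting query: no such behavior exists in the spec today, so the `enumerate` parameter below stands in for the proposed variant, and the name `probeDevicesOrSkip` and the shape of the returned object are hypothetical.

```javascript
// Hypothetical consumer of the proposed "reject immediately" behavior.
// `enumerate` stands in for an enumeration call that rejects right away
// when enumeration cannot proceed, instead of staying pending.
async function probeDevicesOrSkip(enumerate) {
  try {
    const devices = await enumerate();
    return { devices, deferred: false };
  } catch (err) {
    // Enumeration is not allowed right now: continue loading with an
    // empty list and remember that the query should be retried later.
    return { devices: [], deferred: true };
  }
}
```

With this shape, the loader needs no timer at all; it can start the content immediately and retry the enumeration at a later point, for example once the tab is brought to the foreground.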

Thanks for considering!

Please view or discuss this issue at https://github.com/w3c/mediacapture-main/issues/905 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Monday, 3 October 2022 15:08:24 UTC