RE: Call for Consensus (CfC): Face Detection API (PR 78 in MediaCapture-Extensions)

Thanks, Bernard, for raising these concerns. Please see our responses below.
1.  Scope of applicability.
A face detection API has been available to Android phone vendors since API level 14 (Android 4.0, 2011) and is now available on almost all Android phones. On ChromeOS, the Android Camera2 API, which supports face detection, has been available on all Google Pixelbooks since 2017. On macOS it has been supported since 10.13 (2017 timeframe). On Windows 10 and above, clients with driver support have had a face detection API. I agree that the percentage of Windows clients on the market that can take advantage of this right now is low, but since the pandemic OEMs have been investing in better camera systems for their client devices.
2.  Variance of results.
We offer another path (hopefully much more performant) which is already available to native developers. Developers remain free to use ML frameworks like TF.js on any device. Variance in results is a quality-of-implementation issue, and leaving it to implementations allows innovation to happen there. The face detection model architecture is an implementation detail by design, so models can be updated (through driver or OS updates) without breaking the API contract. We are not trying to standardize the model architecture. This is similar to how the Shape Detection API [2] works, where FaceDetector represents an underlying accelerated platform's component for detection of human faces in images. It is also the way echo cancellation [3] works, where there are "browser" and "system" implementations. Safari and Chrome use different echo cancellation algorithms, yet that hasn't been an issue for apps such as Zoom and Teams.
On platforms whose face detection service is of unreasonably low quality, user agents could choose not to expose the platform service. This is the way WebRTC and WebCodecs encoders [1] are currently implemented in the Chromium browser. In that case, applications can fall back to their own face detection implementation while still using the same metadata structures for storing detection results. However, this might come at a cost in power and performance [4].
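As a rough illustration of the fallback described above, here is a minimal TypeScript sketch. All names in it (DetectedFace, FrameMetadata, detectedFaces, facesForFrame, FallbackDetector) are hypothetical and chosen only for this example; they are not the definitions in PR 78. The sketch assumes the user agent leaves the metadata member absent when it does not expose the platform service.

```typescript
// Hypothetical shapes for illustration only; not the names defined in PR 78.
interface Box { x: number; y: number; width: number; height: number; }
interface DetectedFace { id: number; probability: number; boundingBox: Box; }
interface FrameMetadata { detectedFaces?: DetectedFace[]; }

// An application-supplied detector, e.g. one backed by an ML framework.
type FallbackDetector = (frame: unknown) => DetectedFace[];

// Uses platform results when the metadata member is present; otherwise runs
// the application's own detector and stores its output in the same structure.
function facesForFrame(
  frame: unknown,
  metadata: FrameMetadata,
  fallback: FallbackDetector,
): { faces: DetectedFace[]; source: "platform" | "fallback" } {
  if (metadata.detectedFaces !== undefined) {
    return { faces: metadata.detectedFaces, source: "platform" };
  }
  const faces = fallback(frame);
  metadata.detectedFaces = faces; // same structure, different producer
  return { faces, source: "fallback" };
}
```

Downstream code that reads the metadata does not need to know which path produced the results, which is the point of keeping the structures identical.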
3.  Generality.
Originally, we proposed an API which would have allowed more complex segmentation with a generic contour. The feedback from the group was that:
(a) the API should be minimal for now and be extended later when the need arises [5], and
(b) a more complex segmentation (contour) should be separated from a simple bounding box approach [6] and what drivers can currently offer [7].
This led us to revise the proposal to represent detected faces with only a simple bounding box. Encoder optimization is already possible using the proposed simple API.
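To make the encoder optimization point concrete, here is a small TypeScript sketch of how an application might turn a set of face bounding boxes into a single region of interest. The names (NormalizedBox, PixelRect, regionOfInterest) are hypothetical, and the sketch assumes box coordinates are normalized to the range 0 to 1 relative to the frame. No particular encoder region-of-interest API is implied.

```typescript
// Hypothetical types for illustration; coordinates are assumed normalized (0..1).
interface NormalizedBox { x: number; y: number; width: number; height: number; }
interface PixelRect { left: number; top: number; right: number; bottom: number; }

// Computes the smallest pixel rectangle that covers all given boxes,
// clamped to the frame. Returns null when there are no boxes.
function regionOfInterest(
  boxes: NormalizedBox[],
  frameWidth: number,
  frameHeight: number,
): PixelRect | null {
  if (boxes.length === 0) return null;
  let left = Infinity, top = Infinity, right = -Infinity, bottom = -Infinity;
  for (const b of boxes) {
    left = Math.min(left, b.x);
    top = Math.min(top, b.y);
    right = Math.max(right, b.x + b.width);
    bottom = Math.max(bottom, b.y + b.height);
  }
  const clamp = (v: number): number => Math.min(1, Math.max(0, v));
  return {
    left: Math.floor(clamp(left) * frameWidth),
    top: Math.floor(clamp(top) * frameHeight),
    right: Math.ceil(clamp(right) * frameWidth),
    bottom: Math.ceil(clamp(bottom) * frameHeight),
  };
}
```

An application could hand such a rectangle to whatever quality-prioritization mechanism its encoder offers; a bounding box per face is sufficient input for this.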
If the objection is about shifting the focus from describing faces to describing arbitrary object types, including faces and face landmarks, that modification to the API would be relatively straightforward, but the result would no longer be the minimum viable product (MVP).
Let's discuss this point in issue #79 [8]. However, we would like to ask whether the issue is a blocker for the CfC and for merging, or whether we could reach consensus now and update the proposal after the discussion in that issue. Apart from this, the CfC has met with broad acceptance and there is interest in the face detection API. In any case, we expect the API to evolve after the CfC.
[1] https://www.w3.org/TR/webcodecs/
[2] https://wicg.github.io/shape-detection-api/#face-detection-api
[3] https://developer.mozilla.org/en-US/docs/Web/API/MediaTrackSettings/echoCancellation
[4] https://github.com/ttoivone/mediacapture-extensions/blob/ttoivone-20221111-fdmd-flat/face-detection-explainer.md
[5] Youenn Fablet, https://github.com/w3c/mediacapture-extensions/pull/48#issuecomment-1155087741
[6] Youenn Fablet, Tim Panton, https://www.w3.org/2022/11/15-webrtc-minutes.html#t07
[7] Youenn Fablet, https://www.w3.org/2021/11/24-webrtc-minutes.html#t04
[8] https://github.com/w3c/mediacapture-extensions/issues/79
-Riju.

From: Bernard Aboba <Bernard.Aboba@microsoft.com>
Sent: Tuesday, January 17, 2023 10:04 AM
To: public-webrtc@W3.org
Subject: Re: Call for Consensus (CfC): Face Detection API (PR 78 in MediaCapture-Extensions)


I object to merging the Face Detection API in PR 78 due to the following concerns:

  1.  Scope of applicability. Currently the API in PR 78 proposes to provide hw acceleration for Face Detection based on camera driver support.  Tying support for accelerated Face Detection to support in a camera driver seems unlikely to provide wide coverage, since it is likely to only be supported on new camera models.  This will be frustrating for applications that do not wish to develop their own face detection models.  Those applications that have their own face detection models will probably choose instead to leverage general acceleration approaches supported with Web ML platforms such as TensorFlow.js, which have wider coverage (e.g. by using WebGL for acceleration).
  2.  Variance of results. Since the actual models used may vary by camera, these APIs can potentially give varying results depending on the hardware. This will impose a support burden on applications, which could need to maintain a camera blacklist. Such a list could be difficult to develop without the ability to identify the camera hardware, which could be considered a fingerprinting risk. This problem will not arise for applications utilizing an existing face detection model written for an ML platform, since those models will yield the same results, albeit with better or worse performance.
  3.  Generality.  As noted in Issue 79<https://github.com/w3c/mediacapture-extensions/issues/79>, VideoFrame metadata is a kind of Segmentation metadata, which has a number of potential uses, such as encoder optimization. I'd therefore prefer to solve the problem more generally, with face detection being one type of segmentation.


===============================================


This is a Call for Consensus (CfC) on the Face Detection API contained in PR 78 of the Mediacapture-Extensions document.



PR 78 is available for inspection here:

Add face detection constraints and VideoFrameMetadata members by ttoivone · Pull Request #78 · w3c/mediacapture-extensions (github.com)<https://github.com/w3c/mediacapture-extensions/pull/78>



A "rich diff" can be found here: Add face detection constraints and VideoFrameMetadata members by ttoivone · Pull Request #78 · w3c/mediacapture-extensions (github.com)<https://github.com/w3c/mediacapture-extensions/pull/78/files?diff=split&w=0&short_path=dc726c5#diff-dc726c5254fb7ca8b63371f61c87fdf33312f7e83a966cb05a2f8ad2c3af54ef>



The GitHub Issues list is here: Issues · w3c/mediacapture-extensions (github.com)<https://github.com/w3c/mediacapture-extensions/issues>



In response, please state one of the following:





  *   I support merging the Face Detection API in PR 78 into the MediaCapture-Extensions document.

  *   I object to merging the Face Detection API in PR 78 due to the following Open Issues <Issue number #s>



The Call for Consensus (CfC) will last until midnight Pacific Time on January 16, 2023.



Bernard Aboba

For the Chairs

Received on Wednesday, 18 January 2023 14:02:19 UTC