[mediacapture-image] Face Detection. (#289) from Rijubrata Bhaumik via GitHub on 2021-09-27 (public-webrtc@w3.org from September 2021)

From: Rijubrata Bhaumik via GitHub <sysbot+gh@w3.org>
Date: Mon, 27 Sep 2021 14:03:43 +0000
To: public-webrtc@w3.org
Message-ID: <issues.opened-1008193292-1632751421-sysbot+gh@w3.org>
riju has just created a new issue for https://github.com/w3c/mediacapture-image:

== Face Detection. ==
### Why ?

Face detection for video conferencing.
Support WebRTC-NV use cases like Funny Hats, etc.
On the client side, developers have to use computer vision libraries ([OpenCV.js](https://github.com/riju/WebCamera/tree/master/samples/faceDetection) / [TensorFlow.js](https://www.tensorflow.org/js)) with either a WASM (SIMD+Threads) or a GPU backend for acceptable performance. Many developers resort to cloud-based solutions like [Face API from Azure Cognitive Services](https://azure.microsoft.com/en-us/services/cognitive-services/face/) or [Face Detection from Google Cloud's Vision API](https://cloud.google.com/vision/docs/detecting-faces). On modern client platforms, we can save a lot of data movement and even on-device computation by reusing, for free, the work that the camera stack / Image Processing Unit (IPU) already does to improve image quality.

### What ?

**_Prior Work_**
WICG has proposed the [Shape detection API](https://wicg.github.io/shape-detection-api/#face-detection-api) which enables Web applications to use a system-provided face detector, but the API requires that the image data be provided by the Web application itself. To use the API, the application would first need to capture frames from a camera and then give the data to the [Shape detection API](https://wicg.github.io/shape-detection-api/#face-detection-api). This may not only cause extraneous computation and copies of the frame data, but may outright prevent using the camera-dedicated hardware or system libraries for face detection. Often the camera stack performs face detection in any case to improve image quality (like 3A algorithms) and the face detection results could be made available to applications without extra computation.
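
For comparison, the current flow with the Shape Detection API looks roughly like the sketch below: the page has to pass each frame to the detector itself. `FaceDetector` and `detect()` come from the WICG draft and are only available in some browsers; `detectFacesOnce` is a hypothetical helper name used for illustration.

```js
// Sketch of the existing approach: the page hands a frame to the detector.
// `detector` defaults to the Shape Detection API's FaceDetector, which is
// not available in every browser; a different detector can be injected.
async function detectFacesOnce(video, detector = new FaceDetector({ fastMode: true })) {
    // detect() accepts an ImageBitmapSource, which includes <video> elements.
    const faces = await detector.detect(video);
    // Keep only the plain bounding-box numbers for each detected face.
    return faces.map(({ boundingBox: b }) => ({
        x: b.x,
        y: b.y,
        width: b.width,
        height: b.height,
    }));
}
```

Running this for every frame is the extra per-frame work that the proposal tries to avoid.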

Many platforms offer a camera API which can perform face detection directly on image frames from the system camera. The detection may be hardware-assisted, in which case the hardware may be unable to process user-provided image data, or the API may not allow it.

_**Platform Support**_
| OS | API | Face detection |
| ------------- |:-------------:| :-----:|
| Windows | Media Foundation | [KSPROPERTY_CAMERACONTROL_EXTENDED_FACEDETECTION](https://docs.microsoft.com/en-us/windows-hardware/drivers/stream/ksproperty-cameracontrol-extended-facedetection?redirectedfrom=MSDN) |
| ChromeOS/Android | Camera HAL3 | [STATISTICS_FACE_DETECT_MODE_FULL](https://developer.android.com/reference/android/hardware/camera2/CameraMetadata#STATISTICS_FACE_DETECT_MODE_FULL), [STATISTICS_FACE_DETECT_MODE_SIMPLE](https://developer.android.com/reference/android/hardware/camera2/CameraMetadata#STATISTICS_FACE_DETECT_MODE_SIMPLE) |
| Linux | GStreamer | [facedetect](https://gstreamer.freedesktop.org/data/doc/gstreamer/head/gst-plugins-bad/html/gst-plugins-bad-plugins-facedetect.html) |
| macOS | Core Image / Vision | [CIDetectorTypeFace](https://developer.apple.com/documentation/coreimage), [VNDetectFaceRectanglesRequest](https://developer.apple.com/documentation/vision/vndetectfacerectanglesrequest) |

_**ChromeOS + Android**_
ChromeOS and Android provide the [Camera HAL3 API](https://source.android.com/devices/camera/camera3) for any camera user. The API specifies a method to transfer various image-related metadata to applications, and one metadata type contains information on detected faces. The face detection mode is selected with `STATISTICS_FACE_DETECT_MODE`:

| `STATISTICS_FACE_DETECT_MODE` | Returns |
|-------|:-----:|
| `STATISTICS_FACE_DETECT_MODE_FULL` | Face rectangles, scores, and landmarks, including eye and mouth positions. |
| `STATISTICS_FACE_DETECT_MODE_SIMPLE` | Only face rectangles and confidence values. |

In Android, the resulting face statistics are parsed and stored in the [Face](https://developer.android.com/reference/android/hardware/camera2/params/Face) class.

**_Windows_**
Face detection is performed in the DeviceMFT on the preview frame buffers. The DeviceMFT integrates the face detection library and turns on features when requested by the application. Face detection is enabled with the property ID [KSPROPERTY_CAMERACONTROL_EXTENDED_FACEDETECTION](https://docs.microsoft.com/en-us/windows-hardware/drivers/stream/ksproperty-cameracontrol-extended-facedetection). When enabled, the face detection results are returned in the metadata attribute [MF_CAPTURE_METADATA_FACEROIS](https://docs.microsoft.com/en-us/windows/win32/api/mfapi/ns-mfapi-facerectinfo), which contains the region and confidence level of each detected face:
```c
typedef struct tagFaceRectInfo {
  RECT Region;          /* bounding rectangle of the detected face */
  LONG confidenceLevel; /* detection confidence for this face */
} FaceRectInfo;
```
The API also supports blink and smile detection, which can be enabled with the flags `KSCAMERA_EXTENDEDPROP_FACEDETECTION_BLINK` and `KSCAMERA_EXTENDEDPROP_FACEDETECTION_SMILE`.
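
As an illustration of consuming these results, here is a small sketch (in JavaScript, to match the strawman further down) that maps a `FaceRectInfo.Region` to pixel coordinates. It assumes that `Region` holds coordinates relative to the frame in Q31 fixed-point format, i.e. that a value of 2^31 corresponds to the full frame width or height; please verify this against the Media Foundation documentation before relying on it. `q31RectToPixels` is a hypothetical helper, not part of any API.

```js
// Assumption: Region uses Q31 fixed point, so 2^31 represents 1.0 (full frame).
const Q31_ONE = 2 ** 31;

// Hypothetical helper: convert a relative Q31 RECT {left, top, right, bottom}
// into a pixel-space box {x, y, width, height} for a frame of the given size.
function q31RectToPixels(region, frameWidth, frameHeight) {
    const toPixels = (value, size) => Math.round((value / Q31_ONE) * size);
    const left = toPixels(region.left, frameWidth);
    const top = toPixels(region.top, frameHeight);
    const right = toPixels(region.right, frameWidth);
    const bottom = toPixels(region.bottom, frameHeight);
    return { x: left, y: top, width: right - left, height: bottom - top };
}
```

For example, a region spanning 0.25 to 0.5 of the width and 0.5 to 0.75 of the height of a 640x480 frame maps to a 160x120 box at (160, 240).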

_**macOS**_
Apple offers face detection using [Core Image CIDetectorTypeFace](https://developer.apple.com/documentation/coreimage/cidetectortypeface) or Vision [VNDetectFaceRectanglesRequest](https://developer.apple.com/documentation/vision/vndetectfacerectanglesrequest).

### How ?

**_Strawman proposal_**

```js
// Strawman only: the `faceDetection` constraint and the `detectedFaces`
// setting are proposed names and are not implemented by any browser.
async function startFaceDetection() {
    // Check if face detection is supported by the browser.
    const supports = navigator.mediaDevices.getSupportedConstraints();
    if (!supports.faceDetection) {
        throw new Error('Face detection is not supported');
    }

    // Open camera with face detection enabled and show to user.
    const stream = await navigator.mediaDevices.getUserMedia({
        video: { faceDetection: true }
    });
    const video = document.querySelector("video");
    video.srcObject = stream;

    // Get face detection results for the latest frame.
    const [videoTrack] = stream.getVideoTracks();
    const settings = videoTrack.getSettings();
    if (settings.faceDetection) {
        for (const face of settings.detectedFaces ?? []) {
            console.log(
             ` Face @ (${face.boundingBox.x}, ${face.boundingBox.y}),` +
             ` size ${face.boundingBox.width}x${face.boundingBox.height}`);
        }
    }
}

startFaceDetection().catch(console.error);
```
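
To connect this back to the Funny Hats use case, here is a rough sketch of how the results could be consumed. It assumes each entry of the proposed `detectedFaces` has a `boundingBox` with `x`, `y`, `width` and `height`, as in the strawman above. `hatRectFor` and `drawHats` are hypothetical helpers, and the 1.5x width factor is an arbitrary choice for illustration.

```js
// Hypothetical helper: compute where to draw a hat for one face.
// hatAspect is the hat image's height divided by its width.
function hatRectFor(boundingBox, hatAspect = 1) {
    const width = boundingBox.width * 1.5; // hat is wider than the face
    const height = width * hatAspect;
    return {
        x: boundingBox.x - (width - boundingBox.width) / 2, // centered on the face
        y: boundingBox.y - height, // bottom edge rests on top of the face box
        width,
        height,
    };
}

// Hypothetical helper: draw one hat per detected face on a 2D canvas context.
function drawHats(ctx, hatImage, detectedFaces) {
    for (const face of detectedFaces) {
        const r = hatRectFor(face.boundingBox, hatImage.height / hatImage.width);
        ctx.drawImage(hatImage, r.x, r.y, r.width, r.height);
    }
}
```

An application would call `drawHats` once per rendered frame with the face results for that frame, on a canvas layered over the video element.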



Please view or discuss this issue at https://github.com/w3c/mediacapture-image/issues/289 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config
Received on Monday, 27 September 2021 14:03:45 UTC