Re: [w3ctag/design-reviews] On-device Web Speech API (Issue #1038) from Evan Liu on 2025-04-25 (public-webapps-github@w3.org from April 2025)

From: Evan Liu <notifications@github.com>
Date: Thu, 24 Apr 2025 17:44:02 -0700
To: w3ctag/design-reviews <design-reviews@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <w3ctag/design-reviews/issues/1038/2829137636@github.com>

evanbliu left a comment (w3ctag/design-reviews#1038)

Thanks for the notes, Jeffrey! I haven't had a chance to meet with the Audio WG yet, but here's my attempt at answering some of your questions.

### 1. Restricting Recognition Location
The `recognition.mode = "ondevice-preferred"` option is indeed intended to serve as the "hint" that websites can use to express a preference for on-device speech recognition, without the UA guaranteeing it. This gives UAs flexibility: for instance, a UA could use its own second-party cloud service when that benefits the user (e.g., on low-power devices), since the site has only indicated a preference.

However, we still see a critical need for a mechanism that lets websites guarantee that audio is not sent to any external service, whether second-party or third-party, for processing. The `recognition.mode = "ondevice-only"` option is designed to meet this specific requirement. If a UA cannot process the audio solely on the device (that is, without transmitting the speech data over the network for recognition), it should throw an error when this mode is requested. This gives applications with strict data-residency or privacy requirements a clear assurance.
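
As a rough illustration of how a site might choose between the two modes, here is a minimal TypeScript sketch. It assumes, hypothetically, that a recognizer exposes a writable `mode` property and that `start()` throws synchronously when `"ondevice-only"` cannot be honored; the thread does not pin down exactly when the error is raised. `RecognizerLike`, `startRecognition`, and `StartResult` are illustrative names, not part of any spec.

```typescript
// Only the two string values come from the discussion; the rest is assumed.
type OnDeviceMode = "ondevice-preferred" | "ondevice-only";

// Minimal stand-in for a SpeechRecognition object (hypothetical, not spec IDL).
interface RecognizerLike {
  mode: OnDeviceMode;
  start(): void;
}

type StartResult =
  | { started: true; mode: OnDeviceMode }
  | { started: false; reason: string };

// Sites with strict residency requirements ask for "ondevice-only" and treat
// failure as a hard stop; other sites pass a hint the UA is free to ignore.
function startRecognition(rec: RecognizerLike, requireLocal: boolean): StartResult {
  rec.mode = requireLocal ? "ondevice-only" : "ondevice-preferred";
  try {
    rec.start(); // assumed to throw if "ondevice-only" cannot be satisfied
    return { started: true, mode: rec.mode };
  } catch (err) {
    // No automatic retry with "ondevice-preferred": that could send audio off-device.
    return { started: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```

The `catch` branch deliberately surfaces the failure instead of downgrading to `"ondevice-preferred"`, since a silent fallback would defeat the guarantee the site asked for.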

### 2. Recognizing Other Users' Speech
The mention of `MediaStreamTrack` support was in response to the discussion on efficient, sustainable speech recognition. It enables developers to implement either sender-side or receiver-side captioning. The previous note on personalization was incorrect and has been removed: while local speech recognition can support personalized sender-side captioning when audio is captured via the microphone, personalization does not apply to receiver-side captioning using a `MediaStreamTrack`.
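
To make the sender-side/receiver-side distinction concrete, here is a TypeScript sketch. It assumes, hypothetically, that `MediaStreamTrack` support takes the form of a `start(track)` overload on the recognizer; `TrackLike`, `TrackRecognizerLike`, and `captionTrack` are illustrative stand-ins rather than spec IDL. Sender-side captioning passes the local microphone track, while receiver-side captioning passes a remote participant's track (for example, one received over a peer connection).

```typescript
// Minimal stand-in for a MediaStreamTrack (hypothetical subset of its fields).
interface TrackLike {
  kind: string; // "audio" or "video" for real tracks
  readyState: "live" | "ended";
}

// Hypothetical recognizer whose start() optionally accepts an audio track.
interface TrackRecognizerLike {
  start(track?: TrackLike): void;
}

type CaptionSide = "sender" | "receiver";

// Chooses which track to caption and hands it to the recognizer.
// Returns the track that was actually passed to start().
function captionTrack(
  rec: TrackRecognizerLike,
  side: CaptionSide,
  localMic: TrackLike,
  remote: TrackLike,
): TrackLike {
  const track = side === "sender" ? localMic : remote;
  if (track.kind !== "audio" || track.readyState !== "live") {
    throw new Error(`cannot caption a ${track.readyState} ${track.kind} track`);
  }
  rec.start(track);
  return track;
}
```

As noted above, personalized recognition only applies on the sender-side path, where the speaker's own microphone audio is recognized locally; the receiver-side path sees only the remote track.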

### 4. Fingerprinting
Apologies that the document is inaccessible. We're unable to share its exact contents at this time, but the [PR you linked to](https://github.com/webmachinelearning/writing-assistance-apis/pull/47) provides a thorough explanation of the privacy-preserving countermeasures of the Web Translation API that we're planning to adopt here.

Here are some use cases that `availableOnDevice()` aims to address:
* Conditionally Offer Features: Decide whether to present UI elements or enable features that rely on on-device recognition before prompting for installation. This avoids offering a feature that isn't viable.
* Graceful Degradation/Enhancement: Allow the site to immediately understand if on-device is an option. If not, it can fall back to alternative mechanisms (like cloud-based services it operates, or informing the user about limitations).
* Resource-Informed UI: For example, a web application might choose to display a "Transcribe Locally" button only if `availableOnDevice()` indicates potential support.
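
The three use cases above can be sketched as a single decision function. This TypeScript sketch assumes, hypothetically, that `availableOnDevice(lang)` resolves to a boolean (its exact signature and return type are not settled in this thread) and injects it as a parameter so the logic runs without a browser. `planTranscription`, `TranscriptionPlan`, and the `siteHasCloudService` flag are illustrative names.

```typescript
// Assumed signature: resolves to true when on-device recognition for `lang` is possible.
type AvailabilityCheck = (lang: string) => Promise<boolean>;

interface TranscriptionPlan {
  showTranscribeLocallyButton: boolean; // resource-informed UI
  engine: "on-device" | "site-cloud" | "none"; // graceful degradation
  userNotice?: string; // tells the user about limitations
}

async function planTranscription(
  availableOnDevice: AvailabilityCheck,
  lang: string,
  siteHasCloudService: boolean,
): Promise<TranscriptionPlan> {
  let onDevice = false;
  try {
    onDevice = await availableOnDevice(lang);
  } catch {
    onDevice = false; // treat a failed check as "not available" rather than breaking the page
  }
  if (onDevice) {
    return { showTranscribeLocallyButton: true, engine: "on-device" };
  }
  if (siteHasCloudService) {
    // Fall back to a cloud service the site itself operates.
    return { showTranscribeLocallyButton: false, engine: "site-cloud" };
  }
  return {
    showTranscribeLocallyButton: false,
    engine: "none",
    userNotice: `Local transcription is not available for ${lang}.`,
  };
}
```

Because the check runs before any feature is offered, the page never shows a "Transcribe Locally" control that it cannot back with on-device recognition.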

--
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/1038#issuecomment-2829137636

Received on Friday, 25 April 2025 00:44:06 UTC