Re: [w3ctag/design-reviews] On-device Web Speech API (Issue #1038) from Jeffrey Yasskin on 2025-04-21 (public-webapps-github@w3.org from April 2025)

From: Jeffrey Yasskin <notifications@github.com>
Date: Mon, 21 Apr 2025 10:18:00 -0700
To: w3ctag/design-reviews <design-reviews@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <w3ctag/design-reviews/issues/1038/2819055394@github.com>

jyasskin left a comment (w3ctag/design-reviews#1038)

Thanks for the reply and initial changes! Here are some thoughts from my perspective; they haven't been vetted by the whole TAG yet:

#### 1. Restricting Recognition Location

A cloud server that the UA uses for recognition would be a **second-party** service, since the user is the second party, and the server operates on their behalf. I still support a hint that the site is acting to reduce the number of machines that could see the audio, which will encourage UAs to do the same, but I don't think it can honestly be called "ondevice-only" and still give UAs the flexibility they need to act on their users' behalf. I recognize that Chrome and Firefox don't have any plans to let users offload this work to cloud services, but we should design the API to accommodate future UAs that might want to explore that direction.

I _would_ support a statement that, at least given this hint, UAs MUST NOT expose the audio to any third parties, in case that helps these concerned partners be more comfortable with the change.

#### 2. Recognizing Other Users' Speech

It's true that the current design enables flexibility, but it's not clear that this is user-serving flexibility. It might be, but I don't see use cases described in the [explainer](https://github.com/WebAudio/web-speech-api/blob/main/explainers/on-device-speech-recognition.md) that show that users need the flexibility. Instead of referring to "scenarios where personalized or local processing is preferred", the explainer should describe a few of those scenarios. Note that https://github.com/WebAudio/web-speech-api/pull/150/files specifically bans personalized processing, and local processing is just as possible with sender-side recognition, which reinforces the possibility that there are no such scenarios.

#### 4. Fingerprinting

Note that the [Fingerprinting Mitigations](https://docs.google.com/document/d/1-9m-oe1x34nM2mCTzPsnSZIlSfYQXbZ7_SrvtMD1wuw/edit?usp=sharing) document isn't world-readable, so most of the TAG can't see it.

I generally like the [direction](https://github.com/webmachinelearning/writing-assistance-apis/pull/47) of the Web Translation API’s approach to privacy, but the TAG hasn't fully analyzed it, so I don't want to say it's definitely enough. I think a **list of use cases for the availableOnDevice() query** would help the rest of the TAG get more comfortable with the idea.

Does "support only one language pack per language at a time" mean that each browser major version will be pinned to exactly one pack version per language? That's tighter than https://github.com/webmachinelearning/writing-assistance-apis/pull/47 was willing to require, but it solves the extra fingerprinting problem.

--
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/1038#issuecomment-2819055394

Received on Monday, 21 April 2025 17:18:04 UTC