Re: [w3ctag/design-reviews] Web Translation API (Issue #948) from Domenic Denicola on 2024-07-30 (public-webapps-github@w3.org from July 2024)

From: Domenic Denicola <notifications@github.com>
Date: Mon, 29 Jul 2024 20:22:41 -0700
To: w3ctag/design-reviews <design-reviews@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <w3ctag/design-reviews/issues/948/2257382202@github.com>

Thanks for the review!

> 1. It would be good to have a list of use cases. We could think of some from our own experience, but they may be different than the ones you had in mind. Having an explicit list of use cases ensures that everyone is on the same page.

I believe these are listed in the first paragraph of the explainer. https://github.com/WICG/translation-api/blob/main/README.md#explainer-for-the-web-translation-and-language-detection-apis

> 3\. We're concerned about the use of the network. Specifically, use of the network to download a model, or use of the network to actually perform the translation, could introduce both delay and privacy issues. Is it possible for the developer to specify: "only do this if network access is not required"? We feel that differentiation between fast-local, slow-local (i.e. with download), and remote/cloud-based cases is important for MVP.

It is possible for the developer to avoid downloading the model, if the browser intends to support on-device translation, by checking if `capabilities.available` is `"readily"` (as opposed to `"after-download"`).
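
A minimal sketch of that check, assuming the `ai.translator` shape used elsewhere in this thread. The `createTranslatorIfLocal` helper and the `sourceLanguage`/`targetLanguage` option names are illustrative assumptions, not confirmed API; the namespace is passed in as a parameter so the sketch does not depend on any particular browser.

```js
// Hypothetical helper: create a translator only when no model download is needed.
// `translatorNamespace` is assumed to be shaped like `ai.translator`, with sibling
// capabilities() and create() methods.
async function createTranslatorIfLocal(translatorNamespace, options) {
  const capabilities = await translatorNamespace.capabilities();

  // "readily" means usable without a download. Both "after-download" and "no"
  // are treated as "skip" in this sketch.
  if (capabilities.available !== "readily") {
    return null;
  }
  return translatorNamespace.create(options);
}
```

A call might look like `await createTranslatorIfLocal(ai.translator, { sourceLanguage: "en", targetLanguage: "ja" })`. This only avoids the model download; it does not tell the page whether the translation itself runs on-device.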

We haven't yet exposed whether the translation is done entirely on-device or through cloud services, because doing so could possibly cause developers to write code that excludes certain browsers. But, we understand this could be worthwhile. This is mentioned in https://github.com/WICG/translation-api/blob/main/README.md#goals . We'll closely monitor this space, to find out if there are developers who need this ability, and/or whether any browsers actually plan to implement using cloud services.

> 4\. We loved the approach you propose to partitioning, and using a fake download, to mitigate fingerprinting!

Thanks for the kind words, although at least the fake downloads idea isn't looking too promising at the moment. https://github.com/WICG/translation-api/issues/10

> 5\. We recommend a translation-specific namespace instead of `ai`.

This is related to some ongoing work on other AI model-based APIs which are not yet at the stage of being ready for TAG review. We want them all to share a namespace and a set of common API patterns (e.g. sibling `create()` and `capabilities()` methods; `"no"`/`"readily"`/`"after-download"` availability signals; `destroy()` methods; specific `AbortSignal` integration patterns; etc.)
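
To make those shared patterns concrete, here is a rough sketch of a generic helper that would work against any namespace following them. The `withModel` name is hypothetical, and passing `signal` through the `create()` options is an assumption about what the `AbortSignal` integration might look like, not a settled design.

```js
// Hypothetical helper illustrating the shared patterns listed above.
// `modelNamespace` is any object with sibling capabilities() and create()
// methods, e.g. something shaped like `ai.translator`.
async function withModel(modelNamespace, createOptions, signal, useModel) {
  const { available } = await modelNamespace.capabilities();
  if (available === "no") {
    throw new Error("Model is not available in this browser");
  }

  // For "after-download", create() is assumed to trigger the download.
  // Forwarding `signal` here is an assumption made for illustration.
  const model = await modelNamespace.create({ ...createOptions, signal });
  try {
    return await useModel(model);
  } finally {
    // Release the model's resources once the caller is done with it.
    model.destroy();
  }
}
```

The point of the sketch is that code like this only needs to be written once if every model-based API exposes the same `capabilities()`/`create()`/`destroy()` surface.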

I understand it can be hard to judge this in the absence of other reviewable explainers, so we can revisit this later when we make more progress on those. Stay tuned!!

> 6\. Why is a separate namespace needed at all? We understand these objects are not constructible due to the asynchronicity, but since they are creating instances of the same class, making this obvious by adding the factory as a static method of this class seems more consistent with precedent. Same for the `capabilities()` method, we don't understand why this needs to live in a different namespace, and we think that the more objects this API is spread across, the harder it will be for authors to understand how the different parts fit together.

I thought about this avenue as well.

First, to clarify, we do need a separate `capabilities()` method so that web developers can determine model capabilities without initiating a create operation. (Which can be expensive, both in bandwidth and in GPU memory.) So we cannot merge that into the translator object. And, this method needs to be asynchronous, as the source of truth for the capabilities information will not generally be in the same process as the WindowOrWorkerGlobalScope. (We could proactively load the capabilities information into every WindowOrWorkerGlobalScope, but that would cause all sites, including those not using these APIs, to pay the cost. Which is undesirable.)

So I think what you're suggesting ends up converting the API from something like

```js
const capabilities = await ai.translator.capabilities();
const translator = await ai.translator.create();
```

to something like

```js
const capabilities = await AITranslatorCapabilities.get();
const translator = await AITranslator.create();
```

I think this is a viable direction. A bit uglier in my opinion, but if the goal is to minimize the number of namespaces, then it does work. We can keep it as a possibility, and see which one web developers prefer, or if other arguments appear on either side.

> 8\. It seems to make more sense, and help simplify the API and alleviate some privacy concerns if the UA renders the download progress bar.

The exact UI signals for when these APIs are in use are definitely worth exploring. Browser UI teams are not always excited about adding "noise" to what the user sees, but if we end up needing a permission prompt or similar anyway for privacy reasons, maybe we could convince them to add in some progress measures.

> 9\. We did wonder if it would make sense to have a single object for the detection and translation, since they are so related (and often detection is the first step to translation). Was this direction explored?

To some extent yes. Before https://github.com/WICG/translation-api/commit/2cb6637e6584c9b1f43d49309a8a395bd9b927e7 the APIs were more tightly coupled, both existing on a `self.translation` API. We still had separate detector and translator objects, though. This seems necessary, because a translator has a specific source/target language pair associated with it, and a detector does not.
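
A rough sketch of why the two objects differ, assuming an `ai`-style namespace object is passed in. The `ai.languageDetector` name, the `detect()`/`translate()` methods, the `detectedLanguage` result field, and the `sourceLanguage`/`targetLanguage` options are all assumptions for illustration, not confirmed API.

```js
// Hypothetical sketch: detect the source language, then create a translator
// for that specific language pair.
async function detectThenTranslate(ai, text, targetLanguage) {
  // A detector has no language pair baked in, so one instance handles any input.
  const detector = await ai.languageDetector.create();

  // Assumed result shape: an array of candidates, most likely first.
  const [best] = await detector.detect(text);
  if (!best) {
    throw new Error("Could not detect the source language");
  }

  // A translator is bound to one source/target pair at creation time.
  const translator = await ai.translator.create({
    sourceLanguage: best.detectedLanguage,
    targetLanguage,
  });
  return translator.translate(text);
}
```

Because the translator cannot be created until the source language is known, the detector has to exist as its own object even in the tightly coupled design.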

We separated the APIs even more once we looked into the possible implementation strategies. It turns out that language detector models and translation models are generally quite different. And we wanted to allow browsers to take advantage of these differences, instead of forcing them to unify to a lowest-common-denominator, or expose strange inconsistencies to web developers.

For example, you can find small off-the-shelf language detector models supporting over 80 languages. (If I am reading [this MDN page](https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/i18n/detectLanguage) correctly, both Firefox and Chrome use such a model for the Web Extensions `i18n.detectLanguage()` API.) But, for example, [Firefox's language translation models](https://support.mozilla.org/en-US/kb/website-translation) support 10 languages. In our previous design, we had a single `supportedLanguages()` method, which doesn't make sense given such a setup.

A related question is discussed in https://github.com/WICG/translation-api/blob/main/README.md#allowing-unknown-source-languages-for-translation.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/948#issuecomment-2257382202

Received on Tuesday, 30 July 2024 03:22:46 UTC