- From: Marcos Cáceres <notifications@github.com>
- Date: Mon, 10 Nov 2025 21:35:22 -0800
- To: w3ctag/design-reviews <design-reviews@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <w3ctag/design-reviews/issues/1093/3515070512@github.com>
marcoscaceres left a comment (w3ctag/design-reviews#1093)

@reillyeon Thank you for your proposal. The TAG believes it's important to experiment with ways that enable web applications to integrate with generative models. Regarding the Prompt API, we were unable to reach consensus within the TAG membership on a Resolution, though we encourage further exploration. While TAG members acknowledge the potential use cases unlocked by the proposal, they've also identified several concerns regarding its architecture and its implications for interoperability, security, privacy, and the long-term sustainability of the web platform. We list them below. Several of the following concerns also apply to the Writing Assistance APIs and the Proofreader API, which share some of the underlying assumptions.

## **Response quality of local models**

We have a concern that on-device models might not meet user, developer, and result expectations, and could be severely limited on low-end devices — exacerbating the digital divide. This problem could get worse over time, as users and developers become increasingly accustomed to crafting extremely complex prompts. We are also worried that quality and model size may vary significantly across user agents, which could negatively impact people using less common UAs.

We acknowledge that the response quality offered by on-device language models may suffice depending on the use case, that on-device models continue to improve, and that user hardware is becoming increasingly capable for AI workloads. It will be important to keep checking for problems and fixing them across the life of this experiment.

## **Model selection and availability**

The response behavior is highly dependent on the model. Developers may tailor their prompts to a specific model, which may yield different results on another one, potentially leading to interop issues and a poor user experience. We're skeptical of the potential goal to reveal "some identifier for the language model in use", because this may exclude some users who would otherwise not have been excluded. This should be considered a non-goal. It might be acceptable to indicate particular model capabilities instead of "brand names".

If open-weight models turn out to have insufficient quality compared to commercial models, the cost of licensing the commercial models could prevent some independent browsers from shipping a competitive implementation of this API. There's disagreement within the TAG about whether that's a reasonable outcome.

## **Interoperable model parameters**

While the Prompt API tries to provide an abstract API across language models, it currently only supports the `topK` and `temperature` parameters, leading to potential interop issues (e.g., if top-p sampling should be used). There should be a concept of how other sampling parameters could be supported, or we suggest removing them altogether until they can be implemented in an interoperable manner.
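For concreteness, this is the entire tunable sampling surface as we read the explainer (a sketch; `LanguageModel.params()` reporting defaults and maxima is our understanding of the proposal, not a guarantee):

```js
// Sketch, as we read the explainer: params() reports the defaults and
// maxima the implementation supports for the only two sampling knobs.
const params = await LanguageModel.params();
const session = await LanguageModel.create({
  topK: 3, // must not exceed params.maxTopK
  temperature: Math.min(1.2, params.maxTemperature),
});
// There is no interoperable way to express other sampling strategies here,
// e.g. a top-p (nucleus sampling) cutoff for models that sample that way.
```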
## **Cost of computing**

We have a concern that authors could use this API to offload the "cost of computing" to local devices and end users. The TAG is ambivalent about whether this is appropriate: if wealthy sites that users are already paying for, perhaps by watching ads, also offload their compute to users, that seems inappropriate. If hobby sites with no revenue ask their users to provide their own computation, that seems appropriate.

This sort of cost transfer is already possible with existing JS, Wasm, and WebGPU APIs, but the limited capacity of local devices has always capped the amount the user could contribute. With a local-only Prompt API, that's still true, but as this API also envisions allowing the computation to run in the cloud, wealthy users might have a lot more resources to offer. Of course, wealthy users can also pay sites directly, but abusing their AI subscription might act as a form of subtle (and inefficient) micropayment, and so be easier to get them to accept. Have you thought about ways to ensure users have the leverage to push back against these forms of abuse?

With respect to abuse of local computation (similar to cryptomining), the mitigations are likely to be shared with JS, Wasm, and WebGPU, so it would be fine for the Prompt explainer and spec to just refer to some existing explanation of the mitigations used for those APIs. We see @domenic's mention of [#security-runtime](https://webmachinelearning.github.io/writing-assistance-apis/#security-runtime) in reply to the earlier comment about "Potential for computation abuse", but that section addresses interference with parallel uses of the Prompt API, rather than taking value from the user by running computations on their device. We also don't think permissions policy is enough, since the top-level site can also abuse the local resources or grant permission to a third party that it shouldn't trust.

## **Models are assumed to execute locally**

We see that hybrid or remote inference [are design goals](https://github.com/w3ctag/design-reviews/issues/1093#issuecomment-3470598937). For example, user agents or the underlying platform could decide to process a prompt on a server based on the prompt's complexity, device capabilities, resource consumption, or environmental conditions. However, the API only incorporates the complexity of downloading local models. We'd like you to pay more attention to the complexity of running inference in the cloud. One big example is that cloud models are often subscription services, and so the user needs to be in charge of how much of their quota the site gets to use. The location of inference (local, edge, cloud) could also have a significant impact on response quality and privacy, and sites may need to be able to respond to those differences. Please look into what API changes will be needed to fully support hybrid and cloud-based execution.
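To make that ask concrete, here is one purely hypothetical shape; neither of these members exists in the current proposal, and we are not endorsing this particular design, only illustrating the kind of surface hybrid execution might require:

```js
// HYPOTHETICAL sketch. Neither `executionPreference` nor `executionLocation`
// is part of the proposal; this only illustrates what "responding to
// execution-location differences" could look like for a site.
const session = await LanguageModel.create({
  executionPreference: "prefer-on-device", // or "prefer-cloud", "on-device-only"
});
if (session.executionLocation === "cloud") {
  // A site might adjust its privacy messaging here, and the user agent
  // would need to meter how much of the user's subscription quota is spent.
}
```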
## **Model acquisition**

The API often appears to be designed with the assumption that a model needs to be downloaded as a browser-wide component before first use, possibly depending on the expected input and output language. This leads to a rather unusual asynchronous instantiation via `LanguageModel.create()`. However, the user agent or the underlying platform may choose to utilize a model that is already present on the device, manage model downloads separately, or send the prompt to a cloud model for processing. In this case, the model acquisition architecture should be left as an implementation detail and not exposed to scripts.

Put differently, we feel that it shouldn't be possible for a website to query the existence or monitor the download of a browser-wide component, as this might allow cross-site collaboration attacks (e.g., two sites could use the download progress to identify a user uniquely). Instead, the browser could prompt for downloading ("This site wants to download a 4 GB model onto your device. After download, you will have N gigabytes left. Allow?") and report the download progress using its own UI.

The developer may not be in a position to determine when to download a model. For example, the user might be on a fast network with a high-end device and no concerns about download quotas. It is not for the site to make any determinations about the user's device, capabilities, or environment, as doing so violates the principle of "one web" for everyone. Please consider the gamut of possible users of this technology (people on low-end devices, users who are traveling, those on limited quotas, etc.) and leave the user in control, while enabling the largest set of use cases.

## **Model versions and updates**

As noted in the Privacy Considerations section of the Writing Assistance API, models on a single browser might have different versions, leading to a fingerprinting vector based on what capabilities are available to a model (e.g., Model-V4 vs Model-V5 vs Model-1.2 or whatever). It is great to see these concerns identified in the Privacy Considerations section of the Writing Assistance API. Consider how not exposing anything about the underlying implementation might address some of the concerns. Similarly, different versions could lead to different results based on available capabilities.

We're worried that models could be updated at some unreasonable periodic cadence (e.g., once a week, once a month), meaning potentially gigabytes of additional downloads per update. Please consider how naive user agent implementations could have implications for the user's phone plan and available storage. We acknowledge that it might be impossible to prevent a website from identifying different model versions by poking at the model, but it would be good to consider whether completely obscuring whether a model is on-device might help here. We're concerned that pages might get confused if the model updates between page loads. Can you think about how pages might get consistent results over time?

## **Expected input/output languages**

> This allows the implementation to download any necessary supporting material.

We're worried about the risks of fingerprinting and excessive disk usage from letting the site select an arbitrary set of languages. We tentatively believe that local models don't need to be able to understand or produce languages outside the set that the user understands, and possibly the language the page is written in. That set could be downloaded with the main model without allowing the `LanguageModel.create()` call to control the download—and potentially fingerprint the user's browsing history—by picking different expected input and output languages. If we allow a site to use its own language, it could still probe by setting the page's language differently on each load, but that's much more visible. Is there a reason we shouldn't believe this? If so, please add it to the explainer's alternatives considered.

## **System prompts**

If the role of "system" must be the first element in the `initialPrompts` array, maybe `systemPrompt` should be its own top-level member. That may give more flexibility, and removes that particular error case.
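A sketch of the two shapes, assuming our reading of the explainer is right (the top-level `systemPrompt` member is our hypothetical suggestion, not part of the proposal):

```js
// Today, per our reading of the explainer: the system prompt rides along in
// `initialPrompts`, and a { role: "system" } entry anywhere but first is an error.
const sessionA = await LanguageModel.create({
  initialPrompts: [
    { role: "system", content: "You are a helpful, terse assistant." },
    { role: "user", content: "Hello!" },
    { role: "assistant", content: "Hi! How can I help?" },
  ],
});

// HYPOTHETICAL alternative suggested above: a dedicated top-level member,
// which removes the "system prompt not first" error case by construction.
const sessionB = await LanguageModel.create({
  systemPrompt: "You are a helpful, terse assistant.",
  initialPrompts: [
    { role: "user", content: "Hello!" },
    { role: "assistant", content: "Hi! How can I help?" },
  ],
});
```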
## **Quota vs. context window**

We feel the term "quota" is misleading. It seems to refer to the *context window* of the model, but could be confused with either storage quota (which typically triggers `QuotaExceededError`s) or a consumption quota: as models are expensive to run, it would also be reasonable to limit the amount of compute a site can use. We suggest using a different term here.

## **Memory management**

> The ability to manually destroy a session allows applications to free up memory without waiting for garbage collection, which can be useful since language models can be quite large.

The `destroy()` method seems like it would put the developer in charge of memory management. Yes, the models can be large, and use up a lot of memory. That developers can get into this situation and are forced to manage memory seems counter to how we design web APIs. Developers purposely lack a complete view of the overall state of the underlying system, how much memory a tab is using, etc. We encourage rethinking the architecture of the API so that memory management is left up to the user agent and underlying platform, and not left to the developer to manage.

In addition, having multiple ways to destroy a session seems redundant. Please pick one (probably AbortController; it provides a lot more functionality than `destroy()`, which seems to just be syntactic sugar). Does it make sense to use this with JS's new "[using](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/using)" declaration and `[Symbol.dispose]()`?

## **JSON Schema**

While JSON Schema is the de facto standard for describing available tools and expected output when dealing with language models, this would be the first time this format is used at web-platform scale. User agents and operating systems may need to implement JSON Schema support first, which could have unforeseen security implications and may create new and unforeseen attack vectors. Please consider the long-term implications of this choice. Will it work across all platforms? What additional costs will be incurred by having a dependency on JSON Schema? Please note that JSON Schema hasn't been formally standardized and there seem to be various dialects (see [https://json-schema.org/tools](https://json-schema.org/tools)). How does the community plan to deal with that when standardizing?

## **Tool use**

We suggest elaborating on the Tool use section a bit more. For example, it's not clear *how* the session's model understands to pick/use the "getWeather" tool from the list of tools, or when it should fall back to the underlying model. Consider giving some clarification on how/when the model decides to use a tool and how it makes that determination (if only because this is non-obvious for those without significant expertise in the area). Is the tool use behavior consistent across models? What if it doesn't pick the tool and it hallucinates the weather instead? It would be great to clarify if the model is expected to ingest the JavaScript of the tool to support making the determination to use the tool.
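For reference, a sketch of the tool declaration shape as we read the explainer's tool-use example (the weather endpoint is made up); note that nothing in this shape tells the developer when or why the model will call `execute`:

```js
// Sketch based on our reading of the explainer; the weather endpoint is made
// up. Presumably only `name`, `description`, and `inputSchema` are visible to
// the model, which is why it's unclear whether the tool's JavaScript body
// plays any role in the decision to use the tool.
const session = await LanguageModel.create({
  tools: [
    {
      name: "getWeather",
      description: "Get the current weather for a city.",
      inputSchema: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
      async execute({ city }) {
        const res = await fetch(`https://weather.example/api?city=${encodeURIComponent(city)}`);
        return JSON.stringify(await res.json());
      },
    },
  ],
});
// Whether this prompt triggers the tool, or the model hallucinates an answer
// instead, is exactly the behavior we'd like the explainer to pin down.
const answer = await session.prompt("What's the weather like in Sydney?");
```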
## **Structured output**

It's not clear whether models, underlying platforms, and frameworks will interoperably support the JavaScript flavor of Regular Expressions and JSON Schemas. Was that considered and validated when making this choice? Out of an abundance of caution, and given the interoperability concerns, we suggest it may be premature to settle on JSON Schema or JS RegEx as the format here. If the responses need to be processed anyway, why not do the schema and RegEx checks in JS (while acknowledging this may reduce developer ergonomics)?

Combining `omitResponseConstraintInput` and `responseConstraint` with constraints added to the prompt itself might have unintended consequences. The fact that you can mix response guidance into the prompt itself may end up confusing the model in contradictory ways… what if the user or developer also asks for other formats themselves?

--
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/1093#issuecomment-3515070512
You are receiving this because you are subscribed to this thread.

Message ID: <w3ctag/design-reviews/issues/1093/3515070512@github.com>
Received on Tuesday, 11 November 2025 05:35:26 UTC