Re: [w3ctag/design-reviews] Prompt API (Issue #1093)

marcoscaceres left a comment (w3ctag/design-reviews#1093)

Hi Domenic,
The TAG acknowledges the interest from implementers and developers in enabling web applications to integrate with generative models through prompting. However, we have significant concerns about the current level of specification detail and its implications for interoperability, security, and long-term sustainability of the platform. We believe the following concerns apply equally to the Writing Assistance APIs, which share many of the same underlying assumptions/models.

Despite what is mandated in the explainer, we are concerned that local models won't meet user and developer expectations, and could be severely limited on low-end devices. This problem could get worse over time, as users and developers become increasingly accustomed to crafting prompts that are extremely complex by today's standards. It may also be presumptuous to assume local models will stay "local", given the trend towards hybrid models that send complicated prompts off to servers (even in a privacy-preserving way). As noted in the explainer:

> “We do not intend to provide guarantees of language model quality, stability, or interoperability between browsers. ... These are left as quality-of-implementation issues.”  
> — [Explainer: Goals](https://github.com/webmachinelearning/prompt-api/blob/main/README.md#goals)

This is a notable departure for the web platform, where developers expect baseline interoperable behavior across user agents. At present, the API lacks a minimal interoperable model contract, has no reliable capability detection or quality signals, and makes assumptions (such as on-device execution or reliable structured output) that cannot be guaranteed across implementations. And given that models do computational work, there is a real threat of distributed computation abuse: exploiting this local computational resource at scale at users' expense (i.e., answering other people's prompts on a user's device without the user knowing).

We believe further work is needed before something like this should become part of the web platform: in particular, defining testable guarantees, establishing privacy, security, and data governance requirements, and clarifying the architectural fit with other emerging platform capabilities.

Detailed concerns:

### 1. Interoperability & Testability
- **No shared baseline**: As acknowledged in the explainer, the API does not guarantee any consistent model behavior or quality across browsers. This undermines interoperability and prevents developers from writing portable, testable code.
- **Opaque model selection**: There is no defined way for developers to know what model is in use, making debugging and conformance testing infeasible and errors often impossible to reproduce.
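To make the capability-detection gap concrete, the following sketch assumes the `LanguageModel.availability()` shape from the current explainer draft (the availability strings shown are our reading of that draft, not a normative list). Even where it resolves, it reports download status, not anything about model quality or behavior, which is the gap at issue:

```javascript
// Hedged sketch: feature detection for the Prompt API, assuming the
// explainer's LanguageModel global. Note that a result of "available"
// says nothing about which model is in use, its quality, or whether it
// behaves like another UA's model.
async function detectPromptSupport() {
  // The global may simply not exist in a given user agent.
  if (typeof LanguageModel === "undefined") {
    return "unsupported";
  }
  // Per our reading of the explainer draft, availability() resolves to
  // a string such as "unavailable", "downloadable", "downloading", or
  // "available".
  return await LanguageModel.availability();
}
```

Even with this guard in place, a developer still cannot determine whether two "available" implementations will produce remotely comparable output for the same prompt.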

### 2. Architecture & Web Principles
- **Procedural and nondeterministic**: The API departs from predictable, repeatable behavior, though we acknowledge that in some contexts this might not matter.
- **Difficult to polyfill**: The explainer leaves open whether fallback to Wasm or cloud models is even viable. In practice, the lack of standard behavior makes polyfilling impractical (again, because there is no real baseline; it's just whatever output the model gives developers back).
- **Threat model missing**: Security concerns such as prompt injection, history leakage, and cross-origin contamination are not clearly addressed.

### 3. Assumptions That May Not Hold
- **On-device vs remote execution**: The explainer lists "execution location transparency" as a possible future goal, but not a guarantee:

  > “It may be desirable to offer more insight into whether a model is executing locally or remotely (e.g. to inform UX or data governance decisions).”  
  > — [README.md § Goals: Execution location transparency](https://github.com/webmachinelearning/prompt-api/blob/main/README.md#execution-location-transparency)

- **Structured output is unreliable**: While structured outputs are discussed as desirable, they are not required (making this a risky feature that could negatively impact users and their ability to even use a web page):

  > “Language models are not guaranteed to produce structurally valid results; efforts to constrain output structure using techniques like prompt templating may be employed...”  
  > — [README.md § Prompt lifecycle](https://github.com/webmachinelearning/prompt-api/blob/main/README.md#prompt-lifecycle)
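  In practice this pushes defensive validation onto developers. The helper below is a hypothetical sketch (the function name and parameters are ours, not the explainer's) of the guard code every caller would need, since a conforming implementation may return non-JSON or schema-violating text:

  ```javascript
  // Hypothetical defensive parser for model output. Because the API does
  // not guarantee structurally valid results, responses must be treated
  // as untrusted text and validated before use.
  function parseStructuredOutput(raw, requiredKeys) {
    let value;
    try {
      value = JSON.parse(raw);
    } catch {
      return null; // model produced non-JSON text
    }
    // Reject non-object responses (e.g. a bare number or string).
    if (typeof value !== "object" || value === null) return null;
    // Reject responses missing any key the application depends on.
    for (const key of requiredKeys) {
      if (!(key in value)) return null;
    }
    return value;
  }
  ```

  Every call site then needs a recovery path for the `null` case, which is exactly the kind of per-developer burden a platform API would normally specify away.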

- **“Tools” abstraction**: No such abstraction is described in the explainer. We have concerns about this proposal being overly tied to vendor-specific use cases.

### 4. Privacy & Security
- **Data governance can be challenging**: The explainer notes that training on user data must be prevented, but, even if not used for training, that data could still be stored, transmitted, or retained. Enforcing this may not be possible, because a prompt may be handed off to a service that retains data or trains on it by default (or via its EULA), or the model may have been deliberately pre-tuned by the user.

  > “The platform MUST prevent user input from being used to fine-tune models or otherwise persist and train on user prompts.”  
  > — [Security & Privacy Questionnaire](https://github.com/webmachinelearning/prompt-api/blob/main/security-privacy-questionnaire.md#4-privacy-considerations)

  However, there is no mention of how this is enforced, nor of retention policies, which remains a concern.

- **Potential for computation abuse**: There are no guardrails discussed around background or opportunistic use of models, which could lead to battery or CPU exhaustion similar to past abuses (e.g. crypto-mining).

- **Fingerprinting risk**: Even though the explainer calls for local models (again, not guaranteed), prompt content can encode sensitive user data or preferences, introducing new unintentional vectors for surveillance or tracking.
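
On the computation-abuse point above, one can imagine guardrails such as a per-page prompt budget enforced by the user agent. The sketch below is purely illustrative; nothing like it appears in the explainer, and the names are ours:

```javascript
// Hypothetical guardrail: wrap an arbitrary prompt function in a per-page
// call budget, so a page cannot opportunistically burn user CPU/battery by
// prompting in a loop (e.g. to answer other people's prompts at scale).
function withPromptBudget(promptFn, maxCalls) {
  let used = 0;
  return async (...args) => {
    if (used >= maxCalls) {
      throw new Error("Prompt budget exhausted for this page");
    }
    used += 1;
    return promptFn(...args);
  };
}
```

A real mitigation would likely live in the user agent rather than page script, but the absence of any such mechanism from the explainer is the concern.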

### 5. Developer Experience & UX
- **Inconsistent outputs**: Developers cannot expect uniform results across UAs due to model divergence.
- **No quality signals**: There is no standard mechanism to detect hallucinations or confidence levels.
- **No schema support**: Inputs and outputs remain freeform text, which makes robust integration difficult.

### 6. Ecosystem & Governance
- **Lack of accountability**: With no shared test suite or conformance requirements, there is no path for developers to hold implementations to a common standard.
- **Unclear relationship to adjacent specs**: The explainer does not clarify how this proposal relates to WebNN, WebGPU, or potential future APIs for structured model execution.

We encourage continued exploration of this space, but recommend the following before the API progresses further:

1. Define a minimal, testable capability baseline that all conforming implementations must meet.  
2. Provide reliable capability detection and developer-facing signals for output quality or confidence.  
3. Establish a clear threat model and data governance framework, including retention and training boundaries.  
4. Clarify the API’s relationship with adjacent emerging capabilities and ensure it fits within the layered architecture of the web platform.
5. Propose some means to prevent distributed abuse of this computational resource.  

The TAG would welcome re-review as these aspects evolve.


-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/1093#issuecomment-3222435561
You are receiving this because you are subscribed to this thread.

Message ID: <w3ctag/design-reviews/issues/1093/3222435561@github.com>

Received on Tuesday, 26 August 2025 03:18:24 UTC