- From: Dominic Farolino <notifications@github.com>
- Date: Thu, 02 Apr 2026 09:57:08 -0700
- To: w3ctag/design-reviews <design-reviews@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <w3ctag/design-reviews/issues/1198/4179176302@github.com>
domfarolino left a comment (w3ctag/design-reviews#1198)

> What do you think?

Personally, I think this is getting tedious. I'd like to see a broader, meta discussion with the TAG on how we use AI to evaluate web platform proposals. Some of the questions in your input prompt are reasonable and others aren't, but we shouldn't be litigating the evaluation criteria each time a TAG member creates an elaborate one-off prompt.

If we want to do this for real, maybe the TAG can publish a few `SKILL.md` files and curated prompts with review criteria that the community votes on or generally agrees on, so that we have a sense of the deterministic criteria we're telling LLMs to judge with. This is as opposed to using a likely-AI-generated input prompt that we've never seen before, with a questionable pass/fail structure.

Your prompt doesn't tell Gemini to evaluate the proposal objectively; you tell it to find problems and be critical. Today, LLMs are suggestible enough that they'll try to satisfy what they think their user wants rather than responsibly apply the objective criteria you might have intended. It's like asking a ghost hunter to come to your house and look around. Do you think they're *not* going to find ghosts?

Please spend some human time evaluating this, even if it results in short, specific comments like https://github.com/w3ctag/design-reviews/issues/1198#issuecomment-4166581568, which I agree points out a very reasonable concern.

----

> @domfarolino do you have your prompt still for assessing the assessment? That’s key.

Edited the comment to append the input prompt.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/1198#issuecomment-4179176302
Received on Thursday, 2 April 2026 16:57:11 UTC