- From: Tom Jones <thomasclinganjones@gmail.com>
- Date: Sat, 20 Dec 2025 19:20:16 -0800
- To: Hans Petter Blindheim <hans.petter.blindheim@gmail.com>
- Cc: dylan larson <dylanl37@hotmail.com>, Warren Parad <wparad@rhosys.ch>, Daniel Vinci <me@danielvinci.com>, Jason Grigsby <jason@cloudfour.com>, g b <bgauryy@gmail.com>, "public-wicg@w3.org" <public-wicg@w3.org>
- Message-ID: <CAK2Cwb6yeFRYWuBEomYS_dNYp8H0vGXSi36T8FgVLr2McgfGcw@mail.gmail.com>
Forgot the link:
threat-modeling/models/Threat Model WellKnown Endpoints.md at main · TomCJones/threat-modeling
<https://github.com/TomCJones/threat-modeling/blob/main/models/Threat%20Model%20WellKnown%20Endpoints.md>

Peace ..tom jones

On Sat, Dec 20, 2025 at 6:11 PM Tom Jones <thomasclinganjones@gmail.com> wrote:

> Here is a threat model for well-known endpoints that I expect to submit to
> the W3C threat modeling group soon.
> I would like to know if it is of help in this discussion.
> Feedback welcomed.
> Peace ..tom jones
>
>
> On Sat, Dec 20, 2025 at 6:46 AM Hans Petter Blindheim <hans.petter.blindheim@gmail.com> wrote:
>
>> Hi Dylan,
>>
>> Thank you for the explanation. Also found the post and reviewed the
>> suggested standard.
>>
>> As it stands, I am hesitant to endorse this (but to be transparent, I do
>> not think I carry any meaningful weight in that regard). Will go over why I
>> take this stance, because I think it might help others, and it also allows
>> you (or them) to prove my reasoning wrong.
>>
>> First off, I believe we would get better results by focusing on improving
>> existing standards. Two examples of such improvements:
>>
>>
>> 1. Evolving robots.txt: Updating it to address bots by purpose
>> instead of handles would give domain owners the control they currently lack
>> in a bot landscape that is already impossible to manage manually
>> 2. Enhancing sitemap.xml: Supporting page titles and schema[.]org
>> markup directly in sitemaps would help AI and search engines index
>> domains more accurately (and quickly)
>>
>> These shifts would likely be more effective than the AIDD approach (also
>> llms.txt for that matter, which I find very similar to your suggested
>> standard), providing better support for both major players and new
>> RAG-based competitors.
>>
>> To expand on why I'm hesitant about the suggested AIDD standard:
>>
>> - AIDD doesn't seem to offer clear benefits beyond what the suggested
>> llms.txt standard, or existing adoption of schema[.]org, already
>> provides
>> - This format carries a high risk of poisoning/manipulation. Examples
>> include adult content cloaked to appear in AI results for sensitive
>> queries (like school assignments), or alternative health remedies on
>> illness-related queries
>> - It would require adoption from both site owners and AI developers,
>> yet there’s currently no indication of support from major players like
>> Google, OpenAI, or Anthropic
>>
>> -
>>
>> Best
>>
>> Hans Petter Blindheim
>>
>>
>>
>> On Thu, Dec 18, 2025 at 3:01 PM dylan larson <dylanl37@hotmail.com>
>> wrote:
>>
>>> Thanks for the feedback, and for the solid point and concern you bring up!
>>>
>>> AIDD is not attempting to replace the scope of what registrars perform,
>>> nor is it being totally redundant in that area.
>>>
>>>
>>> Registrar data establishes who owns a domain. AIDD establishes what the
>>> domain claims to represent.
>>>
>>>
>>> AI systems already know how to reason about ownership and abuse. What
>>> they lack is a reliable, first-party declaration of semantic scope at the
>>> domain level.
>>>
>>> On Dec 18, 2025, at 3:38 AM, Hans Petter Blindheim <hans.petter.blindheim@gmail.com> wrote:
>>>
>>>
>>> Hi,
>>>
>>> First off, I'm not sure why I landed in this email group. But I have
>>> shown some interest in regulating AI (suggested that the robots.txt
>>> protocol ought to address bots' overall purpose rather than their handles,
>>> with some level of snippet control).
>>>
>>> Might have misunderstood the intent a bit here, so apologies if I have.
>>> But from my understanding of this, why not put it to the registrars for the
>>> domains instead?
>>> There the domain is either privately held (so, no company entity
>>> applicable) or publicly held (company, government) and tied to official
>>> databases related to that.
>>>
>>> Think this is already broadly in place (whois lookups for a lot of
>>> domains have had this information at least), and if the registrar
>>> information (which should be available for all domains?) then links to the
>>> official company database used on a per-country basis, then their entries
>>> could in turn perhaps be applied towards some standard of business
>>> authority(?).
>>>
>>> -
>>>
>>> Best,
>>>
>>> Hans Petter Blindheim
>>>
>>>
>>>
>>> On Thu, Dec 18, 2025 at 8:42 AM Warren Parad <wparad@rhosys.ch> wrote:
>>>
>>>> Nothing, but they are already doing that. The only difference here is
>>>> that this is just a programmatic replacement for the landing page for
>>>> websites. Instead of loading the landing page and reading HTML, someone
>>>> wants to load the landing page and read JSON.
>>>>
>>>> But for this to be valuable, I think we would need convincing that LLMs
>>>> are going to magically switch to reading a different page. Why would they
>>>> do that?
>>>>
>>>> On Thu, Dec 18, 2025, 04:29 Daniel Vinci <me@danielvinci.com> wrote:
>>>>
>>>>> What's preventing a domain from misrepresenting the products/services
>>>>> it offers to drive engagement?
>>>>>
>>>>> On Wed, Dec 17, 2025, at 8:25 PM, dylan larson wrote:
>>>>>
>>>>>
>>>>> An example here is this.
>>>>>
>>>>> Say an AI company/agent/etc. partially scraped a web site and got some
>>>>> pages, but not all, or a third-party aggregator provided some but not all
>>>>> pages.
>>>>>
>>>>> Let’s assume this was a website for “Bob’s Outdoor Equipment LLC”, a
>>>>> hardware store providing outdoor equipment, farming equipment, lumber etc.
>>>>>
>>>>> The pages scraped/data provided to the AI/agent were only service
>>>>> area, home page, partners page, and lawnmower pages.
>>>>>
>>>>> When queried about “local hardware/outdoor stores”, it may be passed up
>>>>> because it is believed to be a lawnmower sales store.
>>>>>
>>>>> Additionally, an algorithm internal to the AI/agent could have given
>>>>> more weight to the partners page and passed along the contact
>>>>> information of one of Bob’s Outdoor Equipment LLC’s partners, due to it
>>>>> being on a more authoritative page.
>>>>>
>>>>> That is why the decision was made to make this a domain-level
>>>>> authority.
>>>>>
>>>>> On Dec 9, 2025, at 3:01 PM, Jason Grigsby <jason@cloudfour.com> wrote:
>>>>>
>>>>>
>>>>> Looking at the examples, how does one sentence of description prevent
>>>>> AI from misrepresenting a company's many products? Providing a
>>>>> JSON-LD-style description of an entire domain isn’t going to prevent an AI
>>>>> from saying my product supports features it doesn’t or providing poor
>>>>> comparisons to competitors.
>>>>>
>>>>> I'd love to hear a real-life example of how this impacted an
>>>>> organization. For example, a customer of ACME company asked AI engine X
>>>>> this question: "[insert here]". The answer it received was "foo." It
>>>>> should have been "bar." If AIDD was in place, this wouldn't have happened
>>>>> because AI could use AIDD in this way.
>>>>>
>>>>> It's not that I think AI making things up about organizations
>>>>> and their sites isn't a problem. I just have a hard time understanding how
>>>>> this solves it.
>>>>>
>>>>>
>>>>>
>>>>> +1 (503) 290-1090 o | +1 (503) 502-7211 m | http://cloudfour.com |
>>>>> @grigs
>>>>>
>>>>>
>>>>> On Fri, Dec 05, 2025 at 7:57 AM, dylan larson <dylanl37@hotmail.com>
>>>>> wrote:
>>>>>
>>>>> Thanks for all your feedback! This concern/issue had come up in
>>>>> early discussions during early development.
>>>>>
>>>>> AIDD uses the same definition of “authoritative” that existing Web
>>>>> standards rely on:
>>>>>
>>>>> Information is authoritative for the domain that publishes it.
>>>>>
>>>>>
>>>>> Just as with robots.txt, Web App Manifest, OpenGraph metadata, or
>>>>> schema.org JSON-LD, attackers can copy content but cannot publish it
>>>>> from the legitimate domain.
>>>>>
>>>>>
>>>>> A phishing domain might mimic the profile of a well-known company, but
>>>>> that profile would be authoritative only for the phishing domain, never
>>>>> for the real one.
>>>>>
>>>>>
>>>>> With that said, AIDD is still in its early stages, and AIDD is
>>>>> deliberately minimal so that additional identity-verification strategies
>>>>> can evolve independently. Two paths already exist naturally: DNSSEC and
>>>>> third-party verification.
>>>>>
>>>>>
>>>>> Again thanks, and look forward to further feedback.
>>>>>
>>>>>
>>>>>
>>>>> On Dec 5, 2025, at 3:43 AM, g b <bgauryy@gmail.com> wrote:
>>>>>
>>>>>
>>>>> Exactly what I had in mind.
>>>>> It could be another layer that attackers would try to abuse.
>>>>> While domain names (and URLs) are deterministic, metadata could be a
>>>>> great place to manipulate AI models.
>>>>> I'm not sure there's a way to protect models from being manipulated.
>>>>> Also, it would require subdomain rules, which is another layer that
>>>>> should be taken into consideration (for data integrity and security).
>>>>> Last, it should be cached but also needs to be purgeable (for
>>>>> example when the domain changes its purpose or owner).
>>>>>
>>>>>
>>>>> On Fri, Dec 5, 2025, 10:22 Warren Parad <wparad@rhosys.ch> wrote:
>>>>>
>>>>> I'm concerned that malicious attackers will use this strategy to
>>>>> better phish users by publishing a domain profile that exactly matches
>>>>> well-known companies. How can we ensure that the information here is
>>>>> actually trustworthy?
>>>>>
>>>>> On Wed, Dec 3, 2025 at 4:23 PM dylan larson <dylanl37@hotmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello WICG community,
>>>>>
>>>>>
>>>>>
>>>>> I would like to introduce the AI Domain Data Standard (AIDD) for
>>>>> discussion.
>>>>> Its goal is to address a gap in the web ecosystem that is
>>>>> becoming more visible as AI systems increasingly act as intermediaries
>>>>> between users and websites.
>>>>>
>>>>>
>>>>>
>>>>> *Problem*
>>>>>
>>>>> AI assistants often misidentify or misrepresent domains because there
>>>>> is no consistent, machine-readable, domain-controlled source of identity
>>>>> data. Today, models rely on scraped pages, inconsistent metadata,
>>>>> third-party aggregators, or outdated indexes. There is no canonical place
>>>>> where a domain can declare who they are, what they represent, or which
>>>>> resources are authoritative.
>>>>>
>>>>>
>>>>>
>>>>> *Proposal*
>>>>>
>>>>> AIDD defines a small, predictable JSON document served from:
>>>>>
>>>>> • https://<domain>/.well-known/domain-profile.json
>>>>> • Optional fallback: _ai.<domain> TXT record containing a
>>>>> base64-encoded JSON copy
>>>>>
>>>>> The format contains required identity fields (name, description,
>>>>> website, contact) and optional schema.org-aligned fields such as entity
>>>>> type, logo, and JSON-LD. The schema is intentionally minimal to ensure
>>>>> predictable consumption by AI systems, agents, crawlers, and other
>>>>> automated clients.
>>>>>
>>>>>
>>>>>
>>>>> *Specification (v0.1.1):*
>>>>> https://ai-domain-data.org/spec/v0.1
>>>>>
>>>>>
>>>>> *Schema:* https://ai-domain-data.org/spec/schema-v0.1.json
>>>>>
>>>>>
>>>>>
>>>>> *Design Principles*
>>>>>
>>>>> • Self-hosted and vendor-neutral
>>>>> • Aligns with schema.org vocabulary
>>>>> • Minimal surface area with clear versioning
>>>>> • Follows existing web conventions for .well-known/
>>>>> • Supports both HTTPS and DNS TXT discovery
>>>>>
>>>>>
>>>>>
>>>>> *Early Adoption & Tooling*
>>>>>
>>>>> - CLI validator and generator
>>>>> - Resolver SDK
>>>>> - Next.js integration
>>>>> - Jekyll plugin
>>>>> - WordPress plugin (submitted)
>>>>> - Online generator and checker tools
>>>>>
>>>>>
>>>>>
>>>>> *Repository:*
>>>>> https://github.com/ai-domain-data/spec
>>>>>
>>>>>
>>>>>
>>>>> *Questions for the community*
>>>>>
>>>>> 1. Should this pursue formal standardization (W3C, IETF) or remain
>>>>> a community-driven specification?
>>>>> 2. Are the discovery mechanisms (.well-known + DNS TXT fallback)
>>>>> appropriate for long-term stability?
>>>>> 3. What extension patterns are advisable while preserving strict
>>>>> predictability?
>>>>> 4. Should browsers or other user agents eventually consume this
>>>>> data?
>>>>> 5. Are there concerns around naming (domain-profile.json) that the
>>>>> group would recommend addressing early?
>>>>>
>>>>>
>>>>>
>>>>> *Explainer*
>>>>>
>>>>> A more complete explainer is available here:
>>>>> https://ai-domain-data.org/spec/v0.1
>>>>>
>>>>> I would appreciate any feedback from the WICG community on scope,
>>>>> technical direction, and whether this fits the criteria for incubation.
>>>>>
>>>>> Best regards,
>>>>> Dylan Larson
>>>>>
>>>>>
>>>>> Daniel Vinci
>>>>> em: me@danielvinci.com
>>>>> mx: @xylobol:amber.tel
>>>>>
>>>>>
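
For readers following the thread, the discovery mechanism described under *Proposal* in the original AIDD message (an HTTPS document at /.well-known/domain-profile.json, with an optional _ai.<domain> TXT fallback holding a base64-encoded JSON copy) might be sketched roughly as below. This is not taken from the AIDD spec or its SDK: the helper names are hypothetical, the JSON key spellings are assumed to match the field names listed in the message, and the TXT value is assumed to contain nothing but the base64 payload. The DNS lookup itself is left out, since it would need a resolver library.

```python
import base64
import json
import urllib.request

# Required identity fields as named in the proposal. Using these exact
# strings as JSON keys is an assumption made for illustration.
REQUIRED_FIELDS = ("name", "description", "website", "contact")


def profile_url(domain: str) -> str:
    """Build the HTTPS discovery URL given in the proposal."""
    return f"https://{domain}/.well-known/domain-profile.json"


def txt_record_name(domain: str) -> str:
    """Name of the optional DNS TXT fallback record."""
    return f"_ai.{domain}"


def decode_txt_fallback(txt_value: str) -> dict:
    """Decode a TXT value assumed to hold only base64-encoded JSON."""
    return json.loads(base64.b64decode(txt_value).decode("utf-8"))


def missing_required_fields(profile: dict) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not profile.get(field)]


def fetch_profile(domain: str, timeout: float = 5.0) -> dict:
    """Fetch and parse the profile over HTTPS (performs a network request)."""
    with urllib.request.urlopen(profile_url(domain), timeout=timeout) as resp:
        return json.load(resp)
```

Note that a check like missing_required_fields only says whether a profile is well-formed. It does nothing about the trust questions raised in the thread (phishing look-alikes, cloaked or poisoned descriptions), which would need a separate verification layer.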
Received on Sunday, 21 December 2025 03:20:33 UTC