- From: Tom Jones <thomasclinganjones@gmail.com>
- Date: Sat, 20 Dec 2025 18:11:13 -0800
- To: Hans Petter Blindheim <hans.petter.blindheim@gmail.com>
- Cc: dylan larson <dylanl37@hotmail.com>, Warren Parad <wparad@rhosys.ch>, Daniel Vinci <me@danielvinci.com>, Jason Grigsby <jason@cloudfour.com>, g b <bgauryy@gmail.com>, "public-wicg@w3.org" <public-wicg@w3.org>
- Message-ID: <CAK2Cwb6WSy=PCwHQrbXvwiEqBYLy0Q01RKQNNMWRvBR77JpfAA@mail.gmail.com>
Here is a threat model for well-known endpoints that I expect to submit to the W3C threat modeling group soon. I would like to know if it is of help in this discussion. Feedback welcomed.

Peace ..tom jones

On Sat, Dec 20, 2025 at 6:46 AM Hans Petter Blindheim <hans.petter.blindheim@gmail.com> wrote:

> Hi Dylan,
>
> Thank you for the explanation. I also found the post and reviewed the
> suggested standard.
>
> As it stands, I am hesitant to endorse this (though, to be transparent, I do
> not think I carry any meaningful weight in that regard). I will go over why I
> take this stance, because I think it might help others, and it also allows
> you (or them) to prove my reasoning wrong.
>
> First off, I believe we would get better results by focusing on improving
> existing standards. Two examples of such improvements:
>
> 1. Evolving robots.txt: Updating it to address bots by purpose instead
>    of handles would give domain owners the control they currently lack in a
>    bot landscape that is already impossible to manage manually
> 2. Enhancing sitemap.xml: Supporting page titles and schema[.]org
>    markup directly in sitemaps would help AI and search engines index
>    domains more accurately (and quickly)
>
> These shifts would likely be more effective than the AIDD approach (and
> llms.txt for that matter, which I find very similar to your suggested
> standard), providing better support for both major players and new
> RAG-based competitors.
>
> To expand on why I'm hesitant about the AIDD suggested standard:
>
> - AIDD doesn't seem to offer clear benefits beyond what the suggested
>   llms.txt standard already provides, nor beyond what existing use of
>   schema[.]org does
> - This format carries a high risk of poisoning/manipulation. Examples
>   include adult content cloaked to appear in AI results for sensitive
>   queries (like school assignments), or alternative health remedies on
>   illness-related queries
> - It would require adoption from both site owners and AI developers,
>   yet there’s currently no indication of support from major players like
>   Google, OpenAI, or Anthropic
>
> -
>
> Best
>
> Hans Petter Blindheim
>
> On Thu, Dec 18, 2025 at 3:01 PM dylan larson <dylanl37@hotmail.com> wrote:
>
>> Thanks for the feedback, and for the solid point and concern you bring up!
>>
>> AIDD is not attempting to replace the scope of what registrars perform,
>> nor is it totally redundant in that area.
>>
>> Registrar data establishes who owns a domain. AIDD establishes what the
>> domain claims to represent.
>>
>> AI systems already know how to reason about ownership and abuse. What
>> they lack is a reliable, first-party declaration of semantic scope at the
>> domain level.
>>
>> On Dec 18, 2025, at 3:38 AM, Hans Petter Blindheim <hans.petter.blindheim@gmail.com> wrote:
>>
>> Hi,
>>
>> First off, I'm not sure why I landed in this email group. But I have
>> shown some interest in regulating AI (I suggested that the robots.txt
>> protocol ought to address bots' overall purpose rather than their handles,
>> with some level of snippet control).
>>
>> I might have misunderstood the intent a bit here, so apologies if I have.
>> But from my understanding of this, why not put it to the registrars for the
>> domains instead?
>> There the domain is either privately held (so, no company entity
>> applicable) or publicly held (company, government) and tied to official
>> databases related to that.
>>
>> I think this is already broadly in place (whois lookups for a lot of
>> domains have had this information, at least), and if the registrar
>> information (which should be available for all domains?) then links to the
>> official company database used on a per-country basis, then their entries
>> could in turn perhaps be applied towards some standard of business
>> authority(?).
>>
>> -
>>
>> Best,
>>
>> Hans Petter Blindheim
>>
>> On Thu, Dec 18, 2025 at 8:42 AM Warren Parad <wparad@rhosys.ch> wrote:
>>
>>> Nothing, but they are already doing that. The only difference here is
>>> that this is just a programmatic replacement for the landing page for
>>> websites. Instead of loading the landing page and reading HTML, someone
>>> wants to load the landing page and read JSON.
>>>
>>> But for this to be valuable, I think we would need convincing that LLMs
>>> are going to magically switch to reading a different page. Why would they
>>> do that?
>>>
>>> On Thu, Dec 18, 2025, 04:29 Daniel Vinci <me@danielvinci.com> wrote:
>>>
>>>> What's preventing a domain from misrepresenting the products/services
>>>> it offers to drive engagement?
>>>>
>>>> On Wed, Dec 17, 2025, at 8:25 PM, dylan larson wrote:
>>>>
>>>> An example here is this.
>>>>
>>>> Say an AI company/agent/etc. partially scraped a website and got some
>>>> pages, but not all, or a third-party aggregator provided some but not all
>>>> pages.
>>>>
>>>> Let’s assume this was a website for “Bob’s Outdoor Equipment LLC”, a
>>>> hardware store providing outdoor equipment, farming equipment, lumber, etc.
>>>>
>>>> The pages scraped/data provided to the AI/agent were only the service area,
>>>> home page, partners page, and lawnmower pages.
>>>>
>>>> When queried about “local hardware/outdoor stores”, it may be passed up
>>>> because it is believed to be a lawnmower sales store.
>>>>
>>>> Additionally, an algorithm internal to the AI/agent could have given
>>>> more weight to the partners page and passed along the contact information
>>>> of one of Bob’s Outdoor Equipment LLC’s partners, because it was on a more
>>>> authoritative page.
>>>>
>>>> That is why the decision was made to make this a domain-level
>>>> authority.
>>>>
>>>> On Dec 9, 2025, at 3:01 PM, Jason Grigsby <jason@cloudfour.com> wrote:
>>>>
>>>> Looking at the examples, how does one sentence of description prevent
>>>> AI from misrepresenting a company's many products? Providing a
>>>> JSON-LD-style description of an entire domain isn’t going to prevent an AI
>>>> from saying my product supports features it doesn’t or providing poor
>>>> comparisons to competitors.
>>>>
>>>> I'd love to hear a real-life example of how this impacted an
>>>> organization. For example, a customer of ACME company asked AI engine X
>>>> this question: "[insert here]". The answer it received was "foo." It
>>>> should have been "bar." If AIDD was in place, this wouldn't have happened
>>>> because AI could use AIDD in this way.
>>>>
>>>> It's not that I think AI making things up about organizations and
>>>> their sites isn't a problem. I just have a hard time understanding how this
>>>> solves it.
>>>>
>>>> +1 (503) 290-1090 o | +1 (503) 502-7211 m | http://cloudfour.com |
>>>> @grigs
>>>>
>>>> On Fri, Dec 05, 2025 at 7:57 AM, dylan larson <dylanl37@hotmail.com>
>>>> wrote:
>>>>
>>>> Thanks for all your feedback! This concern came up in early
>>>> discussions during initial development.
>>>>
>>>> AIDD uses the same definition of “authoritative” that existing Web
>>>> standards rely on:
>>>>
>>>> Information is authoritative for the domain that publishes it.
>>>>
>>>> Just as with robots.txt, Web App Manifest, OpenGraph metadata, or
>>>> schema.org JSON-LD, attackers can copy content but cannot publish it
>>>> from the legitimate domain.
>>>>
>>>> A phishing domain might mimic the profile of a well-known company, but
>>>> that profile would be authoritative only for the phishing domain, never
>>>> for the real one.
>>>>
>>>> With that said, AIDD is still in its early stages, and it is
>>>> deliberately minimal so that additional identity-verification strategies
>>>> can evolve independently. Two paths already exist naturally: DNSSEC and
>>>> third-party verification.
>>>>
>>>> Again, thanks, and I look forward to further feedback.
>>>>
>>>> On Dec 5, 2025, at 3:43 AM, g b <bgauryy@gmail.com> wrote:
>>>>
>>>> Exactly what I had in mind.
>>>> It could be another layer that attackers would try to abuse.
>>>> While domain names (and URLs) are deterministic, metadata could be a
>>>> great place to manipulate AI models.
>>>> I'm not sure there's a way to protect models from being manipulated.
>>>> Also, it would require subdomain rules, which is another layer that
>>>> should be taken into consideration (for data integrity and security).
>>>> Last, it should be cached but also needs to be purgeable (for
>>>> example, when the domain changes its purpose or owner).
>>>>
>>>> On Fri, Dec 5, 2025, 10:22 Warren Parad <wparad@rhosys.ch> wrote:
>>>>
>>>> I'm concerned that malicious attackers will use this strategy to better
>>>> phish users by publishing a domain profile that exactly matches well-known
>>>> companies. How can we ensure that the information here is actually
>>>> trustworthy?
>>>>
>>>> On Wed, Dec 3, 2025 at 4:23 PM dylan larson <dylanl37@hotmail.com>
>>>> wrote:
>>>>
>>>> Hello WICG community,
>>>>
>>>> I would like to introduce the AI Domain Data Standard (AIDD) for
>>>> discussion. Its goal is to address a gap in the web ecosystem that is
>>>> becoming more visible as AI systems increasingly act as intermediaries
>>>> between users and websites.
>>>>
>>>> *Problem*
>>>>
>>>> AI assistants often misidentify or misrepresent domains because there
>>>> is no consistent, machine-readable, domain-controlled source of identity
>>>> data.
>>>> Today, models rely on scraped pages, inconsistent metadata,
>>>> third-party aggregators, or outdated indexes. There is no canonical place
>>>> where a domain can declare who they are, what they represent, or which
>>>> resources are authoritative.
>>>>
>>>> *Proposal*
>>>>
>>>> AIDD defines a small, predictable JSON document served from:
>>>>
>>>> • https://<domain>/.well-known/domain-profile.json
>>>> • Optional fallback: _ai.<domain> TXT record containing a
>>>>   base64-encoded JSON copy
>>>>
>>>> The format contains required identity fields (name, description,
>>>> website, contact) and optional schema.org-aligned fields such as entity
>>>> type, logo, and JSON-LD. The schema is intentionally minimal to ensure
>>>> predictable consumption by AI systems, agents, crawlers, and other
>>>> automated clients.
>>>>
>>>> *Specification (v0.1.1):*
>>>> https://ai-domain-data.org/spec/v0.1
>>>>
>>>> *Schema:* https://ai-domain-data.org/spec/schema-v0.1.json
>>>>
>>>> *Design Principles*
>>>>
>>>> • Self-hosted and vendor-neutral
>>>> • Aligns with schema.org vocabulary
>>>> • Minimal surface area with clear versioning
>>>> • Follows existing web conventions for .well-known/
>>>> • Supports both HTTPS and DNS TXT discovery
>>>>
>>>> *Early Adoption & Tooling*
>>>>
>>>> - CLI validator and generator
>>>> - Resolver SDK
>>>> - Next.js integration
>>>> - Jekyll plugin
>>>> - WordPress plugin (submitted)
>>>> - Online generator and checker tools
>>>>
>>>> *Repository:*
>>>> https://github.com/ai-domain-data/spec
>>>>
>>>> *Questions for the community*
>>>>
>>>> 1. Should this pursue formal standardization (W3C, IETF) or remain
>>>>    a community-driven specification?
>>>> 2. Are the discovery mechanisms (.well-known + DNS TXT fallback)
>>>>    appropriate for long-term stability?
>>>> 3. What extension patterns are advisable while preserving strict
>>>>    predictability?
>>>> 4. Should browsers or other user agents eventually consume this data?
>>>> 5. Are there concerns around naming (domain-profile.json) that the
>>>>    group would recommend addressing early?
>>>>
>>>> *Explainer*
>>>>
>>>> A more complete explainer is available here:
>>>> https://ai-domain-data.org/spec/v0.1
>>>>
>>>> I would appreciate any feedback from the WICG community on scope,
>>>> technical direction, and whether this fits the criteria for incubation.
>>>>
>>>> Best regards,
>>>> Dylan Larson
>>>>
>>>> Daniel Vinci
>>>> em: me@danielvinci.com
>>>> mx: @xylobol:amber.tel
>>>>
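
For readers following the discovery mechanism in the original proposal (an HTTPS document under .well-known/, with an optional _ai.<domain> TXT record carrying a base64-encoded JSON copy), here is a minimal sketch of how a consumer might build the lookup names and check the required identity fields. It is not taken from the AIDD spec or its SDK. It assumes the JSON keys are spelled exactly like the field names listed in the thread (name, description, website, contact) and that the TXT value is plain base64 of the JSON with no prefix or chunking. The helper names and the example profile are hypothetical.

```python
import base64
import json

# Assumption: the JSON keys match the field names listed in the proposal.
REQUIRED_FIELDS = ("name", "description", "website", "contact")


def well_known_url(domain: str) -> str:
    """Primary discovery location named in the proposal."""
    return f"https://{domain}/.well-known/domain-profile.json"


def txt_record_name(domain: str) -> str:
    """DNS name of the optional TXT fallback named in the proposal."""
    return f"_ai.{domain}"


def decode_txt_profile(txt_value: str) -> dict:
    """Decode a TXT payload, assumed to be plain base64 of the JSON document."""
    raw = base64.b64decode(txt_value, validate=True)
    return json.loads(raw.decode("utf-8"))


def missing_required_fields(profile: dict) -> list:
    """Return the required identity fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not profile.get(field)]


# Hypothetical profile for the hardware store used as an example in the thread.
example = {
    "name": "Bob's Outdoor Equipment LLC",
    "description": "Hardware store: outdoor equipment, farming equipment, lumber.",
    "website": "https://bobs-outdoor.example",
    "contact": "info@bobs-outdoor.example",
}
encoded = base64.b64encode(json.dumps(example).encode("utf-8")).decode("ascii")
decoded = decode_txt_profile(encoded)
```

Nothing in this sketch verifies that the declared values are true. It only checks that a profile is well-formed, which is why the trust and poisoning concerns raised in the thread still apply.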
Received on Sunday, 21 December 2025 02:11:31 UTC