Re: Proposal: AI Domain Data Standard for Authoritative Domain Identity Metadata

Forgot the link: threat-modeling/models/Threat Model WellKnown Endpoints.md
<https://github.com/TomCJones/threat-modeling/blob/main/models/Threat%20Model%20WellKnown%20Endpoints.md>
Peace ..tom jones


On Sat, Dec 20, 2025 at 6:11 PM Tom Jones <thomasclinganjones@gmail.com>
wrote:

> Here is a threat model for well-known endpoints that I expect to submit to
> the W3C threat modeling group soon.
> I would like to know if it is of help in this discussion.
> Feedback welcome.
> Peace ..tom jones
>
>
> On Sat, Dec 20, 2025 at 6:46 AM Hans Petter Blindheim <
> hans.petter.blindheim@gmail.com> wrote:
>
>> Hi Dylan,
>>
>> Thank you for the explanation. Also found the post and reviewed the
>> suggested standard.
>>
>> As it stands, I am hesitant to endorse this (though, to be transparent, I do
>> not think I carry any meaningful weight in that regard). I will go over why I
>> take this stance, because I think it might help others, and it also allows
>> you (or them) to prove my reasoning wrong.
>>
>> First off, I believe we would get better results by focusing on improving
>> existing standards. Two examples of such improvements:
>>
>>
>>    1. Evolving robots.txt: Updating it to address bots by purpose
>>    instead of handles would give domain owners the control they currently lack
>>    in a bot landscape that is already impossible to manage manually
>>    2. Enhancing sitemap.xml: Supporting page titles and schema[.]org
>>    markup directly in sitemaps would help AI and search engines index
>>    domains more accurately (and quickly)
>>
>> These shifts would likely be more effective than the AIDD approach (and
>> llms.txt for that matter, which I find very similar to your suggested
>> standard), providing better support for both major players and new
>> RAG-based competitors.
>>
>> To expand on why I'm hesitant about the suggested AIDD standard:
>>
>>    - AIDD doesn't seem to offer clear benefits beyond what the suggested
>>    llms.txt standard or the existing adoption of schema[.]org already
>>    provide
>>    - This format carries a high risk of poisoning/manipulation. Examples
>>    include adult content cloaked to appear in AI results for sensitive
>>    queries (like school assignments), or alternative health remedies on
>>    illness-related queries
>>    - It would require adoption from both site owners and AI developers,
>>    yet there’s currently no indication of support from major players like
>>    Google, OpenAI, or Anthropic
>>
>> -
>>
>> Best
>>
>> Hans Petter Blindheim
>>
>>
>>
>> On Thu, Dec 18, 2025 at 3:01 PM dylan larson <dylanl37@hotmail.com>
>> wrote:
>>
>>> Thanks for the feedback, and that is a solid point and concern you bring up!
>>>
>>> AIDD is not attempting to replace the scope of what registrars perform,
>>> nor is it being totally redundant in that area.
>>>
>>>
>>> Registrar data establishes who owns a domain. AIDD establishes what the
>>> domain claims to represent.
>>>
>>>
>>> AI systems already know how to reason about ownership and abuse. What
>>> they lack is a reliable, first-party declaration of semantic scope at the
>>> domain level.
>>>
>>> On Dec 18, 2025, at 3:38 AM, Hans Petter Blindheim <
>>> hans.petter.blindheim@gmail.com> wrote:
>>>
>>> 
>>> Hi,
>>>
>>> First off, I'm not sure why I landed in this email group. But I have
>>> shown some interest in regulating AI (I suggested that the robots.txt
>>> protocol ought to address bots' overall purpose rather than their handles,
>>> with some level of snippet control).
>>>
>>> I might have misunderstood the intent a bit here, so apologies if I have.
>>> But from my understanding of this, why not put it to the registrars for the
>>> domains instead?
>>> There the domain is either privately held (so, no company entity
>>> applicable) or publicly held (company, government) and tied to official
>>> databases related to that.
>>>
>>> I think this is already broadly in place (whois lookups for a lot of
>>> domains have had this information at least), and if the registrar
>>> information (which should be available for all domains?) then links to the
>>> official company database used on a per-country basis, then their entries
>>> could in turn perhaps be applied towards some standard of business
>>> authority(?).
>>>
>>> -
>>>
>>> Best,
>>>
>>> Hans Petter Blindheim
>>>
>>>
>>>
>>> On Thu, Dec 18, 2025 at 8:42 AM Warren Parad <wparad@rhosys.ch> wrote:
>>>
>>>> Nothing, but they are already doing that. The only difference here is
>>>> that this is just a programmatic replacement for the landing page for
>>>> websites. Instead of loading the landing page and reading HTML, someone
>>>> wants to load the landing page and read JSON.
>>>>
>>>> But for this to be valuable, I think we would need convincing that LLMs
>>>> are going to magically switch to reading a different page. Why would they
>>>> do that?
>>>>
>>>> On Thu, Dec 18, 2025, 04:29 Daniel Vinci <me@danielvinci.com> wrote:
>>>>
>>>>> What's preventing a domain from misrepresenting the products/services
>>>>> it offers to drive engagement?
>>>>>
>>>>> On Wed, Dec 17, 2025, at 8:25 PM, dylan larson wrote:
>>>>>
>>>>>
>>>>> An example here is this.
>>>>>
>>>>> Say an AI company/agent/etc. partially scraped a web site and got some
>>>>> pages, but not all, or a third-party aggregator provided some but not all
>>>>> pages.
>>>>>
>>>>> Let’s assume this was a website for “Bob’s Outdoor Equipment LLC”, a
>>>>> hardware store providing outdoor equipment, farming equipment, lumber etc.
>>>>>
>>>>> The pages scraped/data provided to the AI/agent were only service
>>>>> area, home page, partners page, and lawnmower pages.
>>>>>
>>>>> When queried about “local hardware/outdoor stores”, it may be passed up
>>>>> because it is believed to be a lawnmower sales store.
>>>>>
>>>>> Additionally, an algorithm internal to the AI/agent could have given
>>>>> more weight to the partners page and passed along the contact
>>>>> information of one of Bob’s Outdoor Equipment LLC’s partners, because
>>>>> it appears on a more authoritative page.
>>>>>
>>>>> That is why the decision was made to make this a domain level
>>>>> authority.
>>>>>
>>>>> On Dec 9, 2025, at 3:01 PM, Jason Grigsby <jason@cloudfour.com> wrote:
>>>>>
>>>>> 
>>>>> Looking at the examples, how does one sentence of description prevent
>>>>> AI from misrepresenting a company's many products? Providing a
>>>>> JSON-LD-style description of an entire domain isn’t going to prevent an AI
>>>>> from saying my product supports features it doesn’t or providing poor
>>>>> comparisons to competitors.
>>>>>
>>>>> I'd love to hear a real-life example of how this impacted an
>>>>> organization. For example, a customer of ACME company asked AI engine X
>>>>> this question: "[insert here]". The answer it received was "foo." It
>>>>> should have been "bar." If AIDD was in place, this wouldn't have happened
>>>>> because AI could use AIDD in this way.
>>>>>
>>>>> It's not that I don't think AI making things up about organizations
>>>>> and their sites is a problem. I just have a hard time understanding how
>>>>> this solves it.
>>>>>
>>>>>
>>>>>
>>>>> +1 (503) 290-1090 o | +1 (503) 502-7211 m | http://cloudfour.com |
>>>>> @grigs
>>>>>
>>>>>
>>>>> On Fri, Dec 05, 2025 at 7:57 AM, dylan larson <dylanl37@hotmail.com>
>>>>> wrote:
>>>>>
>>>>> Thanks for all of your feedback! This concern came up in early
>>>>> discussions during development.
>>>>>
>>>>> AIDD uses the same definition of “authoritative” that existing Web
>>>>> standards rely on:
>>>>>
>>>>> Information is authoritative for the domain that publishes it.
>>>>>
>>>>>
>>>>> Just as with robots.txt, Web App Manifest, OpenGraph metadata, or
>>>>> schema.org JSON-LD, attackers can copy content but cannot publish it
>>>>> from the legitimate domain.
>>>>>
>>>>>
>>>>> A phishing domain might mimic the profile of a well-known company, but
>>>>> that profile would be authoritative only for the phishing domain, never
>>>>> for the real one.
>>>>>
>>>>>
>>>>> With that said, AIDD is still in its early stages, and it is
>>>>> deliberately minimal so that additional identity-verification strategies
>>>>> can evolve independently. Two paths already exist naturally: DNSSEC and
>>>>> third-party verification.
>>>>>
>>>>>
>>>>> Again thanks, and look forward to further feedback.
>>>>>
>>>>>
>>>>>
>>>>> On Dec 5, 2025, at 3:43 AM, g b <bgauryy@gmail.com> wrote:
>>>>>
>>>>> 
>>>>> Exactly what I had in mind.
>>>>> It could be another layer that attackers would try to abuse.
>>>>> While domain names (and URLs) are deterministic, metadata could be a
>>>>> great place to manipulate AI models.
>>>>> I'm not sure there's a way to protect models from being manipulated.
>>>>> Also, it would require subdomain rules, which is another layer that
>>>>> should be taken into consideration (for data integrity and security).
>>>>> Last, it should be cached but also needs to be purgeable (for
>>>>> example when the domain changes its purpose or owner).
>>>>>
>>>>>
>>>>> On Fri, Dec 5, 2025, 10:22 Warren Parad <wparad@rhosys.ch> wrote:
>>>>>
>>>>> I'm concerned that malicious attackers will use this strategy to
>>>>> better phish users by publishing a domain profile that exactly matches
>>>>> well-known companies. How can we ensure that the information here is
>>>>> actually trustworthy?
>>>>>
>>>>> On Wed, Dec 3, 2025 at 4:23 PM dylan larson <dylanl37@hotmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello WICG community,
>>>>>
>>>>>
>>>>>
>>>>> I would like to introduce the AI Domain Data Standard (AIDD) for
>>>>> discussion. Its goal is to address a gap in the web ecosystem that is
>>>>> becoming more visible as AI systems increasingly act as intermediaries
>>>>> between users and websites.
>>>>>
>>>>>
>>>>>
>>>>> *Problem*
>>>>>
>>>>> AI assistants often misidentify or misrepresent domains because there
>>>>> is no consistent, machine-readable, domain-controlled source of identity
>>>>> data. Today, models rely on scraped pages, inconsistent metadata,
>>>>> third-party aggregators, or outdated indexes. There is no canonical place
>>>>> where a domain can declare who they are, what they represent, or which
>>>>> resources are authoritative.
>>>>>
>>>>>
>>>>>
>>>>> *Proposal*
>>>>>
>>>>> AIDD defines a small, predictable JSON document served from:
>>>>>
>>>>> • https://<domain>/.well-known/domain-profile.json
>>>>> • Optional fallback: _ai.<domain> TXT record containing a
>>>>> base64-encoded JSON copy
>>>>>
>>>>> The format contains required identity fields (name, description,
>>>>> website, contact) and optional schema.org-aligned fields such as entity
>>>>> type, logo, and JSON-LD. The schema is intentionally minimal to ensure
>>>>> predictable consumption by AI systems, agents, crawlers, and other
>>>>> automated clients.
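
The discovery mechanism described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the reference resolver: the JSON key names are assumed to match the four required fields listed above (the actual schema may structure them differently), the TXT value is assumed to be a bare base64 string, and every function name here is hypothetical.

```python
import base64
import json

# Assumption: JSON keys mirror the required identity fields named in the proposal.
REQUIRED_FIELDS = ("name", "description", "website", "contact")


def well_known_url(domain: str) -> str:
    """Build the primary HTTPS location for a domain's profile."""
    return f"https://{domain}/.well-known/domain-profile.json"


def txt_record_name(domain: str) -> str:
    """Build the DNS name of the optional TXT fallback record."""
    return f"_ai.{domain}"


def decode_txt_profile(txt_value: str) -> dict:
    """Decode a base64-encoded JSON copy of the profile taken from a TXT record."""
    raw = base64.b64decode(txt_value, validate=True)
    return json.loads(raw.decode("utf-8"))


def missing_required_fields(profile: dict) -> list:
    """Return the required fields that are absent or empty in a profile."""
    return [field for field in REQUIRED_FIELDS if not profile.get(field)]
```

A consumer would fetch `well_known_url(domain)` first, fall back to the TXT record only if that fails, and reject any profile for which `missing_required_fields` returns a non-empty list. Fetching and DNS resolution are left out so the sketch stays self-contained.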
>>>>>
>>>>>
>>>>>
>>>>> *Specification (v0.1.1):*
>>>>> https://ai-domain-data.org/spec/v0.1
>>>>>
>>>>>
>>>>> *Schema: *https://ai-domain-data.org/spec/schema-v0.1.json
>>>>>
>>>>>
>>>>>
>>>>> *Design Principles*
>>>>>
>>>>> • Self-hosted and vendor-neutral
>>>>> • Aligns with schema.org vocabulary
>>>>> • Minimal surface area with clear versioning
>>>>> • Follows existing web conventions for .well-known/
>>>>> • Supports both HTTPS and DNS TXT discovery
>>>>>
>>>>>
>>>>>
>>>>> *Early Adoption & Tooling*
>>>>>
>>>>>    - CLI validator and generator
>>>>>    - Resolver SDK
>>>>>    - Next.js integration
>>>>>    - Jekyll plugin
>>>>>    - WordPress plugin (submitted)
>>>>>    - Online generator and checker tools
>>>>>
>>>>>
>>>>>
>>>>> *Repository:*
>>>>> https://github.com/ai-domain-data/spec
>>>>>
>>>>>
>>>>>
>>>>> *Questions for the community*
>>>>>
>>>>>    1. Should this pursue formal standardization (W3C, IETF) or remain
>>>>>    a community-driven specification?
>>>>>    2. Are the discovery mechanisms (.well-known + DNS TXT fallback)
>>>>>    appropriate for long-term stability?
>>>>>    3. What extension patterns are advisable while preserving strict
>>>>>    predictability?
>>>>>    4. Should browsers or other user agents eventually consume this
>>>>>    data?
>>>>>    5. Are there concerns around naming (domain-profile.json) that the
>>>>>    group would recommend addressing early?
>>>>>
>>>>>
>>>>>
>>>>> *Explainer*
>>>>>
>>>>> A more complete explainer is available here:
>>>>> https://ai-domain-data.org/spec/v0.1
>>>>>
>>>>> I would appreciate any feedback from the WICG community on scope,
>>>>> technical direction, and whether this fits the criteria for incubation.
>>>>>
>>>>> Best regards,
>>>>> Dylan Larson
>>>>>
>>>>>
>>>>> Daniel Vinci
>>>>> em: me@danielvinci.com
>>>>> mx: @xylobol:amber.tel
>>>>>
>>>>>

Received on Sunday, 21 December 2025 03:20:33 UTC