Re: Proposal: AI Domain Data Standard for Authoritative Domain Identity Metadata from Warren Parad on 2025-12-09 (public-wicg@w3.org from December 2025)

From: Warren Parad <wparad@rhosys.ch>
Date: Tue, 9 Dec 2025 21:08:50 +0100
To: Jason Grigsby <jason@cloudfour.com>
Cc: dylan larson <dylanl37@hotmail.com>, g b <bgauryy@gmail.com>, public-wicg@w3.org
Message-ID: <CAJot-L3H48hU5vwb9EG4qZfEPLLJp3yAdcWEKucxZW6OHJP-cA@mail.gmail.com>
I think one could say, "Hey, wouldn't it be better to have the <script
type="application/ld+json">{}</script> available at a well-known location
instead of having to parse the HTML of the main page?"

To which I would say: "Sure!"
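
For comparison, here is roughly what "parse the HTML of the main page"
means for a consumer today. This is only a sketch using the Python
standard library; `JsonLdExtractor` and `extract_jsonld` are names made up
for illustration, not part of any proposal.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects every <script type="application/ld+json"> block in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # The attribute value may be None (bare attribute), hence the `or ""`.
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content can arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks instead of failing the whole page.


def extract_jsonld(html):
    """Return the parsed JSON-LD documents embedded in an HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents
```

With the same document served from a well-known location, all of this
would reduce to one fetch followed by `json.loads`.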

That sounds like it should be called *Domain Linked Data*, to be
consistent. For me, AI has nothing to do with this.

But I'm also very much with Jason. If companies training LLMs can't be
bothered to pull metadata out of the canonical location for this, why do we
think they would be bothered to use a well-known property which purports to
do the exact same thing?

On Tue, Dec 9, 2025 at 9:01 PM Jason Grigsby <jason@cloudfour.com> wrote:

> Looking at the examples, how does one sentence of description prevent AI
> from misrepresenting a company's many products? Providing a JSON-LD-style
> description of an entire domain isn’t going to prevent an AI from saying my
> product supports features it doesn’t or providing poor comparisons to
> competitors.
>
> I'd love to hear a real-life example of how this impacted an organization.
> For example, a customer of ACME company asked AI engine X this question:
> "[insert here]". The answer it received was "foo." It should have been
> "bar." If AIDD was in place, this wouldn't have happened because AI could
> use AIDD in this way.
>
> It's not that I don't think AI making things up about organizations and
> their sites is a problem. I just have a hard time understanding how this
> solves it.
>
>
> +1 (503) 290-1090 o | +1 (503) 502-7211 m | http://cloudfour.com | @grigs
>
>
> On Fri, Dec 05, 2025 at 7:57 AM, dylan larson <dylanl37@hotmail.com>
> wrote:
>
>> Thanks for all of your feedback! This concern came up in early
>> discussions during development.
>>
>> AIDD uses the same definition of “authoritative” that existing Web
>> standards rely on:
>>
>> Information is authoritative for the domain that publishes it.
>>
>>
>> Just as with robots.txt, Web App Manifest, OpenGraph metadata, or
>> schema.org JSON-LD, attackers can copy content but cannot publish it
>> from the legitimate domain.
>>
>>
>> A phishing domain might mimic the profile of a well-known company, but
>> that profile would be authoritative only for the phishing domain, never
>> for the real one.
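
One way a consumer might apply that rule is to tie a profile to the host
it was actually fetched from and ignore whatever the profile claims about
itself. The sketch below is only an illustration: `is_authoritative_for`
is a made-up helper, and treating the `website` field as an absolute URL
is an assumption, not something stated in this thread.

```python
from urllib.parse import urlsplit


def normalize_host(host):
    """Lowercase a hostname and drop a trailing root dot."""
    return host.lower().rstrip(".")


def is_authoritative_for(profile, served_from_host):
    """True only if the profile's claimed website is the host that served it.

    Assumes `website` is an absolute URL; anything else is rejected.
    """
    website = profile.get("website")
    if not isinstance(website, str):
        return False
    claimed = urlsplit(website).hostname  # None when there is no scheme/host
    if claimed is None:
        return False
    return normalize_host(claimed) == normalize_host(served_from_host)
```

A profile copied verbatim onto a look-alike domain fails this check
because its `website` still names the original host. The comparison is an
exact host match, so `www.` and other subdomains would need an explicit
rule.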
>>
>>
>> With that said, AIDD is still in its early stages and is
>> deliberately minimal so that additional identity-verification strategies
>> can evolve independently. Two paths already exist naturally: DNSSEC and
>> third-party verification.
>>
>>
>> Thanks again; I look forward to further feedback.
>>
>>
>> On Dec 5, 2025, at 3:43 AM, g b <bgauryy@gmail.com> wrote:
>>
>> 
>> Exactly what I had in mind.
>> It could be another layer that attackers would try to abuse.
>> While domain names (and URLs) are deterministic, metadata could be a
>> great place to manipulate AI models.
>> I'm not sure there's a way to protect models from being manipulated.
>> It would also require subdomain rules, which is another layer that
>> should be taken into consideration (for data integrity and security).
>> Last, it should be cached but also needs to be purgeable (for
>> example, when the domain changes its purpose or owner).
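
The "cached but purgeable" requirement could look something like the
sketch below: entries expire after a fixed time-to-live, and a consumer
can also drop a domain explicitly. `ProfileCache` and its methods are
hypothetical names, and how a consumer would learn that a domain changed
owner is left open here.

```python
import time


class ProfileCache:
    """In-memory cache of domain profiles with a TTL and explicit purge."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}

    def put(self, domain, profile):
        """Store a profile and stamp it with its expiry time."""
        self._entries[domain] = (self._clock() + self._ttl, profile)

    def get(self, domain):
        """Return the cached profile, or None if it is absent or expired."""
        entry = self._entries.get(domain)
        if entry is None:
            return None
        expires_at, profile = entry
        if self._clock() >= expires_at:
            del self._entries[domain]
            return None
        return profile

    def purge(self, domain):
        """Drop a domain immediately, e.g. after a change of owner."""
        self._entries.pop(domain, None)
```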
>>
>>
>> On Fri, Dec 5, 2025, 10:22 Warren Parad <wparad@rhosys.ch> wrote:
>>
>>> I'm concerned that malicious attackers will use this strategy to better
>>> phish users by publishing a domain profile that exactly matches that of
>>> a well-known company. How can we ensure that the information here is
>>> actually trustworthy?
>>>
>>> On Wed, Dec 3, 2025 at 4:23 PM dylan larson <dylanl37@hotmail.com>
>>> wrote:
>>>
>>>> Hello WICG community,
>>>>
>>>>
>>>>
>>>> I would like to introduce the AI Domain Data Standard (AIDD) for
>>>> discussion. Its goal is to address a gap in the web ecosystem that is
>>>> becoming more visible as AI systems increasingly act as intermediaries
>>>> between users and websites.
>>>>
>>>>
>>>>
>>>> *Problem*
>>>>
>>>> AI assistants often misidentify or misrepresent domains because there
>>>> is no consistent, machine-readable, domain-controlled source of identity
>>>> data. Today, models rely on scraped pages, inconsistent metadata,
>>>> third-party aggregators, or outdated indexes. There is no canonical place
>>>> where a domain can declare who they are, what they represent, or which
>>>> resources are authoritative.
>>>>
>>>>
>>>>
>>>> *Proposal*
>>>>
>>>> AIDD defines a small, predictable JSON document served from:
>>>>
>>>> • https://<domain>/.well-known/domain-profile.json
>>>> • Optional fallback: _ai.<domain> TXT record containing a
>>>> base64-encoded JSON copy
>>>>
>>>> The format contains required identity fields (name, description,
>>>> website, contact) and optional schema.org-aligned fields such as entity
>>>> type, logo, and JSON-LD. The schema is intentionally minimal to ensure
>>>> predictable consumption by AI systems, agents, crawlers, and other
>>>> automated clients.
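
To make the two discovery paths and the required fields concrete, here is
a small sketch. The URL path, the `_ai.` TXT name, and the four required
field names come from the proposal text above; the helper names are made
up, and treating the TXT value as a bare base64 string with no prefix is
an assumption about the encoding.

```python
import base64
import json

# Required identity fields as listed in the proposal.
REQUIRED_FIELDS = ("name", "description", "website", "contact")


def profile_url(domain):
    """HTTPS discovery location for a domain's profile."""
    return f"https://{domain}/.well-known/domain-profile.json"


def txt_record_name(domain):
    """DNS name of the optional TXT fallback record."""
    return f"_ai.{domain}"


def decode_txt_profile(txt_value):
    """Decode a base64-encoded JSON profile taken from a TXT record.

    Assumes the record value is plain base64 of UTF-8 JSON.
    """
    raw = base64.b64decode(txt_value, validate=True)
    return json.loads(raw.decode("utf-8"))


def missing_required_fields(profile):
    """Return the required fields that are absent from a profile."""
    return [field for field in REQUIRED_FIELDS if field not in profile]
```

One practical point for the TXT path: a single DNS character-string is
limited to 255 bytes, so a base64-encoded profile of any size would have
to be split across several strings and rejoined by the resolver.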
>>>>
>>>>
>>>>
>>>> *Specification (v0.1.1):*
>>>> https://ai-domain-data.org/spec/v0.1
>>>>
>>>>
>>>> *Schema:* https://ai-domain-data.org/spec/schema-v0.1.json
>>>>
>>>>
>>>>
>>>> *Design Principles*
>>>>
>>>> • Self-hosted and vendor-neutral
>>>> • Aligns with schema.org vocabulary
>>>> • Minimal surface area with clear versioning
>>>> • Follows existing web conventions for .well-known/
>>>> • Supports both HTTPS and DNS TXT discovery
>>>>
>>>>
>>>>
>>>> *Early Adoption & Tooling*
>>>>
>>>>    - CLI validator and generator
>>>>    - Resolver SDK
>>>>    - Next.js integration
>>>>    - Jekyll plugin
>>>>    - WordPress plugin (submitted)
>>>>    - Online generator and checker tools
>>>>
>>>>
>>>>
>>>> *Repository:*
>>>> https://github.com/ai-domain-data/spec
>>>>
>>>>
>>>>
>>>> *Questions for the community*
>>>>
>>>>    1. Should this pursue formal standardization (W3C, IETF) or remain
>>>>    a community-driven specification?
>>>>    2. Are the discovery mechanisms (.well-known + DNS TXT fallback)
>>>>    appropriate for long-term stability?
>>>>    3. What extension patterns are advisable while preserving strict
>>>>    predictability?
>>>>    4. Should browsers or other user agents eventually consume this data?
>>>>    5. Are there concerns around naming (domain-profile.json) that the
>>>>    group would recommend addressing early?
>>>>
>>>>
>>>>
>>>> *Explainer*
>>>>
>>>> A more complete explainer is available here:
>>>> https://ai-domain-data.org/spec/v0.1
>>>>
>>>> I would appreciate any feedback from the WICG community on scope,
>>>> technical direction, and whether this fits the criteria for incubation.
>>>>
>>>> Best regards,
>>>> Dylan Larson
>>>>
>>>
>
Received on Tuesday, 9 December 2025 20:09:06 UTC