Re: Proposal: AI Domain Data Standard for Authoritative Domain Identity Metadata from Daniel Vinci on 2025-12-18 (public-wicg@w3.org from December 2025)

From: Daniel Vinci <me@danielvinci.com>
Date: Wed, 17 Dec 2025 20:29:00 -0700
To: "dylan larson" <dylanl37@hotmail.com>, "Jason Grigsby" <jason@cloudfour.com>
Cc: "g b" <bgauryy@gmail.com>, "Warren Parad" <wparad@rhosys.ch>, "public-wicg@w3.org" <public-wicg@w3.org>
Message-Id: <b19f08f9-f139-4c32-a048-8accc5a3acf4@app.fastmail.com>
What's preventing a domain from misrepresenting the products/services it offers to drive engagement?

On Wed, Dec 17, 2025, at 8:25 PM, dylan larson wrote:
> 
> An example here is this. 
> 
> Say an AI company/agent/etc. partially scraped a website and got some pages, but not all, or a third-party aggregator provided some but not all pages. 
> 
> Let’s assume this was a website for “Bob’s Outdoor Equipment LLC”, a hardware store providing outdoor equipment, farming equipment, lumber etc. 
> 
> The pages scraped/data provided to the AI/agent were only service area, home page, partners page, and lawnmower pages. 
> 
> When queried about “local hardware/outdoor stores”, it may be passed over because it is believed to be a lawnmower sales store.
> 
> Additionally, an algorithm internal to the AI/agent could have given more weight to the partners page and passed along the contact information of one of Bob’s Outdoor Equipment LLC’s partners, because it appeared on a more authoritative page. 
> 
> That is why the decision was made to make this a domain level authority. 
> 
>> On Dec 9, 2025, at 3:01 PM, Jason Grigsby <jason@cloudfour.com> wrote:
>> 
>> Looking at the examples, how does one sentence of description prevent AI from misrepresenting a company's many products? Providing a JSON-LD-style description of an entire domain isn’t going to prevent an AI from saying my product supports features it doesn’t or providing poor comparisons to competitors.
>> 
>> I'd love to hear a real-life example of how this impacted an organization. For example, a customer of ACME company asked AI engine X this question: "[insert here]". The answer it received was "foo." It should have been "bar." If AIDD was in place, this wouldn't have happened because AI could use AIDD in this way.
>> 
>> It's not that I think AI making things up about organizations and their sites isn't a problem. I just have a hard time understanding how this solves it. 
>> 
>> 
>> 
>> +1 (503) 290-1090 o | +1 (503) 502-7211 m | http://cloudfour.com | @grigs
>> 
>> 
>> On Fri, Dec 05, 2025 at 7:57 AM, dylan larson <dylanl37@hotmail.com> wrote:
>>> Thanks, all, for your feedback! This concern came up in early discussions during development.
>>> 
>>> AIDD uses the same definition of “authoritative” that existing Web standards rely on:
>>> Information is authoritative for the domain that publishes it. 
>>> 
>>> 
>>> 
>>> Just as with robots.txt, Web App Manifest, OpenGraph metadata, or schema.org JSON-LD, attackers can copy content but cannot publish it from the legitimate domain.
>>> 
>>> 
>>> 
>>> A phishing domain might mimic the profile of a well-known company, but that profile would be authoritative only for the phishing domain, never for the real one.
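
To make the scoping rule above concrete, here is a minimal Python sketch of how a consumer might decide which domain a fetched profile speaks for. It assumes the consumer keeps the URL it actually fetched, and it uses the `.well-known` path from the original proposal. The function names are hypothetical and not part of the AIDD spec. The point is that authority comes from the serving host, not from anything the profile claims about itself.

```python
from urllib.parse import urlsplit

# Path taken from the original proposal in this thread.
PROFILE_PATH = "/.well-known/domain-profile.json"


def authoritative_domain(fetched_url: str):
    """Return the host a profile is authoritative for, or None.

    Hypothetical helper: only an HTTPS fetch of the well-known path counts.
    """
    parts = urlsplit(fetched_url)
    if parts.scheme != "https" or parts.path != PROFILE_PATH:
        return None
    return parts.hostname  # urlsplit lowercases the hostname


def is_authoritative_for(fetched_url: str, domain: str) -> bool:
    """True only if the profile was served by exactly the given domain."""
    host = authoritative_domain(fetched_url)
    return host is not None and host == domain.lower().rstrip(".")
```

Under this sketch, a profile copied verbatim onto a lookalike domain is still only authoritative for the lookalike. Whether users or models can tell the two domains apart is a separate problem that this check does not address.
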
>>> 
>>> 
>>> 
>>> With that said, AIDD is still in its early stages and is deliberately minimal so that additional identity-verification strategies can evolve independently. Two paths already exist naturally: DNSSEC and third-party verification. 
>>> 
>>> 
>>> 
>>> Thanks again; I look forward to further feedback.
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On Dec 5, 2025, at 3:43 AM, g b <bgauryy@gmail.com> wrote:
>>>> 
>>>> Exactly what I had in mind.
>>>> It could be another layer that attackers would try to abuse.
>>>> While domain names (and URLs) are deterministic, metadata could be a great place to manipulate AI models.
>>>> I'm not sure there's a way to protect models from being manipulated. It would also require subdomain rules, which is another layer that should be taken into consideration (for data integrity and security).
>>>> Last, it should be cached but also needs to be purgeable (for example, when the domain changes its purpose or owner).
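
As a rough illustration of the caching point above, a consumer-side cache could combine a time-to-live with an explicit purge. This is only a sketch: the class name, the one-hour default, and the in-memory storage are all assumptions, and nothing in the thread says how a domain would signal that a purge is needed.

```python
import time


class ProfileCache:
    """Hypothetical in-memory cache of domain profiles with expiry and purge."""

    def __init__(self, ttl_seconds: float = 3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._entries = {}   # domain -> (expires_at, profile)

    def put(self, domain: str, profile: dict) -> None:
        """Store a profile and stamp it with an expiry time."""
        self._entries[domain] = (self._clock() + self._ttl, profile)

    def get(self, domain: str):
        """Return the cached profile, or None if absent or expired."""
        entry = self._entries.get(domain)
        if entry is None:
            return None
        expires_at, profile = entry
        if self._clock() >= expires_at:
            del self._entries[domain]  # drop stale entries lazily
            return None
        return profile

    def purge(self, domain: str) -> bool:
        """Drop a domain immediately, e.g. after an ownership change."""
        return self._entries.pop(domain, None) is not None
```

A short TTL bounds how long a stale profile can survive after a domain changes hands, while `purge` covers the case where the consumer learns about the change sooner.
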
>>>> 
>>>> 
>>>> On Fri, Dec 5, 2025, 10:22 Warren Parad <wparad@rhosys.ch> wrote:
>>>>> I'm concerned that malicious attackers will use this strategy to better phish users by publishing a domain profile that exactly matches well-known companies. How can we ensure that the information here is actually trustworthy?
>>>>> 
>>>>> On Wed, Dec 3, 2025 at 4:23 PM dylan larson <dylanl37@hotmail.com> wrote:
>>>>>> Hello WICG community,
>>>>>>
>>>>>> I would like to introduce the AI Domain Data Standard (AIDD) for discussion. Its goal is to address a gap in the web ecosystem that is becoming more visible as AI systems increasingly act as intermediaries between users and websites.
>>>>>>
>>>>>> *Problem*
>>>>>> AI assistants often misidentify or misrepresent domains because there is no consistent, machine-readable, domain-controlled source of identity data. Today, models rely on scraped pages, inconsistent metadata, third-party aggregators, or outdated indexes. There is no canonical place where a domain can declare who they are, what they represent, or which resources are authoritative.
>>>>>>
>>>>>> *Proposal*
>>>>>> AIDD defines a small, predictable JSON document served from:
>>>>>> • https://<domain>/.well-known/domain-profile.json
>>>>>> • Optional fallback: _ai.<domain> TXT record containing a base64-encoded JSON copy
>>>>>> The format contains required identity fields (name, description, website, contact) and optional schema.org-aligned fields such as entity type, logo, and JSON-LD. The schema is intentionally minimal to ensure predictable consumption by AI systems, agents, crawlers, and other automated clients.
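
For anyone skimming, the two discovery locations and the required fields described above can be sketched in a few lines of Python. This is a hedged illustration, not the project's Resolver SDK: the function names are hypothetical, the TXT payload is assumed to be standard base64 of UTF-8 JSON in a single string, and the field check only tests presence because value types are not described in this thread. No network or DNS lookups are performed.

```python
import base64
import json

# Required identity fields as listed in the proposal text.
REQUIRED_FIELDS = ("name", "description", "website", "contact")


def profile_url(domain: str) -> str:
    """Build the HTTPS discovery URL named in the proposal."""
    return f"https://{domain}/.well-known/domain-profile.json"


def txt_record_name(domain: str) -> str:
    """Build the DNS name for the optional TXT fallback."""
    return f"_ai.{domain}"


def decode_txt_payload(txt_value: str) -> dict:
    """Decode the base64-encoded JSON copy (assumed standard base64, UTF-8)."""
    return json.loads(base64.b64decode(txt_value).decode("utf-8"))


def missing_required_fields(profile: dict) -> list[str]:
    """Return required fields that are absent or empty (hypothetical check)."""
    return [field for field in REQUIRED_FIELDS if not profile.get(field)]
```

A consumer would presumably try `profile_url(domain)` first and fall back to the TXT record at `txt_record_name(domain)`, then reject any document for which `missing_required_fields` returns a non-empty list.
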
>>>>>>
>>>>>> *Specification (v0.1.1):*
>>>>>> https://ai-domain-data.org/spec/v0.1
>>>>>> *Schema:*
>>>>>> https://ai-domain-data.org/spec/schema-v0.1.json
>>>>>>
>>>>>> *Design Principles*
>>>>>> • Self-hosted and vendor-neutral
>>>>>> • Aligns with schema.org vocabulary
>>>>>> • Minimal surface area with clear versioning
>>>>>> • Follows existing web conventions for .well-known/
>>>>>> • Supports both HTTPS and DNS TXT discovery
>>>>>>
>>>>>> *Early Adoption & Tooling*
>>>>>> • CLI validator and generator
>>>>>> • Resolver SDK
>>>>>> • Next.js integration
>>>>>> • Jekyll plugin
>>>>>> • WordPress plugin (submitted)
>>>>>> • Online generator and checker tools
>>>>>>
>>>>>> *Repository:*
>>>>>> https://github.com/ai-domain-data/spec
>>>>>>
>>>>>> *Questions for the community*
>>>>>>  1. Should this pursue formal standardization (W3C, IETF) or remain a community-driven specification?
>>>>>>  2. Are the discovery mechanisms (.well-known + DNS TXT fallback) appropriate for long-term stability?
>>>>>>  3. What extension patterns are advisable while preserving strict predictability?
>>>>>>  4. Should browsers or other user agents eventually consume this data?
>>>>>>  5. Are there concerns around naming (domain-profile.json) that the group would recommend addressing early?
>>>>>>
>>>>>> *Explainer*
>>>>>> A more complete explainer is available here:
>>>>>> https://ai-domain-data.org/spec/v0.1
>>>>>> I would appreciate any feedback from the WICG community on scope, technical direction, and whether this fits the criteria for incubation.
>>>>>> Best regards,
>>>>>> Dylan Larson

Daniel Vinci
em: me@danielvinci.com
mx: @xylobol:amber.tel
Received on Thursday, 18 December 2025 03:32:57 UTC