- From: Daniel Vinci <me@danielvinci.com>
- Date: Wed, 17 Dec 2025 20:29:00 -0700
- To: "dylan larson" <dylanl37@hotmail.com>, "Jason Grigsby" <jason@cloudfour.com>
- Cc: "g b" <bgauryy@gmail.com>, "Warren Parad" <wparad@rhosys.ch>, "public-wicg@w3.org" <public-wicg@w3.org>
- Message-Id: <b19f08f9-f139-4c32-a048-8accc5a3acf4@app.fastmail.com>
What's preventing a domain from misrepresenting the products/services it offers to drive engagement?

On Wed, Dec 17, 2025, at 8:25 PM, dylan larson wrote:
>
> An example here is this.
>
> Say an AI company/agent/etc. partially scraped a website and got some pages, but not all, or a third-party aggregator provided some but not all pages.
>
> Let’s assume this was a website for “Bob’s Outdoor Equipment LLC”, a hardware store providing outdoor equipment, farming equipment, lumber, etc.
>
> The pages scraped/data provided to the AI/agent were only the service area, home page, partners page, and lawnmower pages.
>
> When queried about “local hardware/outdoor stores”, it may be passed over because it is believed to be a lawnmower sales store.
>
> Additionally, an algorithm internal to the AI/agent could have ranked/given more weight to the partners page and passed along the contact information of one of Bob’s Outdoor Equipment LLC’s partners because it appeared on a page treated as more authoritative.
>
> That is why the decision was made to make this a domain-level authority.
>
>> On Dec 9, 2025, at 3:01 PM, Jason Grigsby <jason@cloudfour.com> wrote:
>>
>> Looking at the examples, how does one sentence of description prevent AI from misrepresenting a company's many products? Providing a JSON-LD-style description of an entire domain isn’t going to prevent an AI from saying my product supports features it doesn’t or providing poor comparisons to competitors.
>>
>> I'd love to hear a real-life example of how this impacted an organization. For example, a customer of ACME company asked AI engine X this question: "[insert here]". The answer it received was "foo." It should have been "bar." If AIDD was in place, this wouldn't have happened because AI could use AIDD in this way.
>>
>> It's not that I don't think AI making things up about organizations and their sites is a problem. I just have a hard time understanding how this solves it.
>>
>> +1 (503) 290-1090 o | +1 (503) 502-7211 m | http://cloudfour.com | @grigs
>>
>> On Fri, Dec 05, 2025 at 7:57 AM, dylan larson <dylanl37@hotmail.com> wrote:
>>> Thanks for all your feedback! This concern/issue came up in discussions during early development.
>>>
>>> AIDD uses the same definition of “authoritative” that existing Web standards rely on:
>>> Information is authoritative for the domain that publishes it.
>>>
>>> Just as with robots.txt, Web App Manifest, OpenGraph metadata, or schema.org JSON-LD, attackers can copy content but cannot publish it from the legitimate domain.
>>>
>>> A phishing domain might mimic the profile of a well-known company, but that profile would be authoritative only for the phishing domain, never for the real one.
>>>
>>> With that said, AIDD is still in its early stages and is deliberately minimal so that additional identity-verification strategies can evolve independently. Two paths already exist naturally: DNSSEC and third-party verification.
>>>
>>> Again, thanks, and I look forward to further feedback.
>>>
>>>> On Dec 5, 2025, at 3:43 AM, g b <bgauryy@gmail.com> wrote:
>>>>
>>>> Exactly what I had in mind.
>>>> It could be another layer that attackers would try to abuse.
>>>> While domain names (and URLs) are deterministic, metadata could be a great place to manipulate AI models.
>>>> I'm not sure there's a way to protect models from being manipulated. Also, it would require subdomain rules, which is another layer that should be taken into consideration (for data integrity and security).
>>>> Last, it should be cached but also needs to be purgeable (for example, when the domain changes its purpose or owner).
>>>>
>>>> On Fri, Dec 5, 2025, 10:22 Warren Parad <wparad@rhosys.ch> wrote:
>>>>> I'm concerned that malicious attackers will use this strategy to better phish users by publishing a domain profile that exactly matches well-known companies. How can we ensure that the information here is actually trustworthy?
>>>>>
>>>>> On Wed, Dec 3, 2025 at 4:23 PM dylan larson <dylanl37@hotmail.com> wrote:
>>>>>> Hello WICG community,
>>>>>>
>>>>>> I would like to introduce the AI Domain Data Standard (AIDD) for discussion. Its goal is to address a gap in the web ecosystem that is becoming more visible as AI systems increasingly act as intermediaries between users and websites.
>>>>>>
>>>>>> *Problem*
>>>>>> AI assistants often misidentify or misrepresent domains because there is no consistent, machine-readable, domain-controlled source of identity data. Today, models rely on scraped pages, inconsistent metadata, third-party aggregators, or outdated indexes. There is no canonical place where a domain can declare who they are, what they represent, or which resources are authoritative.
>>>>>>
>>>>>> *Proposal*
>>>>>> AIDD defines a small, predictable JSON document served from:
>>>>>> • https://<domain>/.well-known/domain-profile.json
>>>>>> • Optional fallback: an _ai.<domain> TXT record containing a base64-encoded JSON copy
>>>>>>
>>>>>> The format contains required identity fields (name, description, website, contact) and optional schema.org-aligned fields such as entity type, logo, and JSON-LD.
>>>>>> The schema is intentionally minimal to ensure predictable consumption by AI systems, agents, crawlers, and other automated clients.
>>>>>>
>>>>>> *Specification (v0.1.1):*
>>>>>> https://ai-domain-data.org/spec/v0.1
>>>>>> *Schema:* https://ai-domain-data.org/spec/schema-v0.1.json
>>>>>>
>>>>>> *Design Principles*
>>>>>> • Self-hosted and vendor-neutral
>>>>>> • Aligns with schema.org vocabulary
>>>>>> • Minimal surface area with clear versioning
>>>>>> • Follows existing web conventions for .well-known/
>>>>>> • Supports both HTTPS and DNS TXT discovery
>>>>>>
>>>>>> *Early Adoption & Tooling*
>>>>>> • CLI validator and generator
>>>>>> • Resolver SDK
>>>>>> • Next.js integration
>>>>>> • Jekyll plugin
>>>>>> • WordPress plugin (submitted)
>>>>>> • Online generator and checker tools
>>>>>>
>>>>>> *Repository:*
>>>>>> https://github.com/ai-domain-data/spec
>>>>>>
>>>>>> *Questions for the community*
>>>>>> 1. Should this pursue formal standardization (W3C, IETF) or remain a community-driven specification?
>>>>>> 2. Are the discovery mechanisms (.well-known + DNS TXT fallback) appropriate for long-term stability?
>>>>>> 3. What extension patterns are advisable while preserving strict predictability?
>>>>>> 4. Should browsers or other user agents eventually consume this data?
>>>>>> 5. Are there concerns around naming (domain-profile.json) that the group would recommend addressing early?
>>>>>>
>>>>>> *Explainer*
>>>>>> A more complete explainer is available here:
>>>>>> https://ai-domain-data.org/spec/v0.1
>>>>>>
>>>>>> I would appreciate any feedback from the WICG community on scope, technical direction, and whether this fits the criteria for incubation.
>>>>>>
>>>>>> Best regards,
>>>>>> Dylan Larson

Daniel Vinci
em: me@danielvinci.com
mx: @xylobol:amber.tel
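The two discovery mechanisms summarized in the quoted proposal can be illustrated with a minimal Python sketch. It assumes only what the message states: the profile is served at https://<domain>/.well-known/domain-profile.json, the optional fallback is a TXT record at _ai.<domain> holding base64-encoded JSON, and name, description, website, and contact are required. The function names and sample values are hypothetical, HTTP and DNS lookups are left out, and the website-host check is an illustrative consumer policy reflecting the thread's "authoritative for the domain that publishes it" argument, not something the specification is known to require.

```python
import base64
import json
from urllib.parse import urlsplit

# Required identity fields as listed in the AIDD v0.1 proposal summary.
REQUIRED_FIELDS = ("name", "description", "website", "contact")


def profile_url(domain: str) -> str:
    """Primary discovery location: the .well-known path over HTTPS."""
    return f"https://{domain}/.well-known/domain-profile.json"


def txt_record_name(domain: str) -> str:
    """Fallback discovery location: a DNS TXT record at _ai.<domain>."""
    return f"_ai.{domain}"


def decode_txt_payload(txt_value: str) -> dict:
    """Decode the base64-encoded JSON copy carried in the fallback TXT record.

    Assumes the caller has already joined the record's character-strings
    (each limited to 255 bytes in DNS) into a single value.
    """
    raw = base64.b64decode(txt_value, validate=True)  # rejects non-base64 input
    return json.loads(raw.decode("utf-8"))


def validate_profile(profile: dict, domain: str) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if not profile.get(field)
    ]
    # Hypothetical consumer policy, not taken from the spec: a profile only
    # speaks for the domain that served it, so the declared website must be
    # hosted on that domain or one of its subdomains.
    website = profile.get("website")
    if isinstance(website, str) and website:
        host = (urlsplit(website).hostname or "").lower()
        expected = domain.lower()
        if host != expected and not host.endswith("." + expected):
            problems.append(f"website host {host!r} does not match {domain!r}")
    return problems


if __name__ == "__main__":
    # Hypothetical sample profile, encoded the way the TXT fallback describes.
    sample = {
        "name": "Example Co",
        "description": "Hypothetical sample profile.",
        "website": "https://www.example.com",
        "contact": "info@example.com",
    }
    encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")
    print(profile_url("example.com"))  # https://example.com/.well-known/domain-profile.json
    print(txt_record_name("example.com"))  # _ai.example.com
    print(validate_profile(decode_txt_payload(encoded), "example.com"))  # []
```

The host check deliberately accepts subdomains (www.example.com for example.com); a stricter consumer might require an exact match. Note that this check only confirms a profile is internally consistent with the domain that served it; like the Dec 5 reply says, it does nothing to stop a look-alike domain from publishing its own consistent profile.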
Received on Thursday, 18 December 2025 03:32:57 UTC