- From: Edward <edward.in.01101@gmail.com>
- Date: Tue, 23 Dec 2025 01:19:58 +0000
- To: dylan larson <dylanl37@hotmail.com>
- Cc: Hans Petter Blindheim <hans.petter.blindheim@gmail.com>, Warren Parad <wparad@rhosys.ch>, Daniel Vinci <me@danielvinci.com>, Jason Grigsby <jason@cloudfour.com>, g b <bgauryy@gmail.com>, "public-wicg@w3.org" <public-wicg@w3.org>
- Message-ID: <CALjmxcnXwis9jDy8wYRwdGSBbcNaVbP3GzJrwCm65+bb_j361Q@mail.gmail.com>
Complex ideas! I don't have any feedback on this discussion, but good luck all. I like the way this is done openly, like the design of the Web, through public communications. Thank you, Edward. (ℰ.𝒟.𝒥.)

On Mon, 22 Dec 2025 at 13:27, dylan larson <dylanl37@hotmail.com> wrote:

> Thanks Hans. That's a fair challenge, and on reflection I think I overstated one part of the incentive model in my earlier response.
>
> AIDD is not intended to replace crawling, DOM parsing, ranking, or relevance signals. Those systems are essential, and I agree they provide value that a single declarative file never could. The problem AIDD is targeting is much narrower: domain identity and canonical attribution, not content relevance.
>
> Today, systems infer who a domain represents indirectly from partial crawls, third-party sources, and heuristics. This usually works, but it does break down in edge cases, especially for smaller sites, multi-product organizations, or domains with uneven or incomplete coverage. These are not ranking failures but identity-resolution failures. For AI systems, the incentive is not reduced crawling or simpler pipelines, but fewer misattributions and more reliable entity grounding, particularly in cases where existing signals are sparse or uneven.
>
> This class of failure is well documented in the entity resolution and entity linking literature, which shows that correct entity identification degrades when signals are partial, noisy, or indirect (e.g., Shen *et al.*, *Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions*, IEEE TKDE, 2015).
>
> AIDD is intended to act as a domain-controlled anchor signal that can coexist with crawling-based signals, not replace them. AI systems would still crawl, still rank, and still detect manipulation, but would have a predictable, authoritative source for who the domain claims to represent.
> On extending existing standards: I agree that adoption is typically faster and that there would be less adoption friction when building on familiar surfaces. That said, robots.txt is fundamentally a permission and crawler-control mechanism, and overloading it with identity semantics risks ambiguity and inconsistent interpretation.
>
> Sitemap.xml is a closer conceptual fit, as it is already designed for automated consumption and it does support extensibility. One possible path is to use sitemap.xml as an optional discovery mechanism, for example by referencing a canonical domain profile via an extension, while keeping the authoritative representation separate and stable.
>
> I appreciate the pushback. It's helped clarify the intended scope of the proposal and how it can better align with existing Web conventions without conflating concerns.
>
> ------------------------------
> *From:* Hans Petter Blindheim <hans.petter.blindheim@gmail.com>
> *Sent:* Monday, December 22, 2025 2:29 AM
> *To:* dylan larson <dylanl37@hotmail.com>
> *Cc:* Warren Parad <wparad@rhosys.ch>; Daniel Vinci <me@danielvinci.com>; Jason Grigsby <jason@cloudfour.com>; g b <bgauryy@gmail.com>; public-wicg@w3.org <public-wicg@w3.org>
> *Subject:* Re: Proposal: AI Domain Data Standard for Authoritative Domain Identity Metadata
>
> I do not see the incentive.
>
> - Quality for AI is being relevant when answering prompts. If AI consistently presents bad results, or presents results in an "uninformed" way, then the user experience that comes with those interactions will result in people not using it.
> - I see no arguments that AIDD as it stands offers an alternative to this, unless they are willing to take a *massive* drop in quality. Which is why I do not see the incentive (I do not think any of them would get behind this).
>
> As to the current solution of crawling, DOM parsing and heuristics.
> Yes, it is complex, time consuming, and prone to errors, and it requires massive storage, subroutines, energy consumption and more. But it adds signals that you will not have at scale for AI without it (especially alongside an algorithm that sorts results): a way to define authority, a way to check for trust, a way to sort on relevance (at the content level), and a way of removing those who just seek to manipulate/poison. It has also been proven to be the best "medicine" for reducing hallucinations.
>
> I really think my suggested approach of seeking to improve robots.txt and sitemap.xml is a much more feasible approach. It also offers benefits beyond what AIDD seeks to do.
>
> Best
>
> Hans Petter Blindheim
>
> On Sun, Dec 21, 2025 at 3:40 AM dylan larson <dylanl37@hotmail.com> wrote:
>
> Thanks for your feedback.
>
> AI companies would be incentivized to ingest this for a couple of reasons:
>
> - Currently, every site on the web describes who and/or what it is differently.
> - The current solution of crawling + DOM parsing + heuristics is complex, time consuming, and error prone.
>
> That is a lot of time, compute, and resources for an error-prone solution.
>
> AIDD is the first standard addressing this current and emerging issue in an open and easy-to-implement manner.
>
> On Dec 18, 2025, at 2:39 AM, Warren Parad <wparad@rhosys.ch> wrote:
>
> Nothing, but they are already doing that. The only difference here is that this is just a programmatic replacement for the landing page for websites. Instead of loading the landing page and reading HTML, someone wants to load the landing page and read JSON.
>
> But for this to be valuable, I think we would need convincing that LLMs are going to magically switch to reading a different page. Why would they do that?
> On Thu, Dec 18, 2025, 04:29 Daniel Vinci <me@danielvinci.com> wrote:
>
> What's preventing a domain from misrepresenting the products/services it offers to drive engagement?
>
> On Wed, Dec 17, 2025, at 8:25 PM, dylan larson wrote:
>
> An example here is this.
>
> Say an AI company/agent/etc. partially scraped a web site and got some pages but not all, or a third-party aggregator provided some but not all pages.
>
> Let's assume this was a website for "Bob's Outdoor Equipment LLC", a hardware store providing outdoor equipment, farming equipment, lumber, etc.
>
> The pages scraped/data provided to the AI/agent were only the service area, home page, partners page, and lawnmower pages.
>
> When queried about "local hardware/outdoor stores", it may be passed up because it is believed to be a lawnmower sales store.
>
> Additionally, an algorithm internal to the AI/agent could have ranked/given more weight to the partners page and passed on the contact information of one of Bob's Outdoor Equipment LLC's partners due to it being on a more authoritative page.
>
> That is why the decision was made to make this a domain-level authority.
>
> On Dec 9, 2025, at 3:01 PM, Jason Grigsby <jason@cloudfour.com> wrote:
>
> Looking at the examples, how does one sentence of description prevent AI from misrepresenting a company's many products? Providing a JSON-LD-style description of an entire domain isn't going to prevent an AI from saying my product supports features it doesn't or providing poor comparisons to competitors.
>
> I'd love to hear a real-life example of how this impacted an organization. For example, a customer of ACME company asked AI engine X this question: "[insert here]". The answer it received was "foo." It should have been "bar." If AIDD was in place, this wouldn't have happened because AI could use AIDD in this way.
>
> It's not that I don't think AI making things up about organizations and their sites is a problem.
> I just have a hard time understanding how this solves it.
>
> +1 (503) 290-1090 o | +1 (503) 502-7211 m | http://cloudfour.com | @grigs
>
> On Fri, Dec 05, 2025 at 7:57 AM, dylan larson <dylanl37@hotmail.com> wrote:
>
> Thanks for all your feedback! This concern came up in discussions during early development.
>
> AIDD uses the same definition of "authoritative" that existing Web standards rely on: information is authoritative for the domain that publishes it.
>
> Just as with robots.txt, Web App Manifest, OpenGraph metadata, or schema.org JSON-LD, attackers can copy content but cannot publish it from the legitimate domain.
>
> A phishing domain might mimic the profile of a well-known company, but that profile would be authoritative only for the phishing domain, never for the real one.
>
> With that said, AIDD is still in its early stages, and it is deliberately minimal so that additional identity-verification strategies can evolve independently. Two paths already exist naturally: DNSSEC and third-party verification.
>
> Again, thanks, and I look forward to further feedback.
>
> On Dec 5, 2025, at 3:43 AM, g b <bgauryy@gmail.com> wrote:
>
> Exactly what I had in mind. It could be another layer that attackers would try to abuse. While domain names (and URLs) are deterministic, metadata could be a great place to manipulate AI models. I'm not sure there's a way to protect models from being manipulated, and it would also require subdomain rules, which is another layer that should be taken into consideration (for data integrity and security).
> Last, it should be cached but also needs to be able to be purged (for example, when the domain changes its purpose or owner).
>
> On Fri, Dec 5, 2025, 10:22 Warren Parad <wparad@rhosys.ch> wrote:
>
> I'm concerned that malicious attackers will use this strategy to better phish users by publishing a domain profile that exactly matches well-known companies. How can we ensure that the information here is actually trustworthy?
>
> On Wed, Dec 3, 2025 at 4:23 PM dylan larson <dylanl37@hotmail.com> wrote:
>
> Hello WICG community,
>
> I would like to introduce the AI Domain Data Standard (AIDD) for discussion. Its goal is to address a gap in the web ecosystem that is becoming more visible as AI systems increasingly act as intermediaries between users and websites.
>
> *Problem*
>
> AI assistants often misidentify or misrepresent domains because there is no consistent, machine-readable, domain-controlled source of identity data. Today, models rely on scraped pages, inconsistent metadata, third-party aggregators, or outdated indexes. There is no canonical place where a domain can declare who it is, what it represents, or which resources are authoritative.
>
> *Proposal*
>
> AIDD defines a small, predictable JSON document served from:
>
> • https://<domain>/.well-known/domain-profile.json
> • Optional fallback: an _ai.<domain> TXT record containing a base64-encoded JSON copy
>
> The format contains required identity fields (name, description, website, contact) and optional schema.org-aligned fields such as entity type, logo, and JSON-LD. The schema is intentionally minimal to ensure predictable consumption by AI systems, agents, crawlers, and other automated clients.
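[Editor's note: the discovery mechanism quoted above can be sketched concretely. The profile below is illustrative only: the field names (name, description, website, contact) follow the required fields listed in the proposal, but the exact key spellings and the TXT-record encoding details are assumptions rather than text taken from the spec.]

```python
import base64
import json

# Illustrative domain profile. Field names follow the required fields
# described in the proposal; exact spellings are assumed, not normative.
profile = {
    "name": "Bob's Outdoor Equipment LLC",
    "description": "Hardware store selling outdoor, farming, and lumber supplies.",
    "website": "https://example.com",
    "contact": "info@example.com",
}

def encode_txt_record(profile: dict) -> str:
    """Base64-encode a compact JSON copy of the profile, as the proposal
    suggests publishing in an _ai.<domain> TXT record."""
    raw = json.dumps(profile, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

def decode_txt_record(record: str) -> dict:
    """Recover the profile dict from the TXT fallback value."""
    return json.loads(base64.b64decode(record))

# Round trip: a consumer that reads the TXT fallback should recover the
# same document it would have fetched over HTTPS.
record = encode_txt_record(profile)
assert decode_txt_record(record) == profile
```

In practice a consumer would first fetch https://<domain>/.well-known/domain-profile.json and only fall back to the TXT record on failure; that HTTP/DNS plumbing is omitted here to keep the sketch self-contained.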
> *Specification (v0.1.1):* https://ai-domain-data.org/spec/v0.1
>
> *Schema:* https://ai-domain-data.org/spec/schema-v0.1.json
>
> *Design Principles*
>
> • Self-hosted and vendor-neutral
> • Aligns with schema.org vocabulary
> • Minimal surface area with clear versioning
> • Follows existing web conventions for .well-known/
> • Supports both HTTPS and DNS TXT discovery
>
> *Early Adoption & Tooling*
>
> - CLI validator and generator
> - Resolver SDK
> - Next.js integration
> - Jekyll plugin
> - WordPress plugin (submitted)
> - Online generator and checker tools
>
> *Repository:* https://github.com/ai-domain-data/spec
>
> *Questions for the community*
>
> 1. Should this pursue formal standardization (W3C, IETF) or remain a community-driven specification?
> 2. Are the discovery mechanisms (.well-known + DNS TXT fallback) appropriate for long-term stability?
> 3. What extension patterns are advisable while preserving strict predictability?
> 4. Should browsers or other user agents eventually consume this data?
> 5. Are there concerns around naming (domain-profile.json) that the group would recommend addressing early?
>
> *Explainer*
>
> A more complete explainer is available here: https://ai-domain-data.org/spec/v0.1
>
> I would appreciate any feedback from the WICG community on scope, technical direction, and whether this fits the criteria for incubation.
>
> Best regards,
> Dylan Larson
>
> Daniel Vinci
> em: me@danielvinci.com
> mx: @xylobol:amber.tel
Received on Tuesday, 23 December 2025 01:21:52 UTC