- From: Edward <edward.in.01101@gmail.com>
- Date: Tue, 23 Dec 2025 01:19:58 +0000
- To: dylan larson <dylanl37@hotmail.com>
- Cc: Hans Petter Blindheim <hans.petter.blindheim@gmail.com>, Warren Parad <wparad@rhosys.ch>, Daniel Vinci <me@danielvinci.com>, Jason Grigsby <jason@cloudfour.com>, g b <bgauryy@gmail.com>, "public-wicg@w3.org" <public-wicg@w3.org>
- Message-ID: <CALjmxcnXwis9jDy8wYRwdGSBbcNaVbP3GzJrwCm65+bb_j361Q@mail.gmail.com>
Complex ideas! I don't have any feedback on this discussion, but good luck all. I like the way this is done openly, like the design of the Web, through public communications. Thank you, Edward. (ℰ.𝒟.𝒥.)

On Mon, 22 Dec 2025 at 13:27, dylan larson <dylanl37@hotmail.com> wrote:

> Thanks Hans. That's a fair challenge, and on reflection I think I overstated one part of the incentive model in my earlier response.
>
> AIDD is not intended to replace crawling, DOM parsing, ranking, or relevance signals. Those systems are essential, and I agree they provide value that a single declarative file never could. The problem AIDD is targeting is much narrower: domain identity and canonical attribution, not content relevance.
>
> Today, systems infer who a domain represents indirectly from partial crawls, third-party sources, and heuristics. This usually works, but it does break down in edge cases, especially for smaller sites, multi-product organizations, or domains with uneven or incomplete coverage. These are not ranking failures but identity-resolution failures. For AI systems, the incentive is not reduced crawling or simpler pipelines, but fewer misattributions and more reliable entity grounding, particularly in cases where existing signals are sparse or uneven.
>
> This class of failure is well documented in the entity resolution and entity linking literature, which shows that correct entity identification degrades when signals are partial, noisy, or indirect (e.g., Shen *et al.*, *Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions*, IEEE TKDE, 2015).
>
> AIDD is intended to act as a domain-controlled anchor signal that can coexist with crawling-based signals, not replace them. AI systems would still crawl, still rank, and still detect manipulation, but would have a predictable, authoritative source for who the domain claims to represent.
> On extending existing standards: I agree that adoption is typically faster and that there would be less adoption friction when building on familiar surfaces. That said, robots.txt is fundamentally a permission and crawler-control mechanism, and overloading it with identity semantics risks ambiguity and inconsistent interpretation.
>
> Sitemap.xml is a closer conceptual fit, as it is already designed for automated consumption and it does support extensibility. One possible path is to use sitemap.xml as an optional discovery mechanism, for example by referencing a canonical domain profile via an extension, while keeping the authoritative representation separate and stable.
>
> I appreciate the pushback. It's helped clarify the intended scope of the proposal and how it can better align with existing Web conventions without conflating concerns.
>
> ------------------------------
> *From:* Hans Petter Blindheim <hans.petter.blindheim@gmail.com>
> *Sent:* Monday, December 22, 2025 2:29 AM
> *To:* dylan larson <dylanl37@hotmail.com>
> *Cc:* Warren Parad <wparad@rhosys.ch>; Daniel Vinci <me@danielvinci.com>; Jason Grigsby <jason@cloudfour.com>; g b <bgauryy@gmail.com>; public-wicg@w3.org <public-wicg@w3.org>
> *Subject:* Re: Proposal: AI Domain Data Standard for Authoritative Domain Identity Metadata
>
> I do not see the incentive.
>
> - Quality for AI is being relevant when answering prompts. If AI consistently presents bad results, or presents results in an "uninformed" way, then the user experience that comes with those interactions will result in people not using it.
> - I see no arguments that AIDD as it stands offers an alternative to this, unless they are willing to take a *massive* drop in quality. Which is why I do not see the incentive (I do not think any of them would get behind this).
>
> As to the current solution of crawling, DOM parsing and heuristics.
> Yes, it is complex, time consuming, and prone to errors, and it requires massive storage, subroutines, energy consumption and more. But it adds signals that you will not have at scale for AI without it (especially alongside an algorithm that sorts results): a way to define authority, a way to check for trust, a way to sort on relevance (at the content level), and a way of removing those who just seek to manipulate/poison. It has also been proven to be the best "medicine" for reducing hallucinations.
>
> I really think my suggested approach of seeking to improve robots.txt and sitemap.xml is a much more feasible approach. It also offers benefits beyond what AIDD seeks to do.
>
> Best
>
> Hans Petter Blindheim
>
> On Sun, Dec 21, 2025 at 3:40 AM dylan larson <dylanl37@hotmail.com> wrote:
>
> Thanks for your feedback.
>
> AI companies would be incentivized to ingest this for a couple of reasons:
>
> - Currently, every site on the web describes who and/or what it is differently.
> - The current solution of crawling + DOM parsing + heuristics is complex, time consuming, and error prone.
>
> That is a lot of time, compute, and resources for an error-prone solution.
>
> AIDD is the first standard addressing this current and emerging issue in an open and easy-to-implement manner.
>
> On Dec 18, 2025, at 2:39 AM, Warren Parad <wparad@rhosys.ch> wrote:
>
> Nothing, but they are already doing that. The only difference here is that this is just a programmatic replacement for the landing page for websites. Instead of loading the landing page and reading HTML, someone wants to load the landing page and read JSON.
>
> But for this to be valuable, I think we would need convincing that LLMs are going to magically switch to reading a different page. Why would they do that?
> On Thu, Dec 18, 2025, 04:29 Daniel Vinci <me@danielvinci.com> wrote:
>
> What's preventing a domain from misrepresenting the products/services it offers to drive engagement?
>
> On Wed, Dec 17, 2025, at 8:25 PM, dylan larson wrote:
>
> An example here is this.
>
> Say an AI company/agent/etc. partially scraped a web site and got some pages but not all, or a third-party aggregator provided some but not all pages.
>
> Let's assume this was a website for "Bob's Outdoor Equipment LLC", a hardware store providing outdoor equipment, farming equipment, lumber, etc.
>
> The pages scraped/data provided to the AI/agent were only the service area, home page, partners page, and lawnmower pages.
>
> When queried about "local hardware/outdoor stores", it may be passed up because it is believed to be a lawnmower sales store.
>
> Additionally, an algorithm internal to the AI/agent could have ranked/given more weight to the partners page and passed on the contact information of one of Bob's Outdoor Equipment LLC's partners due to it being on a more authoritative page.
>
> That is why the decision was made to make this a domain-level authority.
>
> On Dec 9, 2025, at 3:01 PM, Jason Grigsby <jason@cloudfour.com> wrote:
>
> Looking at the examples, how does one sentence of description prevent AI from misrepresenting a company's many products? Providing a JSON-LD-style description of an entire domain isn't going to prevent an AI from saying my product supports features it doesn't or providing poor comparisons to competitors.
>
> I'd love to hear a real-life example of how this impacted an organization. For example, a customer of ACME company asked AI engine X this question: "[insert here]". The answer it received was "foo." It should have been "bar." If AIDD was in place, this wouldn't have happened because AI could use AIDD in this way.
>
> It's not that I don't think AI making things up about organizations and their sites is a problem.
> I just have a hard time understanding how this solves it.
>
> +1 (503) 290-1090 o | +1 (503) 502-7211 m | http://cloudfour.com | @grigs
>
> On Fri, Dec 05, 2025 at 7:57 AM, dylan larson <dylanl37@hotmail.com> wrote:
>
> Thanks for all your feedback! This concern came up in discussions during early development.
>
> AIDD uses the same definition of "authoritative" that existing Web standards rely on: information is authoritative for the domain that publishes it.
>
> Just as with robots.txt, Web App Manifest, OpenGraph metadata, or schema.org JSON-LD, attackers can copy content but cannot publish it from the legitimate domain.
>
> A phishing domain might mimic the profile of a well-known company, but that profile would be authoritative only for the phishing domain, never for the real one.
>
> With that said, AIDD is still in its early stages, and it is deliberately minimal so that additional identity-verification strategies can evolve independently. Two paths already exist naturally: DNSSEC and third-party verification.
>
> Again, thanks, and I look forward to further feedback.
>
> On Dec 5, 2025, at 3:43 AM, g b <bgauryy@gmail.com> wrote:
>
> Exactly what I had in mind. It could be another layer that attackers would try to abuse. While domain names (and URLs) are deterministic, metadata could be a great place to manipulate AI models. I'm not sure there's a way to protect models from being manipulated, and it would also require subdomain rules, which is another layer that should be taken into consideration (for data integrity and security).
> Last, it should be cached but also needs to be able to be purged (for example, when the domain changes its purpose or owner).
>
> On Fri, Dec 5, 2025, 10:22 Warren Parad <wparad@rhosys.ch> wrote:
>
> I'm concerned that malicious attackers will use this strategy to better phish users by publishing a domain profile that exactly matches well-known companies. How can we ensure that the information here is actually trustworthy?
>
> On Wed, Dec 3, 2025 at 4:23 PM dylan larson <dylanl37@hotmail.com> wrote:
>
> Hello WICG community,
>
> I would like to introduce the AI Domain Data Standard (AIDD) for discussion. Its goal is to address a gap in the web ecosystem that is becoming more visible as AI systems increasingly act as intermediaries between users and websites.
>
> *Problem*
>
> AI assistants often misidentify or misrepresent domains because there is no consistent, machine-readable, domain-controlled source of identity data. Today, models rely on scraped pages, inconsistent metadata, third-party aggregators, or outdated indexes. There is no canonical place where a domain can declare who it is, what it represents, or which resources are authoritative.
>
> *Proposal*
>
> AIDD defines a small, predictable JSON document served from:
>
> • https://<domain>/.well-known/domain-profile.json
> • Optional fallback: an _ai.<domain> TXT record containing a base64-encoded JSON copy
>
> The format contains required identity fields (name, description, website, contact) and optional schema.org-aligned fields such as entity type, logo, and JSON-LD. The schema is intentionally minimal to ensure predictable consumption by AI systems, agents, crawlers, and other automated clients.
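[Editor's note: the discovery mechanism quoted above can be sketched concretely. The profile below is illustrative only: the field names (name, description, website, contact) follow the required fields listed in the proposal, but the exact key spellings and the TXT-record encoding details are assumptions rather than text taken from the spec.]

```python
import base64
import json

# Illustrative domain profile. Field names follow the required fields
# described in the proposal; exact spellings are assumed, not normative.
profile = {
    "name": "Bob's Outdoor Equipment LLC",
    "description": "Hardware store selling outdoor, farming, and lumber supplies.",
    "website": "https://example.com",
    "contact": "info@example.com",
}

def encode_txt_record(profile: dict) -> str:
    """Base64-encode a compact JSON copy of the profile, as the proposal
    suggests publishing in an _ai.<domain> TXT record."""
    raw = json.dumps(profile, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")

def decode_txt_record(record: str) -> dict:
    """Recover the profile dict from the TXT fallback value."""
    return json.loads(base64.b64decode(record))

# Round trip: a consumer that reads the TXT fallback should recover the
# same document it would have fetched over HTTPS.
record = encode_txt_record(profile)
assert decode_txt_record(record) == profile
```

In practice a consumer would first fetch https://<domain>/.well-known/domain-profile.json and only fall back to the TXT record on failure; that HTTP/DNS plumbing is omitted here to keep the sketch self-contained.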
> *Specification (v0.1.1):* https://ai-domain-data.org/spec/v0.1
>
> *Schema:* https://ai-domain-data.org/spec/schema-v0.1.json
>
> *Design Principles*
>
> • Self-hosted and vendor-neutral
> • Aligns with schema.org vocabulary
> • Minimal surface area with clear versioning
> • Follows existing web conventions for .well-known/
> • Supports both HTTPS and DNS TXT discovery
>
> *Early Adoption & Tooling*
>
> - CLI validator and generator
> - Resolver SDK
> - Next.js integration
> - Jekyll plugin
> - WordPress plugin (submitted)
> - Online generator and checker tools
>
> *Repository:* https://github.com/ai-domain-data/spec
>
> *Questions for the community*
>
> 1. Should this pursue formal standardization (W3C, IETF) or remain a community-driven specification?
> 2. Are the discovery mechanisms (.well-known + DNS TXT fallback) appropriate for long-term stability?
> 3. What extension patterns are advisable while preserving strict predictability?
> 4. Should browsers or other user agents eventually consume this data?
> 5. Are there concerns around naming (domain-profile.json) that the group would recommend addressing early?
>
> *Explainer*
>
> A more complete explainer is available here: https://ai-domain-data.org/spec/v0.1
>
> I would appreciate any feedback from the WICG community on scope, technical direction, and whether this fits the criteria for incubation.
>
> Best regards,
> Dylan Larson
>
> Daniel Vinci
> em: me@danielvinci.com
> mx: @xylobol:amber.tel
Received on Tuesday, 23 December 2025 01:21:52 UTC