- From: Hans Petter Blindheim <hans.petter.blindheim@gmail.com>
- Date: Tue, 23 Dec 2025 14:59:13 +0100
- To: Edward <edward.in.01101@gmail.com>
- Cc: dylan larson <dylanl37@hotmail.com>, Warren Parad <wparad@rhosys.ch>, Daniel Vinci <me@danielvinci.com>, Jason Grigsby <jason@cloudfour.com>, g b <bgauryy@gmail.com>, "public-wicg@w3.org" <public-wicg@w3.org>
- Message-ID: <CAC-fYvJWN0HrNzCiCL1vLnvrAkHRmGcDwRDL12HoGRoY=sCiPA@mail.gmail.com>
That scope does fit better, but it is still prone to
manipulation/poisoning.

Also it takes me back to my initial question: why not work with
registrars, who in turn can link to a relevant business registry and pin
that canonical to their DNS record instead? Business registry information
should be considered more authoritative (and would be helpful to whatever
entity database is used for understanding the canonical of a domain).

And while yes, there are rank-and-rents out there, there are also e.g.
founders, agencies, or that friend who made the webpages, who hold the
domain. But by putting the canonical domain idea to registrars, you would
also incentivize a different kind of transparency online (legally),
especially if AI also joins this standard and it has benefits. (I believe
they could be more interested in backing and supporting something like
this, and I also think it could be backed by most registrars, who would
likely see this as a good way to make fees from legal movements of
domains; they aren't necessarily bad backers to have either.)

This move (away from domain holders to registrars and business
registries) would reduce the poisoning/manipulation risks, and offer a
solution from which it would also be possible to clean up unwanted
players, at least locally (or more transparency with which to hold
responsible those who should not be operating).

-
Best
Hans Petter Blindheim

On Tue, Dec 23, 2025 at 2:20 AM Edward <edward.in.01101@gmail.com> wrote:

> Complex ideas!
>
> I don't have any feedback to this discussion but good luck all, I like the
> way this is done openly like the design of the Web, through public
> communications.
>
> Thank you,
> Edward.
> (ℰ.𝒟.𝒥.)
>
> On Mon, 22 Dec 2025 at 13:27, dylan larson <dylanl37@hotmail.com> wrote:
>
>> Thanks Hans. That’s a fair challenge, and on reflection I think I
>> overstated one part of the incentive model in my earlier response.
>>
>> AIDD is not intended to replace crawling, DOM parsing, ranking, or
>> relevance signals.
>> Those systems are essential, and I agree they provide
>> value that a single declarative file never could. The problem AIDD is
>> targeting is much narrower: domain identity and canonical attribution,
>> not content relevance.
>>
>> Today, systems infer who a domain represents indirectly from partial
>> crawls, third-party sources, and heuristics. This usually works, but it
>> does break down in edge cases, especially for smaller sites,
>> multi-product organizations, or domains with uneven or incomplete
>> coverage. These are not ranking failures, but identity-resolution
>> failures. For AI systems, the incentive is not reduced crawling or
>> simpler pipelines, but fewer misattributions and more reliable entity
>> grounding, particularly in cases where existing signals are sparse or
>> uneven.
>>
>> This class of failure is well documented in the entity resolution and
>> entity linking literature, which shows that correct entity
>> identification degrades when signals are partial, noisy, or indirect
>> (e.g., Shen *et al.*, *Entity Linking with a Knowledge Base: Issues,
>> Techniques, and Solutions*, IEEE TKDE, 2015).
>>
>> AIDD is intended to act as a domain-controlled anchor signal that can
>> coexist with crawling-based signals, not replace them. AI systems would
>> still crawl, still rank, and still detect manipulation, but would have a
>> predictable, authoritative source for who the domain claims to
>> represent.
>>
>> On extending existing standards: I agree that adoption is typically
>> faster, and there would be less adoption friction when building on
>> familiar surfaces. That said, robots.txt is fundamentally a permission
>> and crawler-control mechanism, and overloading it with identity
>> semantics risks ambiguity and inconsistent interpretation.
>>
>> Sitemap.xml is a closer conceptual fit, as it is already designed for
>> automated consumption and it does support extensibility. One possible
>> path is to use sitemap.xml as an optional discovery mechanism.
>> An example would be referencing a canonical domain profile via an
>> extension, while keeping the authoritative representation separate and
>> stable.
>>
>> I appreciate the pushback. It’s helped clarify the intended scope of the
>> proposal and how it can better align with existing Web conventions
>> without conflating concerns.
>>
>> ------------------------------
>> *From:* Hans Petter Blindheim <hans.petter.blindheim@gmail.com>
>> *Sent:* Monday, December 22, 2025 2:29 AM
>> *To:* dylan larson <dylanl37@hotmail.com>
>> *Cc:* Warren Parad <wparad@rhosys.ch>; Daniel Vinci <me@danielvinci.com>;
>> Jason Grigsby <jason@cloudfour.com>; g b <bgauryy@gmail.com>;
>> public-wicg@w3.org <public-wicg@w3.org>
>> *Subject:* Re: Proposal: AI Domain Data Standard for Authoritative
>> Domain Identity Metadata
>>
>> I do not see the incentive.
>>
>> - Quality for AI is being relevant when answering prompts. If AI
>>   consistently presents bad results, or presents results in an
>>   "uninformed" way, then the user experience that comes with those
>>   interactions will result in people not using it.
>> - I see no arguments that AIDD as it stands offers an alternative to
>>   this, unless they are willing to take a *massive* drop in quality.
>>   Which is why I do not see the incentive (I do not think any of them
>>   would get behind this).
>>
>> As to the current solution of crawling, DOM parsing and heuristics: yes,
>> it is complex, time consuming, and prone to errors, and it requires
>> massive storage, subroutines, energy consumption and more. But it adds
>> signals that you will not have at scale for AI without it (especially
>> alongside an algorithm that sorts results), such as a way to define
>> authority, a way to check for trust, a way to sort on relevance (at the
>> content level) and a way of removing those who just seek to
>> manipulate/poison. It has also been proven to be the best "medicine"
>> for reducing hallucinations.
>>
>> I really think my suggested approach of seeking to improve robots.txt
>> and sitemap.xml is a much more feasible approach. It also offers
>> benefits beyond what AIDD seeks to do.
>>
>> -
>> Best
>> Hans Petter Blindheim
>>
>> On Sun, Dec 21, 2025 at 3:40 AM dylan larson <dylanl37@hotmail.com>
>> wrote:
>>
>> Thanks for your feedback.
>>
>> AI companies would be incentivized to switch to ingesting this for a
>> couple of reasons:
>>
>> - Currently, every site on the web describes who and/or what it is
>>   differently.
>> - The current solution of crawling + DOM parsing + heuristics is
>>   complex, time consuming, and error prone.
>>
>> That is a lot of time, compute, and resources for an error-prone
>> solution.
>>
>> AIDD is the first standard addressing this current and emerging issue in
>> an open and easy-to-implement manner.
>>
>> On Dec 18, 2025, at 2:39 AM, Warren Parad <wparad@rhosys.ch> wrote:
>>
>> Nothing, but they are already doing that. The only difference here is
>> that this is just a programmatic replacement for the landing page for
>> websites. Instead of loading the landing page and reading HTML, someone
>> wants to load the landing page and read JSON.
>>
>> But for this to be valuable, I think we would need convincing that LLMs
>> are going to magically switch to reading a different page. Why would
>> they do that?
>>
>> On Thu, Dec 18, 2025, 04:29 Daniel Vinci <me@danielvinci.com> wrote:
>>
>> What's preventing a domain from misrepresenting the products/services it
>> offers to drive engagement?
>>
>> On Wed, Dec 17, 2025, at 8:25 PM, dylan larson wrote:
>>
>> An example here is this.
>>
>> Say an AI company/agent/etc. partially scraped a web site and got some
>> pages, but not all, or a third-party aggregator provided some but not
>> all pages.
>>
>> Let’s assume this was a website for “Bob’s Outdoor Equipment LLC”, a
>> hardware store providing outdoor equipment, farming equipment, lumber
>> etc.
>>
>> The pages scraped/data provided to the AI/agent were only the service
>> area, home page, partners page, and lawnmower pages.
>>
>> When queried about “local hardware/outdoor stores”, it may be passed up
>> because it is believed to be a lawnmower sales store.
>>
>> Additionally, an algorithm internal to the AI/agent could have
>> ranked/given more weight to the partners page and passed on the contact
>> information of one of Bob’s Outdoor Equipment LLC’s partners, due to it
>> being on a more authoritative page.
>>
>> That is why the decision was made to make this a domain-level authority.
>>
>> On Dec 9, 2025, at 3:01 PM, Jason Grigsby <jason@cloudfour.com> wrote:
>>
>> Looking at the examples, how does one sentence of description prevent AI
>> from misrepresenting a company's many products? Providing a
>> JSON-LD-style description of an entire domain isn’t going to prevent an
>> AI from saying my product supports features it doesn’t or providing poor
>> comparisons to competitors.
>>
>> I'd love to hear a real-life example of how this impacted an
>> organization. For example, a customer of ACME company asked AI engine X
>> this question: "[insert here]". The answer it received was "foo." It
>> should have been "bar." If AIDD was in place, this wouldn't have
>> happened because AI could use AIDD in this way.
>>
>> It's not that I don't think AI making things up about organizations and
>> their sites is a problem. I just have a hard time understanding how this
>> solves it.
>>
>> +1 (503) 290-1090 o | +1 (503) 502-7211 m | http://cloudfour.com | @grigs
>>
>> On Fri, Dec 05, 2025 at 7:57 AM, dylan larson <dylanl37@hotmail.com>
>> wrote:
>>
>> Thanks for all of your feedback! This concern/issue came up in
>> discussions during early development.
>>
>> AIDD uses the same definition of “authoritative” that existing Web
>> standards rely on:
>>
>> Information is authoritative for the domain that publishes it.
>>
>> Just as with robots.txt, Web App Manifest, OpenGraph metadata, or
>> schema.org JSON-LD, attackers can copy content but cannot publish it
>> from the legitimate domain.
>>
>> A phishing domain might mimic the profile of a well-known company, but
>> that profile would be authoritative only for the phishing domain, never
>> for the real one.
>>
>> With that said, AIDD is still in its early stages, and it is
>> deliberately minimal so that additional identity-verification
>> strategies can evolve independently. Two paths already exist naturally:
>> DNSSEC and third-party verification.
>>
>> Again, thanks, and I look forward to further feedback.
>>
>> On Dec 5, 2025, at 3:43 AM, g b <bgauryy@gmail.com> wrote:
>>
>> Exactly what I had in mind.
>> It could be another layer that attackers would try to abuse.
>> While domain names (and URLs) are deterministic, metadata could be a
>> great place to manipulate AI models.
>> I'm not sure there's a way to protect models from being manipulated,
>> and it would also require subdomain rules, which is another layer that
>> should be taken into consideration (for data integrity and security).
>> Last, it should be cached but also needs to be purgeable (for example
>> when the domain changes its purpose or owner).
>>
>> On Fri, Dec 5, 2025, 10:22 Warren Parad <wparad@rhosys.ch> wrote:
>>
>> I'm concerned that malicious attackers will use this strategy to better
>> phish users by publishing a domain profile that exactly matches
>> well-known companies. How can we ensure that the information here is
>> actually trustworthy?
>>
>> On Wed, Dec 3, 2025 at 4:23 PM dylan larson <dylanl37@hotmail.com> wrote:
>>
>> Hello WICG community,
>>
>> I would like to introduce the AI Domain Data Standard (AIDD) for
>> discussion. Its goal is to address a gap in the web ecosystem that is
>> becoming more visible as AI systems increasingly act as intermediaries
>> between users and websites.
>>
>> *Problem*
>>
>> AI assistants often misidentify or misrepresent domains because there is
>> no consistent, machine-readable, domain-controlled source of identity
>> data. Today, models rely on scraped pages, inconsistent metadata,
>> third-party aggregators, or outdated indexes. There is no canonical
>> place where a domain can declare who they are, what they represent, or
>> which resources are authoritative.
>>
>> *Proposal*
>>
>> AIDD defines a small, predictable JSON document served from:
>>
>> • https://<domain>/.well-known/domain-profile.json
>> • Optional fallback: _ai.<domain> TXT record containing a
>>   base64-encoded JSON copy
>>
>> The format contains required identity fields (name, description,
>> website, contact) and optional schema.org-aligned fields such as entity
>> type, logo, and JSON-LD. The schema is intentionally minimal to ensure
>> predictable consumption by AI systems, agents, crawlers, and other
>> automated clients.
>>
>> *Specification (v0.1.1):* https://ai-domain-data.org/spec/v0.1
>>
>> *Schema:* https://ai-domain-data.org/spec/schema-v0.1.json
>>
>> *Design Principles*
>>
>> • Self-hosted and vendor-neutral
>> • Aligns with schema.org vocabulary
>> • Minimal surface area with clear versioning
>> • Follows existing web conventions for .well-known/
>> • Supports both HTTPS and DNS TXT discovery
>>
>> *Early Adoption & Tooling*
>>
>> - CLI validator and generator
>> - Resolver SDK
>> - Next.js integration
>> - Jekyll plugin
>> - WordPress plugin (submitted)
>> - Online generator and checker tools
>>
>> *Repository:* https://github.com/ai-domain-data/spec
>>
>> *Questions for the community*
>>
>> 1. Should this pursue formal standardization (W3C, IETF) or remain a
>>    community-driven specification?
>> 2. Are the discovery mechanisms (.well-known + DNS TXT fallback)
>>    appropriate for long-term stability?
>> 3. What extension patterns are advisable while preserving strict
>>    predictability?
>> 4. Should browsers or other user agents eventually consume this data?
>> 5. Are there concerns around naming (domain-profile.json) that the
>>    group would recommend addressing early?
>>
>> *Explainer*
>>
>> A more complete explainer is available here:
>> https://ai-domain-data.org/spec/v0.1
>>
>> I would appreciate any feedback from the WICG community on scope,
>> technical direction, and whether this fits the criteria for incubation.
>>
>> Best regards,
>> Dylan Larson
>>
>> Daniel Vinci
>> em: me@danielvinci.com
>> mx: @xylobol:amber.tel
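
For anyone reading the thread later, the two discovery locations and the required fields described in the original proposal can be sketched roughly as follows. This is only an illustration under stated assumptions, not the AIDD reference tooling: it assumes the four required fields are flat top-level keys, that the TXT record value is the bare base64 string with no prefix, and that "contact" is a plain string. The example profile and the .example domain are hypothetical, and the published schema may differ on all of these points.

```python
import base64
import json

# Required identity fields as named in the proposal text.
# ASSUMPTION: they are flat, top-level keys in the JSON document.
REQUIRED_FIELDS = ("name", "description", "website", "contact")


def well_known_url(domain: str) -> str:
    """HTTPS discovery location given in the proposal."""
    return f"https://{domain}/.well-known/domain-profile.json"


def txt_record_name(domain: str) -> str:
    """DNS name of the optional TXT fallback given in the proposal."""
    return f"_ai.{domain}"


def decode_txt_fallback(txt_value: str) -> dict:
    """Decode a base64-encoded JSON copy of the profile.

    ASSUMPTION: the TXT value is the bare base64 string, with no
    key=value wrapper and no splitting across multiple strings.
    """
    return json.loads(base64.b64decode(txt_value).decode("utf-8"))


def missing_required_fields(profile: dict) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not profile.get(field)]


# Hypothetical profile, loosely based on the hardware-store example
# from earlier in the thread.
example_profile = {
    "name": "Bob's Outdoor Equipment LLC",
    "description": "Hardware store selling outdoor equipment, farming equipment and lumber.",
    "website": "https://bobs-outdoor.example",
    "contact": "info@bobs-outdoor.example",
}

# Round trip: what a TXT fallback would carry under the assumptions above.
example_txt_value = base64.b64encode(
    json.dumps(example_profile).encode("utf-8")
).decode("ascii")

print(well_known_url("bobs-outdoor.example"))
print(txt_record_name("bobs-outdoor.example"))
print(missing_required_fields(decode_txt_fallback(example_txt_value)))
```

Note that a check like this only tells a consumer that the document is well formed. It says nothing about whether the claims in it are true, which is the poisoning/manipulation concern raised above.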
Received on Tuesday, 23 December 2025 13:59:30 UTC