Re: [webai] Web infrastructure footprint of AI systems (#10)

Thank you @FabienGandon, that's a good example! Daniel has also given a talk at FOSDEM titled [Open Source Security in Spite of AI](https://fosdem.org/2026/schedule/event/B7YKQ7-oss-in-spite-of-ai/) which is quite relevant.

One fair point he raises, beyond the actual slop caused by automated PRs, is the bandwidth cost: crawlers reportedly hit curl's website continuously (>4k requests/second), while actual tarball downloads account for <0.01% of those requests.

Is this the right issue to discuss this? Another notable example is Wikipedia, which was [impacted by crawlers first](https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/) and is now also hit by LLM-powered agents, to the point that some [fetch tools](https://github.com/anthropics/claude-code/issues/29758) are denied access. Wikipedia signed [deals](https://www.reuters.com/business/retail-consumer/wikipedia-owner-signs-microsoft-meta-ai-content-training-deals-2026-01-15/) to tackle and monetize this traffic, and something similar will likely happen with other major web resources, but I'd like to understand what happens to the long tail of minor content providers.

When using AI agents, it is also worth noting that the underlying model determines the "default" sites the agent connects to when looking for information without being explicitly instructed. One example is gpt-oss, which, when used with an agent lacking a search tool, hits Wikipedia multiple times (see e.g. slide 42 of [this presentation](https://www.dropbox.com/scl/fi/i98kft9203zrcdp06159c/20251129-Own-your-AI-agent.pdf?rlkey=92lmtu7r0rfrw0p1dgletsjaz&dl=0)).

-- 
GitHub Notification of comment by aittalam
Please view or discuss this issue at https://github.com/w3c/webai/issues/10#issuecomment-4024310072 using your GitHub account


-- 
Sent via github-notify-ml as configured in https://github.com/w3c/github-notify-ml-config

Received on Monday, 9 March 2026 14:46:53 UTC