Web Crawler Identification via HTTP Message Signatures

Hi HTTP,

I’d like to inform you of two drafts being discussed on the web-bot-auth mailing list [0] that are relevant to this group. They build on RFC 9421 HTTP Message Signatures [1] to identify bot traffic.

* draft-meunier-http-message-signatures-directory-00 [2] – Proposes a well-known endpoint for sharing key material used in HTTP Message Signatures, and introduces a new Signature-Agent header to support key directory discovery.

* draft-meunier-web-bot-auth-architecture-01 [3] – Describes a framework for authenticating crawlers using the above mechanism in conjunction with HTTP Message Signatures.
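For concreteness, here is a minimal Python sketch of the two moving parts these drafts layer on top of RFC 9421: the signature base a verifier re-derives (covering @authority and the new Signature-Agent header), and key directory discovery from that header. All header values and the key id are hypothetical, and the exact well-known path is an assumption based on the -00 draft, not a normative value.

```python
def signature_base(components: dict, params: str) -> str:
    """Build the RFC 9421 signature base: one line per covered
    component, followed by the @signature-params pseudo-component.
    This exact string is what the crawler signs and the origin
    re-derives during verification."""
    lines = [f'"{name}": {value}' for name, value in components.items()]
    lines.append(f'"@signature-params": {params}')
    return "\n".join(lines)

def directory_url(signature_agent: str) -> str:
    """Derive a key-directory URL from a Signature-Agent header value.
    The well-known path below is an assumption based on the draft."""
    host = signature_agent.strip('"')
    return f"https://{host}/.well-known/http-message-signatures-directory"

# Hypothetical crawler request to example.com, advertising its key
# directory via the new Signature-Agent header:
components = {
    "@authority": "example.com",
    "signature-agent": '"crawler.example"',
}
params = '("@authority" "signature-agent");created=1700000000;keyid="test-key"'

print(signature_base(components, params))
print(directory_url(components["signature-agent"]))
```

Because the Signature-Agent header is itself a covered component, a verifier knows both where to fetch the keys and that the claimed directory is bound under the signature.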

We’ve also built a live demo [4] to show that these drafts can be implemented with minimal changes to existing systems. All code is available on GitHub [5].

There has already been feedback on the web-bot-auth mailing list [0]. From these discussions, my understanding is that visibility to the HTTP group would be useful, with a presentation and possibly a call for adoption during the next IETF meeting.

[0] https://mailman3.ietf.org/mailman3/lists/web-bot-auth.ietf.org/
[1] https://datatracker.ietf.org/doc/html/rfc9421
[2] https://datatracker.ietf.org/doc/draft-meunier-http-message-signatures-directory/
[3] https://datatracker.ietf.org/doc/draft-meunier-web-bot-auth-architecture/
[4] https://http-message-signatures-example.research.cloudflare.com/
[5] https://github.com/cloudflareresearch/web-bot-auth

Thanks,

Thibault

p.s. for the web-bot-auth mailing list, this is a duplicate email. The first email to the HTTP mailing list does not appear to have been delivered.


On Friday, April 25th, 2025 at 2:19 PM, Thibault Meunier <ot-ietf=40thibault.uk@dmarc.ietf.org> wrote:

> Hi Dennis,
> 
> Responding inline
> 
> 
> On Friday, April 25th, 2025 at 1:21 PM, Dennis Jackson <ietf=40dennis-jackson.uk@dmarc.ietf.org> wrote:
> 
> > A mechanism to achieve this high level goal seems pretty useful.
> > 
> > If the client's signature was also over the origin which was the target of the HTTP request, does the signature still need to be fresh for every request? Presuming the client is using TLS to contact the origin, then only parties trusted by the origin should ever see the signature, and so I don't think impersonation is a concern. This would make implementation a bit easier for websites and reduce the performance impact of signing / verifying on every request within a single session.
> 
> The signature is over the @authority component, which is close to the origin or target but not an exact match. Presuming TLS is used (which is not a requirement in the draft at the moment), you are correct that there is less risk of impersonation.
> 
> The use of a nonce is optional at the moment, allowing for replay if that is the origin's policy.
> 
> > I wasn't able to attend the side meeting and I understand OAuth-based solutions were already discussed there, but I would love to understand why something like DPoP [1] is not a good fit for this use case.
> 
> DPoP (RFC 9449) could be a good solution here as well; it shares similar properties with HTTP Message Signatures (RFC 9421). The current mechanism is defined on top of HTTP Message Signatures mostly because of the existing canonicalisation of an HTTP request. One extension that was considered is to enforce integrity/non-repudiation, for which HTTP Message Signatures are well suited. This is not in the existing draft.
> 
> > With DPoP, each client-scraper can talk to their own deployment's OAuth authorization server to obtain credentials. When the scraping client visits a website, it either optimistically sends a DPoP which the website can verify, or advertises via header that it has such a credential for the purposes of 'web-bot-auth' and the website can decide whether to issue a challenge. I guess there are some wrinkles that mean this wouldn't be so easy to deploy in practice?
> 
> The mechanism you highlight would work as well, possibly by scoping the token with htu. Given the uncertainty about which parts of the HTTP request should be signed, HTTP Message Signatures appeared better suited for a first pass.
> 
> Both DPoP and HTTP Message Signatures need some slight scoping to fit the exact use case, hence the draft. I mention both RFCs in a glossary draft that is currently being edited [2].
> 
> [2] https://thibmeu.github.io/draft-meunier-glossary-somehow/draft-meunier-web-bot-auth-glossary-somehow.html
> 
> Thibault
> 
> > Best,
> > Dennis
> > 
> > [1] https://datatracker.ietf.org/doc/html/rfc9449
> > 
> > On 15/04/2025 16:41, Thibault Meunier wrote:
> > 
> > > Hi web-bot-auth,
> > > 
> > > I've just published two new IETF drafts focused on enabling identification and authentication of web crawlers:
> > > 
> > > 1.  draft-meunier-http-message-signatures-directory-00 – Proposes a well-known endpoint for sharing key material used in HTTP Message Signatures, and introduces a new Signature-Agent header to support key directory discovery.
> > > 
> > > 2.  draft-meunier-web-bot-auth-architecture-00 – Describes an architecture for authenticating crawlers using the above mechanism in conjunction with HTTP Message Signatures.
> > > 
> > > We’ve also built a live demo to show that these drafts can be implemented with minimal changes to current systems. The demo uses a manifest v3 Chrome extension as the crawling agent, with two verifier implementations:
> > > 
> > > -   A TypeScript version for Cloudflare Workers
> > > -   A Go plugin for the Caddy web server
> > > 
> > > All code is available on GitHub: https://github.com/cloudflareresearch/web-bot-auth
> > > 
> > > Feedback and comments are very welcome.
> > > 
> > > Thanks,
> > > 
> > > Thibault

Received on Thursday, 15 May 2025 16:03:40 UTC