Re: Artart last call review of draft-ietf-httpbis-message-signatures-16

Hi Harald, thanks for providing your review. Responses inline below.

On Mar 6, 2023, at 6:37 PM, Harald Alvestrand via Datatracker <> wrote:

Reviewer: Harald Alvestrand
Review result: Not Ready

Overall opinion: This approach is wrong.

There are two basic problems with this document.

One: This document does not describe a security function.
Instead, it is a toolbox of security components that can be applied in an
application in various combinations depending on the application’s security
needs and tolerance for risk.

This means that it’s impossible to evaluate, based on this document alone,
whether it is fit for purpose or not.

This document is not intended to provide a full security solution on its own; rather, it is a tool to be applied alongside other tools in different situations. It is deliberately defined in a narrow role, to be combined with other elements into a full security solution for a variety of applications, including key-based authentication, proof-of-possession, non-repudiation, and others. As such, the authors believe it is fit for its intended purpose.

While it’s tempting to look to this document to provide a full and universal solution across all of HTTP, for which all security aspects can be calculated, the authors of the document strongly contend that this is not possible or desirable. HTTP is simply too widely used and varied and messy to make that kind of thing feasible. This is why so much is deferred to “the application”, because that’s where many of the important decisions need to get made, and this spec can’t make those decisions for people.

Two: The approach taken - that of assuming that a bit exact canonical form can
be regenerated from a message transferred via any combination of HTTP
functional units - is a very tall order. This approach resembles DKIM, which
has caused widespread havoc in the email ecosystem by its intolerance of common
mailing list behaviors. The complexity of even trying this task is shown by the
fact that ¼ of this 108-page spec is devoted just to the canonicalization
mechanisms - even when several complex topics are handled by referencing other documents.

As such, I would not recommend this going on the standards track at this time.

Yes, it is a tall order, and that’s why it’s taken such a long time to get it together and make it work right, and why, as you point out, much of the document is dedicated to that topic. There’s a lot of history and lessons learned — including DKIM — that have gone into this document. Speaking of DKIM, email messages are a distinctly different animal from what’s being targeted here, and many of the trials from there don’t apply.

IF it is possible to:
- Describe 2 or more “applications” (in the document’s terminology) that serve a useful function in securing some part of the ecosystem against some attack
- Implement these functions in a way that exercises a fairly comprehensive subset of the behaviors mandated in this document
- Run the resulting application in a real environment for some significant period of time, and observe that the number of canonicalization errors resulting in validation failure is insignificant to zero

THEN it seems to me reasonable to place this on the standards track.

Until then, I think this best belongs as an experimental protocol that people
can implement to gather experience with, not something that the IETF should
publish as a consensus standards-track protocol.

There are many very real applications from which this draft’s text was distilled over the last few years. The general approach in this document has been in use for well over a decade, in production and at scale, in multiple deployed systems.

Amazon’s SIGv4 is probably the most well-exercised version of this approach, and it’s still in use today (I can’t speak for Amazon’s plans but they are sponsoring one of the editors to work on this draft):

The engineers behind this original work at Amazon published their original I-D back in 2013, known as the Cavage draft in the community. This has many implementations in different versions on different systems:

One of the bigger ones out there is the Mastodon ecosystem, which uses its own version of the Cavage draft:

As do financial profiles including FAPI, PSD2, and the Berlin Group’s work. This is to say nothing of other efforts out there that have invented or re-invented parts of this specification for their own purposes.

This draft does not invent the approach of creating detached signatures derived from HTTP messages. Instead, it codifies the best practices from a decade of deployment experience across the world, and imbues that with the expertise of the HTTP working group because, as it turns out, the canonicalization of HTTP is in fact the hardest part to get right. It’s not perfect, but neither is it impossible. What’s different in this draft is not the general approach, but the exact semantics and syntax of how to get there. Our motivation and goal from the start has been to build something with all the best parts from what’s out there today.

The rest of this review concerns smaller issues.

Larger issues
Versioning of the protocol is not defined. For example, in 2.1.1, the
serialization of structured fields says that the signer MAY include the sf
parameter, and MUST do STRUCTURED-FIELDS “extensions and updates”. There is no
mechanism to indicate which version of STRUCTURED-FIELDS the signer uses; how
can one be sure that we always get a version that the verifier can reconstruct?

This can be handwaved away by saying “this must be specified by the
application” - but since we have no description of what an application spec
would look like - neither in examples nor in rules - we can’t know if this will
be handled at that level.

This has been discussed at length and resolved. You can see part of that discussion on GitHub, and more here on the list:

You are correct that we can’t know at this spec level what an application will do with this information, but that’s going to be true of any narrowly-defined layered approach.

The Accept-Signature: field seems more dangerous than described in the spec. In
particular, if the attacker knows the value of some field set, the attacker can
use it as an oracle; it can get a valid signature over that field set in the
signer’s signing key by specifying an Accept-Signature: that includes that
field only (plus overhead). This can then be used in a replay attack together
with unsigned components against other entities that trust the signer.

Thank you for articulating that attack vector, that’s useful and we’ll add that to the security considerations section. The mitigation seems to be pretty much the same as any other replay prevention, and we can cross-link to the appropriate discussions.

Smaller issues
These are more at the level of nits - worth fixing or making the meaning more
obvious, but they are not show-stoppers by any means. These are listed by
section, sequentially.

1.1 - Definitions: “Unix time” is not defined. “Key identifier” is used but not defined.

From the definitions section: The term "Unix time" is defined by [POSIX.1], Section 4.16. I copied this language from a different RFC that also uses integer timestamps. That said, I do see now that the words “Unix time” do not actually appear in the POSIX document, so we’ll change this to:

The term “Unix timestamp” refers to the number of integer seconds that have elapsed since the Epoch, as defined by [POSIX.1], Section 4.16.
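To make the intended value concrete, here’s a minimal Python sketch (illustrative only, not spec text): the timestamp is a whole number of seconds, with no sub-second precision.

```python
import time

# The "created" and "expires" signature parameters are integer Unix
# timestamps: whole seconds since the Epoch, never fractional.
created = int(time.time())

# An "expires" value five minutes later is still an integer.
expires = created + 300
```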

2 - Canonicalization. The text is not explicit that case differences in field
names do not matter; it just implies it (by lowercasing everything).
Cache-Control: and cache-control: are the same header, and if both occur, they
must be merged. Be explicit.

This specification does not define the case semantics of field names; that belongs to HTTP semantics. This specification does have to deal with that reality, and this is already explicitly called out in the text referenced here with justification:

The component name for an HTTP field is the lowercased form of its field name as defined in Section 5.1 of [HTTP]. While HTTP field names are case-insensitive...

Values for merged headers are called out here, again defined by HTTP semantics:

HTTP field values MUST be combined into a single value as defined in Section 5.2 of [HTTP]
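To illustrate the two rules quoted above, here’s a rough Python sketch (the helper name is mine, not from the spec; real implementations also have to handle obsolete line folding, which is omitted here):

```python
def field_component(name, values):
    """Sketch of HTTP field canonicalization: the component name is the
    lowercased field name, and multiple field lines are combined into a
    single value with ", " per Section 5.2 of [HTTP]."""
    component_name = name.lower()
    combined = ", ".join(v.strip() for v in values)
    return f'"{component_name}": {combined}'

# Cache-Control: and cache-control: are the same field, so both lines
# merge under one lowercased component name:
line = field_component("Cache-Control", ["max-age=60", "must-revalidate"])
# "cache-control": max-age=60, must-revalidate
```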

The spec assumes that case is not being changed in any field value over which
signatures are computed. This should be called out.

This is called out in the text in section 7.5.2, which deals with many possible transformations including case insensitivity:

For example, a field definition can declare some or all of its value to be case-insensitive

2.1.1 - use of ;bs - the term “known by the application to cause problems with
canonicalization” is handwaving. Step 3 of this algorithm seems to assume that
all field values have a unique ASCII representation; is this assumption justified?

For the algorithm to be applied, yes, there needs to be a stable set of bytes to encode. If that doesn’t exist, then you can’t use this. If there’s a better source for the “bytes of the field value” from HTTP semantics, I’d happily refer to that rather than lean on ASCII.
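For readers following along, here’s a rough sketch of what the ;bs treatment amounts to (helper name mine): each field line’s raw bytes are wrapped in a Structured Fields Byte Sequence (colon-delimited base64), and those are then combined with ", ". The verifier never has to re-derive the bytes, which is what sidesteps the canonicalization problems.

```python
import base64

def field_component_bs(name, values):
    """Sketch of the ";bs" parameter: encode the bytes of each field
    line as a Structured Fields Byte Sequence, then join with ", "."""
    encoded = ", ".join(
        ":" + base64.b64encode(v.encode("ascii")).decode("ascii") + ":"
        for v in values
    )
    return f'"{name.lower()}";bs: {encoded}'

# A value that would be ambiguous after merging is safe once wrapped:
line = field_component_bs("X-Tricky", ["value, with, commas"])
```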

2.2 - the term “printable character” is undefined. Are we dealing with
0x20-0x7E (ASCII) or some subset thereof, or do Unicode characters occur here?

Thanks, that can be made more explicit.

All of section 2.2 seems to assume that we’re dealing only with HTTP URLs. This
assumption should be made explicit.

This specification is signing for HTTP messages. What other URLs would there be at play here? It seems redundant to call it out, especially because we’re using the terms from the HTTP semantics specification to define the components.

For @authority, the reference for normalization does NOT specify lowercasing;
it says “The scheme and host are case-insensitive and normally provided in
lowercase”. Please be explicit that lowercasing MUST be done when computing the
signature base.

(Since I’m working with IDNs, I have to ask if the @authority is comprised of
A-labels or U-labels; I suspect that the answer is “obviously A-labels”, but
offhand, I can’t find the sentence that states this.)

Thanks for pointing that out, we’ll make the value explicitly lowercase like other case-insensitive values (@scheme and field names, in particular).

For IDNs, I’ll defer to the HTTP experts what the appropriate data source for the authority of the target URL is here. I do believe you’re right that it should be the A-label, but that might need to be something more clearly specified in HTTP.

Section 2.3 assumes that “Unix time” is an integer number of seconds. This
depends on what definition of “Unix time” is used (Unix “man 2 time” gives you
the integer representation; other representations include fractional seconds.)

It does not assume anything; the text explicitly states that it is an integer number of seconds and that sub-second precision is not supported.

In section 2.4, when multiple modifiers are used, is there a convention for
their order, or do you depend on the verifier using the signer’s order when
reconstructing the signature base?

There is no convention to the order, but parameter order is preserved by the structured field data structure definitions. We call this out as an important item in the signature base serialization section (2.3):

Note that the component identifiers can include their own parameters, and these parameters are ordered sets. Once an order is chosen for a component's parameters, the order cannot be changed.

In section 2.5, the signature base is computed with LF as a line ending, as
opposed to the CRLF line ending conventionally used in HTTP/1.1. This should be
called out, justified, or changed.

Previous versions of this work (including SIGv4 and the Cavage drafts) use LF, and I don’t see a compelling reason to change that, especially since this specification is not specific to HTTP/1.1 and the signature base does not need to be compatible with that. We can call it out as a note though, thanks for bringing it up.
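To show what’s at stake, here’s a rough Python sketch of signature base serialization (helper name and inputs are mine, illustrative only): one line per covered component, a final "@signature-params" line, all joined with a single LF regardless of the HTTP version carrying the message.

```python
def signature_base(components, params_line):
    """Sketch of signature base serialization: component lines plus a
    trailing "@signature-params" line, joined with LF (0x0A), never
    CRLF. The base is an internal artifact, not wire-format HTTP/1.1,
    so it does not need HTTP/1.1 line endings."""
    lines = [f'"{name}": {value}' for name, value in components]
    lines.append(f'"@signature-params": {params_line}')
    return "\n".join(lines)

base = signature_base(
    [("@method", "GET"), ("@authority", "example.com")],
    '("@method" "@authority");created=1618884473',
)
```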

Section 4.3, discussing a proxy re-signing a message where it knowingly damages
the message so that its original signature can’t be verified, is confusing. The
text seems to be saying that the original (now failing) signature will be
forwarded, so the final verifier will probably try to verify both signatures,
have one fail and one succeed, and has to take the proxy’s word that the
original signature was OK. This means, of course, that the proxy can carry out
any attack it desires.

More worrisome is that the text does not call out explicitly that this is what
is expected: That the signature from a trusted signer saying that another
signature is to be believed even when it verifies as bad should cause the final
verifier to suspend disbelief. Being very explicit here would be good.

The scenario described here is incredibly common: microservice architectures behind a gateway, for example. You want your gateway to be able to say “yes, this is the message that came in, and I checked that this signature validates against the message that I saw. Now, here’s the message for you specifically, signed by me to prove that I verified it.” The downstream application is relying on the proxy to provide this function.

The proxy in this example is a trusted party for providing signatures and proxied content. If the proxy is compromised, it can do a whole lot more than just lie about signatures; for example, it could literally make up an incoming request out of whole cloth and send it to an internal service, or fundamentally alter any aspect of the request to the origin server. I don’t believe it’s this specification’s job to discuss the nature of HTTP proxy systems in depth.

For systems that do want end-to-end protection, as long as the proxy does not modify any of the signed components, it will remain transparent to the system as a whole. This is one of the things that applications need to consider when profiling and applying this work.

Section 5.2 uses the undefined notion of “fail the processing” for an
Accept-Signature. What is supposed to happen to the request in that case? 500
failure, or just ignoring the Accept-Signature request?

That is up to the application. Specifically, it depends on what your error states for this request are. A lot of applications could ignore it; some would fail the request.

In reading section 7, there seems to be a number of things that are punted on
in the direction of “the application”. This calls out again that there is no
guidance in the document about what an application needs to look like.

This guidance is in section 1.4 and again in section 3.2.1.

Section 7.5.6 details the difficulties in signing the Set-Cookie header (a
major attack target). If the mechanism can’t handle this, is it worth doing?

Yes, this was discussed explicitly as a needed workaround for a common header. Set-Cookie is a known rule-breaker in the HTTP world, and the mitigation mechanism here works for this header as well as anything else in the wild that might need similar protection.

Section 7.5.7 assumes that all header values can be validated. This seems like
a tall order, since the concept of “validation” isn’t well defined. (You can’t
validate an x-undefined: header)

This section makes no such assumption; instead, it offers header value validation as one mitigation against a very specific attack. If you can’t validate a field, as in your example, then you can’t apply that mitigation in that situation. As such, your application might want to reject messages that sign unknown field components that it can’t validate.

Thank you again for your review of the document. The editors will put together a PR to address the text changes mentioned above.

 — Justin

Received on Tuesday, 7 March 2023 17:40:29 UTC