Re: Working Group Last Call: Digest Fields from Justin Richer on 2021-12-10 (ietf-http-wg@w3.org from October to December 2021)

From: Justin Richer <jricher@mit.edu>
Date: Fri, 10 Dec 2021 15:31:47 -0500
To: Lucas Pardue <lucaspardue.24.7@gmail.com>
Cc: Mark Nottingham <mnot@mnot.net>, HTTP Working Group <ietf-http-wg@w3.org>, Tommy Pauly <tpauly@apple.com>
Message-Id: <517C1B4B-E8A2-40F3-89CE-8B991203E858@mit.edu>
Lucas and Roberto,

Apologies for the late response! Here is my last call review of the document. I think it’s in mostly good shape, and it’s a big improvement over 3230. That said, I think there are a few issues that could be improved, and a number of nits that I found when working through the text:


----

Structured Fields:

It bothers me that the draft doesn't use structured fields, especially for shipping around the binary blobs that it defines. I know it needs to be compatible with existing implementations as much as possible, but all existing algorithm identifiers from 3230 have been obsoleted -- why keep the old format for newly-defined identifiers? rfc5843 does make this a problem for the sha-256 and she-512 algorithms since they're still used/recommended, though this could be solved by registering new identifiers for these and deprecating the ones from 5843 (sha256 and sha512, just as strawman examples) and using the sf-binary format.

Proposal: change the requirements for newly defined algorithms such that it's sf based; existing algorithms keep the old syntax, but they're all deprecated and shouldn't be used anyway. We already have this text:

 Every digest-algorithm defines its computation procedure and encoding output

I think we could take that to mean that the "<encoded digest output>" is encoded as an sf-binary for all new algorithms, but it doesn't have to be for existing ones. 

The token portion is problematic because it was previously defined as case-insensitive. But if the registry is updated such that new algorithms are all lowercase, like an sf-token, does that solve things? We have "we suggest to use lowercase for digest-algorithms", and I think making it a requirement for new registrations to be compliant with sf-token would help future usage. I think this could mostly be handled in §6, with a little shuffling of other content.

My goal here is to be able to use structured field parsers on the Digest and Content-Digest headers in environments when I know the only acceptable algorithms are the newly-defined ones.



§1: I'd like to see some notion in the introduction about this draft's use in requests. The existing language is almost exclusively about responses, but my own most important use of this work is to protect the body of a request. This observation holds throughout the document - the focus is clearly on digests for responses, and I think that's a lot of where the representation vs. content discussion really comes into play.

§1.3: it seems like this document is going for both semantic and syntactic compatibility of existing fields and definitions, right? The text currently says semantic only.


§2: I know I'm probably in the minority in this group, but I :still: have a hard time following the difference between "representation" and "content" in any meaningful way, even after reading §A. I also know that it's important for this draft to be very, very precise in how this is applied, so this section does make sense. However, I think this discussion might make more sense as a subsection of §3, now that we have §4 to talk about the more simplified content-digest specifically, since it's now a detail of just one of the headers defined.


§3: The ABNF should have the line from §2 included, it reads very strangely without it. Another argument for combining §2 into §3.

§3: I am probably missing the reference, but where is the `1#...` syntax for the ABNF defined, and does that translate to the comma-separated list format?

§3: "Incremental digest algorithm" is mentioned with no additional information or explanation as to what that would be or how it would work.

§4: "Incremental digest algorithm" is mentioned with no additional information or explanation as to what that would be or how it would work.

§5: It's unclear what "Want-Digest" and "Want-Content-Digest"  mean in the context of a response. Does that mean the server wants a digest on one or more future requests? (That seems the most reasonable). The relevant text we have for "Accept-Signature" is currently the following (this itself probably needs additional work):

 When the Accept-Signature field is used in an HTTP Response message, the field indicates that the server desires the client to sign its next request to the server with the identified parameters, and the target message is the client's next request. The client can choose to also continue signing future requests to the same server in the same way.

§5: This doesn't really say what the receiver of the Want- fields has to do. If there's no obligation to fulfill it (which I think is the case), we should state that explicitly here.

§6: as stated above, I think we could address most of the structured field discussions here. For instance, the encoding claims a "quoted string" but could instead just declare a "binary field" or "structured field item" here. None of the existing registrations use quoted strings, I might add, so this is in line with that.

§6: Some of the existing algorithm descriptions should be toned down. Specifically, CRC32c claiming "better hamming distance" and talking about its uses is completely out of place in this registry.

§7: This section has me baffled, and I'm not sure what it's saying or where I'm getting lost. After a couple reads I've almost convinced myself that it's saying that the Digest header in a request would actually be a digest of the **resource that is being requested**. Is that the intent? Because if so, that seems completely insane to me, and I wouldn't know how to process that, so I hope I'm wrong. My expectation with requests, including all state-changing requests, is that it's the message content of **the request itself** that is hashed for the digest. So if I do a POST, I digest the body and add the header. If I do a PUT, same deal. Like in the example, if I do a PATCH, I digest only the part that I send, not the "whole" thing that's being represented -- or is that only for Content-Digest? I'm honestly not sure. But for GET, I would expect that Digest would not be allowed because there's no body to digest.

§8: Additional security consideration: Hash collision. This is an important consideration in choosing acceptable algorithms for an application.

§8: Additional security consideration: Bad encoding to break parsers. For example, sending bad base64 in an attempt to break or crash the parser, which is needed to validate the digest.

§8.1: It's not just accidental corruption, but also malicious changes. The important part here is that it only protects the message body and not the rest.

§8.2: I don't see how this is a security consideration except for the last line, which is mostly punted to §8.3. Consider rewriting this portion to be about the actual security gap, and not a list of the spec's virtues and applications. This could probably be covered by §8.1 and §8.3 anyway, but if there's enough to say in its own section, then focus on what's different and discussed here.

§8.3: Surprised not to see a reference to httpbis-message-signatures in here. :) 

§8.3: You SHOULD NOT use normative requirements in the security considerations. The considerations sections are about providing explanations, examples, and workarounds -- things to think about during deployment. Actual requirements belong in the main document, and so these should be extracted to the appropriate sections. The first line needs to be its own security consideration referencing BCP19 for TLS usage, the second can go into §6 probably, with back and forward references.

§8.5: The fact that you could digest an unencrypted payload then send it encrypted genuinely surprised me -- I would not have thought that possible within the content-encoding mechanisms of HTTP. I think this is going to be a combination that leads to many errors.

§8.6: This draft's algorithm agility effectively allows an attacker to choose and substitute the algorithm applied to the message, which could allow the attacker to choose a weaker algorithm and apply a collision hash to a request. Signing the digest headers, with things like message signatures, mitigates this by fixing the values in the digest header. Also, a recipient should have a limited set of algorithms that it should accept for a given application, and not just take whatever the message creator (or attacker) sends it.

§8.7: Structured field dictionaries disallow the same key in a dictionary, so couldn't we just say that repeating a key is simply an error? I know 3230 doesn't say that it's prohibited, but I also don't see where it says that it's allowed, so I would rather this new document take the more strict interpretation. I can't think of a good security or application reason to allow this, even if it's application-dependent.

§XX: The document is missing privacy considerations. This RFC provides good guidelines for writing them: https://datatracker.ietf.org/doc/html/rfc6973

§9.2: Is normative text allowed in such a description here?

§A: This section is missing examples of how the Digest and Content-Digest headers would be applied to each of these messages. I know that's also the goal of the next section, but it seems out of place separated from each other. I also realize that the entire point of having something like §A is because of people like me who still have problems wrapping their heads around this topic. :)

§B: I'd like to see examples that are doing things other than REST-style resource management. APIs have a real need for having the client digest a POSTed JSON object, for example, and getting back a completely different JSON object, which may or may not have anything to do with the request body in structure or form. For example, in GNAP, you POST a grant request and get back a grant response, and neither of these are "resources" or "files" in the traditional HTTP or REST model. This is overwhelmingly common and I'd like to see more examples of what that looks like here.

§C: These examples should say what the recipient of the message should do, even if it's "we don't say what they should do". In §C.2 for example, should the client ask again, can it throw its own error, what?

§C.3: This is an interesting example where the digest on the response (were it included) would be of the error message, not the "resource" that was requested. Or at least, that's what I'd expect it to be here. Is that true?

Nits:

Throughout document, "checksum" and "digest" are used interchangeably, is that intentional? It would probably be better to pick one, and I suggest "digest".

§9: I'd personally list the header field registrations in the order: Digest, Content Digest, Want-Digest, Want-Content-Digest, as they show up in the document in that order.

---


While this is a long review, I honestly do think the document is overall in good shape and hopefully these can all be addressed pretty easily.

— Justin

> On Dec 9, 2021, at 12:30 PM, Lucas Pardue <lucaspardue.24.7@gmail.com> wrote:
> 
> Hi folks,
> 
> Friendly reminder that the Digest headers draft is still in WGLC for a week. We didn't see any issues get raised yet and as much as I might like to believe my editorial skills are adequate, I'm sure there are things that some fresh eyes could help notice and improve on.
> 
> Cheers
> Lucas
> 
> On Wed, Nov 17, 2021 at 1:35 AM Mark Nottingham <mnot@mnot.net <mailto:mnot@mnot.net>> wrote:
> Hello everyone,
> 
> This message starts Working Group Last Call for this document:
>   https://datatracker.ietf.org/doc/draft-ietf-httpbis-digest-headers/ <https://datatracker.ietf.org/doc/draft-ietf-httpbis-digest-headers/>
> 
> Please review the document and raise any issues you see, either in the issues list or in response to this message. Statements of support for publication are also helpful.
> 
> Because of the intervening US holidays, this LC will last a little longer than usual; please get your feedback in by 15 December.
> 
> Cheers,
> 
> 
> --
> Mark Nottingham   https://www.mnot.net/ <https://www.mnot.net/>
> 
>
Received on Friday, 10 December 2021 20:32:10 UTC