- From: Lucas Pardue <lucaspardue.24.7@gmail.com>
- Date: Sat, 11 Dec 2021 03:12:19 +0000
- To: Justin Richer <jricher@mit.edu>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CALGR9oaBkffxTTB62VPfL6XJy=aVe+TpdKW2oJKQNW9jZVE3VQ@mail.gmail.com>
Hi Justin, Thanks for the review comments, lots of good stuff here for us to consider. I've got some responses in-line, and I've spun out some things into issues where it was obvious to me. (For conciseness, where I say Digest I speak to Content-Digest too.) On Fri, Dec 10, 2021 at 8:31 PM Justin Richer <jricher@mit.edu> wrote: > Lucas and Roberto, > > Apologies for the late response! Here is my last call review of the > document. I think it’s in mostly good shape, and it’s a big improvement > over 3230. That said, I think there are a few issues that could be > improved, and a number of nits that I found when working through the text: > > > ---- > > Structured Fields: > > It bothers me that the draft doesn't use structured fields, especially for > shipping around the binary blobs that it defines. I know it needs to be > compatible with existing implementations as much as possible, but all > existing algorithm identifiers from 3230 have been obsoleted -- why keep > the old format for newly-defined identifiers? rfc5843 does make this a > problem for the sha-256 and she-512 algorithms since they're still > used/recommended, though this could be solved by registering new > identifiers for these and deprecating the ones from 5843 (sha256 and > sha512, just as strawman examples) and using the sf-binary format. > > Proposal: change the requirements for newly defined algorithms such that > it's sf based; existing algorithms keep the old syntax, but they're all > deprecated and shouldn't be used anyway. We already have this text: > > Every digest-algorithm defines its computation procedure and encoding > output > > I think we could take that to mean that the "<encoded digest output>" is > encoded as an sf-binary for all new algorithms, but it doesn't have to be > for existing ones. > > The token portion is problematic because it was previously defined as > case-insensitive. But if the registry is updated such that new algorithms > are all lowercase, like an sf-token, does that solve things? We have "we > suggest to use lowercase for digest-algorithms", and I think making it a > requirement for new registrations to be compliant with sf-token would help > future usage. I think this could mostly be handled in §6, with a little > shuffling of other content. > > My goal here is to be able to use structured field parsers on the Digest > and Content-Digest headers in environments when I know the only acceptable > algorithms are the newly-defined ones. > This is a bigger issue than we can reasonably talk about alongside all of the other things you raise. I suggest creating a new thread or issue for further discussion. We no longer strictly obsolete algorithms but mark them as insecure, see issue 1671 for why [1]. From my perspective, you're conflating the use of Structured Fields Items and encoding with defining a fully-defined Structured Field header. There is a difference between 1) defining a new digest-algorithm that encodes its output like a Byte Sequence [2] (base64 sandwiched by colons) and using that as an entry in the list format that Digest presently defines, and 2) stating that Digest can be treated as a structured field. I.e this might look like a structured fields Dictionary Digest: my-alg=:cHJldGVuZCB0aGlzIGlzIGJpbmFyeSBjb250ZW50Lg==: But there's a few ways that extant RFC 3230 implementations might populate Digest that could trip up your attempts to parse it as a Dictionary (one of your later comments touches on this too). I think Digest is pretty similar to the Range header [3], which is absent from Mark's retrofit document [4]. I'm guessing because of the similar problems. I think if we want to pursue structured digests further, we should just admit that changing the world is hard and define new fields. These would just offer an alternative syntax and a new digest algorithm IANA table to use with that syntax (we could even port forward problem algorithms by making them change their encoding). Most of the details about content and representation would remain the same and not need redefining. Strawperson names Sf-Digest, Sf-Content-Digest. Want- fields are less problematic and can be treated as sf-list AFAICT, so might not need an Sf- equivalent. §1: I'd like to see some notion in the introduction about this draft's use > in requests. The existing language is almost exclusively about responses, > but my own most important use of this work is to protect the body of a > request. This observation holds throughout the document - the focus is > clearly on digests for responses, and I think that's a lot of where the > representation vs. content discussion really comes into play. > I don't agree with this observation. We're pretty clear that the fields defined in this document apply to requests and responses, stating so on every occasion. The only mentions of client and server are in the appendices, which are examples. Section 7 does detail some special considerations for responses but that's only because they need it. If requests need some special considerations that we do not cover, then I'm open to additions of that sort §1.3: it seems like this document is going for both semantic and syntactic > compatibility of existing fields and definitions, right? The text currently > says semantic only. > Agreed, see https://github.com/httpwg/http-extensions/issues/1840 §2: I know I'm probably in the minority in this group, but I :still: have a > hard time following the difference between "representation" and "content" > in any meaningful way, even after reading §A. I also know that it's > important for this draft to be very, very precise in how this is applied, > so this section does make sense. However, I think this discussion might > make more sense as a subsection of §3, now that we have §4 to talk about > the more simplified content-digest specifically, since it's now a detail of > just one of the headers defined. > > > §3: The ABNF should have the line from §2 included, it reads very > strangely without it. Another argument for combining §2 into §3. > We started out with a similar structure to RFC 3230 but the addition of Content-Digest recently has thrown the balance of the document a bit off. I created https://github.com/httpwg/http-extensions/issues/1833 to track combining the sections. As to the representation vs content topic, I'm not sure what to suggest. IMO semantics is really the place for those definitions to exist and be comprehensible, and we cite those appropriately. RFC 3230 made great pains to describe these concepts and it wound up outdated as the definition of HTTP marched on. I'd like to avoid repeating that mistake. Maybe someone has a smart, salient way to describe representation and content in this draft in a way we don't already cover? §3: I am probably missing the reference, but where is the `1#...` syntax > for the ABNF defined, and does that translate to the comma-separated list > format? > The #rule is is imported in the notational conventions from Section 5.6.1 <https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-semantics-19#section-5.6.1> of semantics. §3: "Incremental digest algorithm" is mentioned with no additional > information or explanation as to what that would be or how it would work. > > §4: "Incremental digest algorithm" is mentioned with no additional > information or explanation as to what that would be or how it would work. > I agree and made https://github.com/httpwg/http-extensions/issues/1834 to track these. > > §5: It's unclear what "Want-Digest" and "Want-Content-Digest" mean in the > context of a response. Does that mean the server wants a digest on one or > more future requests? (That seems the most reasonable). The relevant text > we have for "Accept-Signature" is currently the following (this itself > probably needs additional work): > > When the Accept-Signature field is used in an HTTP Response message, the > field indicates that the server desires the client to sign its next request > to the server with the identified parameters, and the target message is the > client's next request. The client can choose to also continue signing > future requests to the same server in the same way. > We currently say "sender's desire to receive a representation digest on messages associated with the request URI and representation metadata", maybe we should make that clearer for requests and responses, considering what you proposed. See https://github.com/httpwg/http-extensions/issues/1846 . > §5: This doesn't really say what the receiver of the Want- fields has to > do. If there's no obligation to fulfill it (which I think is the case), we > should state that explicitly here. > This is also sort of underdefined in RFC 3230 and we've carried that across. The Digest and Content-Digest sections state that a sender MAY send anything. Additionally stating that there is no obligation to follow Want- fields SGTM. See https://github.com/httpwg/http-extensions/issues/1835 §6: as stated above, I think we could address most of the structured field > discussions here. For instance, the encoding claims a "quoted string" but > could instead just declare a "binary field" or "structured field item" > here. None of the existing registrations use quoted strings, I might add, > so this is in line with that. > The sentence you're referring to in full is " The associated encoding for new digest-algorithms MUST either be represented as a quoted string or MUST NOT include ";" or "," in the character sets used for the encoding" This is very close to what RFC 3230 said. Which (arguably) is a roundabout way of saying "do whatever you like but if the resultant value contains a comma or semicolon it could screw up the Digest header parser, because those characters are special. Therefore, if you really want a comma or semicolon in your encoded output character set, you MUST serialize that as a quoted string". The registered algorithms to date don't have that problem, so they don't need to be quoted. If some new algorithm came along that wanted to encode the value like an sf-binary (base64 bookended by colons) then that's totally fine and it would not need to be a quoted string. > §6: Some of the existing algorithm descriptions should be toned down. > Specifically, CRC32c claiming "better hamming distance" and talking about > its uses is completely out of place in this registry. > These are verbatim what is already in the IANA table at https://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml. Personally I would be in favor of reducing the descriptions down to include purely what the algorithm is and how the output is encoded for use in Digest. What do others think? See https://github.com/httpwg/http-extensions/issues/1836 §7: This section has me baffled, and I'm not sure what it's saying or where > I'm getting lost. After a couple reads I've almost convinced myself that > it's saying that the Digest header in a request would actually be a digest > of the **resource that is being requested**. Is that the intent? Because if > so, that seems completely insane to me, and I wouldn't know how to process > that, so I hope I'm wrong. My expectation with requests, including all > state-changing requests, is that it's the message content of **the request > itself** that is hashed for the digest. So if I do a POST, I digest the > body and add the header. If I do a PUT, same deal. Like in the example, if > I do a PATCH, I digest only the part that I send, not the "whole" thing > that's being represented -- or is that only for Content-Digest? I'm > honestly not sure. But for GET, I would expect that Digest would not be > allowed because there's no body to digest. > I'll defer to Roberto for a response here. §8: Additional security consideration: Hash collision. This is an important > consideration in choosing acceptable algorithms for an application. > Agreed, see https://github.com/httpwg/http-extensions/issues/1837. If you have any suggestion for the text you'd like to see your input would be appreciated. §8: Additional security consideration: Bad encoding to break parsers. For > example, sending bad base64 in an attempt to break or crash the parser, > which is needed to validate the digest. > Interesting. Can I ask a bit more about the threat model you have in mind here? Do you have any examples of such attacks? I'm not opposed to addressing genuine concerns but I don't fully understand the scope of the problem in relation to Digest headers. To put it another way, what is it about Digest that makes header parsing failure worse than any other header field? §8.1: It's not just accidental corruption, but also malicious changes. The > important part here is that it only protects the message body and not the > rest. > Agreed. See https://github.com/httpwg/http-extensions/issues/1838 §8.2: I don't see how this is a security consideration except for the last > line, which is mostly punted to §8.3. Consider rewriting this portion to be > about the actual security gap, and not a list of the spec's virtues and > applications. This could probably be covered by §8.1 and §8.3 anyway, but > if there's enough to say in its own section, then focus on what's different > and discussed here. > > §8.3: Surprised not to see a reference to httpbis-message-signatures in > here. :) > > §8.3: You SHOULD NOT use normative requirements in the security > considerations. The considerations sections are about providing > explanations, examples, and workarounds -- things to think about during > deployment. Actual requirements belong in the main document, and so these > should be extracted to the appropriate sections. The first line needs to be > its own security consideration referencing BCP19 for TLS usage, the second > can go into §6 probably, with back and forward references. > The three points are good, we can definitely do some editorial work to improve things. First though, the question I think the WG can help us answer now is do we need to say anything in detail about Signatures? I fully appreciate there is a strong relationship between the two but it seems that Signatures should state how Digest is used, not the other way around. I propose we would be fine with simply highlighting the limitations of digest (it doesn't cover HTTP fields) and then noting that there's a way to encode the digest of an empty representation that can be used by signature to protect from some types of manipulation. On the requirement we currently have - "Digest fields SHOULD always be used over a connection that provides integrity at the transport layer that protects HTTP fields." - I'm less certain this makes sense to retain. There seems nothing unique in Digest that requires this property when compared to other HTTP fields. There's an argument to treat everything as an untrusted input, making some assumptions based on the transport layer seems like a bad recommendation the more I think about it. Maybe someone has a different opinion? §8.5: The fact that you could digest an unencrypted payload then send it > encrypted genuinely surprised me -- I would not have thought that possible > within the content-encoding mechanisms of HTTP. I think this is going to be > a combination that leads to many errors. > I think this consideration was mainly in relation to RFC 8188 (Encrypted Content-Encoding) the "id-sha-*" algorithms that were added but removed just prior to WGLC. I suspect now, calculation the digest of plaintext and sticking it in Digest along with Content-Encoding: aes128gcm would be broken or dumb. Adding text to describe all the way things could be broken is not great, we would be better removing this section IMO. See https://github.com/httpwg/http-extensions/issues/1839 §8.6: This draft's algorithm agility effectively allows an attacker to > choose and substitute the algorithm applied to the message, which could > allow the attacker to choose a weaker algorithm and apply a collision hash > to a request. Signing the digest headers, with things like message > signatures, mitigates this by fixing the values in the digest header. Also, > a recipient should have a limited set of algorithms that it should accept > for a given application, and not just take whatever the message creator (or > attacker) sends it. > Thanks. See https://github.com/httpwg/http-extensions/issues/1841 > §8.7: Structured field dictionaries disallow the same key in a dictionary, > so couldn't we just say that repeating a key is simply an error? I know > 3230 doesn't say that it's prohibited, but I also don't see where it says > that it's allowed, so I would rather this new document take the more strict > interpretation. I can't think of a good security or application reason to > allow this, even if it's application-dependent. > The list syntax permits this. If it wasn't an error before, it's not really ok to say it's an error now. There's no way to negotiate "which version" of Digest is in play. > §XX: The document is missing privacy considerations. This RFC provides > good guidelines for writing them: > https://datatracker.ietf.org/doc/html/rfc6973 > Do you have any specific concerns for the fields defined in this document that you feel need to be addressed by privacy considerations? The last 7 RFCs that this group produced did not include a privacy considerations section. I'd be more than happy to include one if we need it but as far as my assessment goes there's nothing special going on here. But that's just my assessment, if others have concerns please raise them. §9.2: Is normative text allowed in such a description here? > I don't think we can make an IANA table entry normative for anything :-) Tracked in https://github.com/httpwg/http-extensions/issues/1842 §A: This section is missing examples of how the Digest and Content-Digest > headers would be applied to each of these messages. I know that's also the > goal of the next section, but it seems out of place separated from each > other. I also realize that the entire point of having something like §A is > because of people like me who still have problems wrapping their heads > around this topic. :) > I'm 50/50 on this. I can see why it might be nice but it risks overloading the reader with concepts when the purpose of the section is focus on representation data. Then again, it might help illustrate the differences between Digest and Content-Digest. A candidate PR might help the debate. Tracked in https://github.com/httpwg/http-extensions/issues/1843 §B: I'd like to see examples that are doing things other than REST-style > resource management. APIs have a real need for having the client digest a > POSTed JSON object, for example, and getting back a completely different > JSON object, which may or may not have anything to do with the request body > in structure or form. For example, in GNAP, you POST a grant request and > get back a grant response, and neither of these are "resources" or "files" > in the traditional HTTP or REST model. This is overwhelmingly common and > I'd like to see more examples of what that looks like here. > For me, the important part of the spec is defining clearly how it relates to content and representation. Implementations will need to apply that to various permutations of using HTTP. The examples are there to illustrate different applications of Digest, or specific interactions that might not be totally obvious. In appendix B.6 we show a client sending a JSON object and a server responding with a brotli encoded JSON. What's inside the JSON here isn't really on note, the important thing is how Digest is calculated. It is not clear to me how your example of GNAP is much different from B.6 (but correct me if I'm wrong). From a cursory glance, it's just POST method and request and response content of JSON. > §C: These examples should say what the recipient of the message should do, > even if it's "we don't say what they should do". In §C.2 for example, > should the client ask again, can it throw its own error, what? > Why? These examples illustrate the basic interactions. Anything else is an implementation decision, so could be misinterpreted as a suggestion. §C.3: This is an interesting example where the digest on the response (were > it included) would be of the error message, not the "resource" that was > requested. Or at least, that's what I'd expect it to be here. Is that true? > See B.10 that deals with this. > Nits: > > Throughout document, "checksum" and "digest" are used interchangeably, is > that intentional? It would probably be better to pick one, and I suggest > "digest". > Tracked in https://github.com/httpwg/http-extensions/issues/1844 > §9: I'd personally list the header field registrations in the order: > Digest, Content Digest, Want-Digest, Want-Content-Digest, as they show up > in the document in that order. > Easy fix. Tracked in https://github.com/httpwg/http-extensions/issues/1845 > --- > > > While this is a long review, I honestly do think the document is overall > in good shape and hopefully these can all be addressed pretty easily. > There's many useful points here, thanks for taking the time! Cheers, Lucas [1] - https://github.com/httpwg/http-extensions/issues/1671 [2] - https://www.rfc-editor.org/rfc/rfc8941.html#name-byte-sequences [3] - https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-semantics-19#section-14.2 [4] - https://www.ietf.org/archive/id/draft-nottingham-http-structure-retrofit-00.html
Received on Saturday, 11 December 2021 03:12:45 UTC