Re: Working Group Last Call: Digest Fields from Lucas Pardue on 2021-12-11 (ietf-http-wg@w3.org from October to December 2021)

From: Lucas Pardue <lucaspardue.24.7@gmail.com>
Date: Sat, 11 Dec 2021 03:12:19 +0000
To: Justin Richer <jricher@mit.edu>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CALGR9oaBkffxTTB62VPfL6XJy=aVe+TpdKW2oJKQNW9jZVE3VQ@mail.gmail.com>
Hi Justin,

Thanks for the review comments, lots of good stuff here for us to consider.

I've got some responses in-line, and I've spun out some things into issues
where it was obvious to me. (For conciseness, where I say Digest I speak to
Content-Digest too.)

On Fri, Dec 10, 2021 at 8:31 PM Justin Richer <jricher@mit.edu> wrote:

> Lucas and Roberto,
>
> Apologies for the late response! Here is my last call review of the
> document. I think it’s in mostly good shape, and it’s a big improvement
> over 3230. That said, I think there are a few issues that could be
> improved, and a number of nits that I found when working through the text:
>
>
> ----
>
> Structured Fields:
>
> It bothers me that the draft doesn't use structured fields, especially for
> shipping around the binary blobs that it defines. I know it needs to be
> compatible with existing implementations as much as possible, but all
> existing algorithm identifiers from 3230 have been obsoleted -- why keep
> the old format for newly-defined identifiers? rfc5843 does make this a
> problem for the sha-256 and she-512 algorithms since they're still
> used/recommended, though this could be solved by registering new
> identifiers for these and deprecating the ones from 5843 (sha256 and
> sha512, just as strawman examples) and using the sf-binary format.
>
> Proposal: change the requirements for newly defined algorithms such that
> it's sf based; existing algorithms keep the old syntax, but they're all
> deprecated and shouldn't be used anyway. We already have this text:
>
> Every digest-algorithm defines its computation procedure and encoding
> output
>
> I think we could take that to mean that the "<encoded digest output>" is
> encoded as an sf-binary for all new algorithms, but it doesn't have to be
> for existing ones.
>
> The token portion is problematic because it was previously defined as
> case-insensitive. But if the registry is updated such that new algorithms
> are all lowercase, like an sf-token, does that solve things? We have "we
> suggest to use lowercase for digest-algorithms", and I think making it a
> requirement for new registrations to be compliant with sf-token would help
> future usage. I think this could mostly be handled in §6, with a little
> shuffling of other content.
>
> My goal here is to be able to use structured field parsers on the Digest
> and Content-Digest headers in environments when I know the only acceptable
> algorithms are the newly-defined ones.
>

This is a bigger issue than we can reasonably talk about alongside all of
the other things you raise. I suggest creating a new thread or issue for
further discussion.

We no longer strictly obsolete algorithms but mark them as insecure, see
issue 1671 for why [1].

From my perspective, you're conflating the use of Structured Fields Items
and encoding with defining a fully-defined Structured Field header. There
is a difference between 1) defining a new digest-algorithm that encodes its
output like a Byte Sequence [2] (base64 sandwiched by colons) and using
that as an entry in the list format that Digest presently defines, and 2)
stating that Digest can be treated as a structured field.

I.e this might look like a structured fields Dictionary
Digest: my-alg=:cHJldGVuZCB0aGlzIGlzIGJpbmFyeSBjb250ZW50Lg==:

But there's a few ways that extant RFC 3230 implementations might populate
Digest that could trip up your attempts to parse it as a Dictionary (one of
your later comments touches on this too). I think Digest is pretty similar
to the Range header [3], which is absent from Mark's retrofit document [4].
I'm guessing because of the similar problems.

I think if we want to pursue structured digests further, we should just
admit that changing the world is hard and define new fields. These would
just offer an alternative syntax and a new digest algorithm IANA table to
use with that syntax (we could even port forward problem algorithms by
making them change their encoding). Most of the details about content and
representation would remain the same and not need redefining. Strawperson
names Sf-Digest, Sf-Content-Digest. Want- fields are less problematic and
can be treated as sf-list AFAICT, so might not need an Sf- equivalent.

§1: I'd like to see some notion in the introduction about this draft's use
> in requests. The existing language is almost exclusively about responses,
> but my own most important use of this work is to protect the body of a
> request. This observation holds throughout the document - the focus is
> clearly on digests for responses, and I think that's a lot of where the
> representation vs. content discussion really comes into play.
>

I don't agree with this observation. We're pretty clear that the fields
defined in this document apply to requests and responses, stating so on
every occasion. The only mentions of client and server are in the
appendices, which are examples. Section 7 does detail some special
considerations for responses but that's only because they need it. If
requests need some special considerations that we do not cover, then I'm
open to additions of that sort

§1.3: it seems like this document is going for both semantic and syntactic
> compatibility of existing fields and definitions, right? The text currently
> says semantic only.
>

Agreed, see https://github.com/httpwg/http-extensions/issues/1840

§2: I know I'm probably in the minority in this group, but I :still: have a
> hard time following the difference between "representation" and "content"
> in any meaningful way, even after reading §A. I also know that it's
> important for this draft to be very, very precise in how this is applied,
> so this section does make sense. However, I think this discussion might
> make more sense as a subsection of §3, now that we have §4 to talk about
> the more simplified content-digest specifically, since it's now a detail of
> just one of the headers defined.
>
>
> §3: The ABNF should have the line from §2 included, it reads very
> strangely without it. Another argument for combining §2 into §3.
>

We started out with a similar structure to RFC 3230 but the addition of
Content-Digest recently has thrown the balance of the document a bit off.
I created https://github.com/httpwg/http-extensions/issues/1833 to track
combining the sections.

As to the representation vs content topic, I'm not sure what to suggest.
IMO semantics is really the place for those definitions to exist and be
comprehensible, and we cite those appropriately. RFC 3230 made great pains
to describe these concepts and it wound up outdated as the definition of
HTTP marched on. I'd like to avoid repeating that mistake. Maybe someone
has a smart, salient way to describe representation and content in this
draft in a way we don't already cover?

§3: I am probably missing the reference, but where is the `1#...` syntax
> for the ABNF defined, and does that translate to the comma-separated list
> format?
>

The #rule is is imported in the notational conventions from Section 5.6.1
<https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-semantics-19#section-5.6.1>
of semantics.

§3: "Incremental digest algorithm" is mentioned with no additional
> information or explanation as to what that would be or how it would work.
>
> §4: "Incremental digest algorithm" is mentioned with no additional
> information or explanation as to what that would be or how it would work.
>

I agree and made https://github.com/httpwg/http-extensions/issues/1834 to
track these.


>
> §5: It's unclear what "Want-Digest" and "Want-Content-Digest"  mean in the
> context of a response. Does that mean the server wants a digest on one or
> more future requests? (That seems the most reasonable). The relevant text
> we have for "Accept-Signature" is currently the following (this itself
> probably needs additional work):
>
> When the Accept-Signature field is used in an HTTP Response message, the
> field indicates that the server desires the client to sign its next request
> to the server with the identified parameters, and the target message is the
> client's next request. The client can choose to also continue signing
> future requests to the same server in the same way.
>

We currently say "sender's desire to receive a representation digest on
messages associated with the request URI and representation metadata",
maybe we should make that clearer for requests and responses, considering
what you proposed. See https://github.com/httpwg/http-extensions/issues/1846
.


> §5: This doesn't really say what the receiver of the Want- fields has to
> do. If there's no obligation to fulfill it (which I think is the case), we
> should state that explicitly here.
>

This is also sort of underdefined in RFC 3230 and we've carried that
across. The Digest and Content-Digest sections state that a sender MAY send
anything. Additionally stating that there is no obligation to follow Want-
fields SGTM. See https://github.com/httpwg/http-extensions/issues/1835

§6: as stated above, I think we could address most of the structured field
> discussions here. For instance, the encoding claims a "quoted string" but
> could instead just declare a "binary field" or "structured field item"
> here. None of the existing registrations use quoted strings, I might add,
> so this is in line with that.
>

The sentence you're referring to in full is

" The associated encoding for new digest-algorithms MUST either be
represented as a quoted string or MUST NOT include ";" or "," in the
character sets used for the encoding"

This is very close to what RFC 3230 said. Which (arguably) is a roundabout
way of saying "do whatever you like but if the resultant value contains a
comma or semicolon it could screw up the Digest header parser, because
those characters are special. Therefore, if you really want a comma or
semicolon in your encoded output character set, you MUST serialize that as
a quoted string".

The registered algorithms to date don't have that problem, so they don't
need to be quoted. If some new algorithm came along that wanted to encode
the value like an sf-binary (base64 bookended by colons) then that's
totally fine and it would not need to be a quoted string.


> §6: Some of the existing algorithm descriptions should be toned down.
> Specifically, CRC32c claiming "better hamming distance" and talking about
> its uses is completely out of place in this registry.
>

These are verbatim what is already in the IANA table at
https://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml.
Personally I would be in favor of reducing the descriptions down to include
purely what the algorithm is and how the output is encoded for use in
Digest. What do others think? See
https://github.com/httpwg/http-extensions/issues/1836

§7: This section has me baffled, and I'm not sure what it's saying or where
> I'm getting lost. After a couple reads I've almost convinced myself that
> it's saying that the Digest header in a request would actually be a digest
> of the **resource that is being requested**. Is that the intent? Because if
> so, that seems completely insane to me, and I wouldn't know how to process
> that, so I hope I'm wrong. My expectation with requests, including all
> state-changing requests, is that it's the message content of **the request
> itself** that is hashed for the digest. So if I do a POST, I digest the
> body and add the header. If I do a PUT, same deal. Like in the example, if
> I do a PATCH, I digest only the part that I send, not the "whole" thing
> that's being represented -- or is that only for Content-Digest? I'm
> honestly not sure. But for GET, I would expect that Digest would not be
> allowed because there's no body to digest.
>

I'll defer to Roberto for a response here.

§8: Additional security consideration: Hash collision. This is an important
> consideration in choosing acceptable algorithms for an application.
>

Agreed, see https://github.com/httpwg/http-extensions/issues/1837. If you
have any suggestion for the text you'd like to see your input would be
appreciated.

§8: Additional security consideration: Bad encoding to break parsers. For
> example, sending bad base64 in an attempt to break or crash the parser,
> which is needed to validate the digest.
>

Interesting. Can I ask a bit more about the threat model you have in mind
here? Do you have any examples of such attacks? I'm not opposed to
addressing genuine concerns but I don't fully understand the scope of the
problem in relation to Digest headers. To put it another way, what is it
about Digest that makes header parsing failure worse than any other header
field?

§8.1: It's not just accidental corruption, but also malicious changes. The
> important part here is that it only protects the message body and not the
> rest.
>

Agreed. See https://github.com/httpwg/http-extensions/issues/1838

§8.2: I don't see how this is a security consideration except for the last
> line, which is mostly punted to §8.3. Consider rewriting this portion to be
> about the actual security gap, and not a list of the spec's virtues and
> applications. This could probably be covered by §8.1 and §8.3 anyway, but
> if there's enough to say in its own section, then focus on what's different
> and discussed here.
>
> §8.3: Surprised not to see a reference to httpbis-message-signatures in
> here. :)
>
> §8.3: You SHOULD NOT use normative requirements in the security
> considerations. The considerations sections are about providing
> explanations, examples, and workarounds -- things to think about during
> deployment. Actual requirements belong in the main document, and so these
> should be extracted to the appropriate sections. The first line needs to be
> its own security consideration referencing BCP19 for TLS usage, the second
> can go into §6 probably, with back and forward references.
>

The three points are good, we can definitely do some editorial work to
improve things. First though, the question I think the WG can help us
answer now is do we need to say anything in detail about Signatures? I
fully appreciate there is a strong relationship between the two but it
seems that Signatures should state how Digest is used, not the other way
around. I propose we would be fine with simply highlighting the limitations
of digest (it doesn't cover HTTP fields) and then noting that there's a way
to encode the digest of an empty representation that can be used by
signature to protect from some types of manipulation.

On the requirement we currently have - "Digest fields SHOULD always be used
over a connection that provides integrity at the transport layer that
protects HTTP fields." - I'm less certain this makes sense to retain. There
seems nothing unique in Digest that requires this property when compared to
other HTTP fields. There's an argument to treat everything as an untrusted
input, making some assumptions based on the transport layer seems like a
bad recommendation the more I think about it. Maybe someone has a different
opinion?

§8.5: The fact that you could digest an unencrypted payload then send it
> encrypted genuinely surprised me -- I would not have thought that possible
> within the content-encoding mechanisms of HTTP. I think this is going to be
> a combination that leads to many errors.
>

I think this consideration was mainly in relation to  RFC 8188 (Encrypted
Content-Encoding) the "id-sha-*" algorithms that were added but removed
just prior to WGLC. I suspect now, calculation the digest of plaintext and
sticking it in Digest along with Content-Encoding: aes128gcm would be
broken or dumb. Adding text to describe all the way things could be broken
is not great, we would be better removing this section IMO. See
https://github.com/httpwg/http-extensions/issues/1839

§8.6: This draft's algorithm agility effectively allows an attacker to
> choose and substitute the algorithm applied to the message, which could
> allow the attacker to choose a weaker algorithm and apply a collision hash
> to a request. Signing the digest headers, with things like message
> signatures, mitigates this by fixing the values in the digest header. Also,
> a recipient should have a limited set of algorithms that it should accept
> for a given application, and not just take whatever the message creator (or
> attacker) sends it.
>

 Thanks. See https://github.com/httpwg/http-extensions/issues/1841


> §8.7: Structured field dictionaries disallow the same key in a dictionary,
> so couldn't we just say that repeating a key is simply an error? I know
> 3230 doesn't say that it's prohibited, but I also don't see where it says
> that it's allowed, so I would rather this new document take the more strict
> interpretation. I can't think of a good security or application reason to
> allow this, even if it's application-dependent.
>

The list syntax permits this. If it wasn't an error before, it's not really
ok to say it's an error now. There's no way to negotiate "which version" of
Digest is in play.


> §XX: The document is missing privacy considerations. This RFC provides
> good guidelines for writing them:
> https://datatracker.ietf.org/doc/html/rfc6973
>

Do you have any specific concerns for the fields defined in this document
that you feel need to be addressed by privacy considerations? The last 7
RFCs that this group produced did not include a privacy considerations
section. I'd be more than happy to include one if we need it but as far as
my assessment goes there's nothing special going on here. But that's just
my assessment, if others have concerns please raise them.

§9.2: Is normative text allowed in such a description here?
>

I don't think we can make an IANA table entry normative for anything :-)
Tracked in https://github.com/httpwg/http-extensions/issues/1842

§A: This section is missing examples of how the Digest and Content-Digest
> headers would be applied to each of these messages. I know that's also the
> goal of the next section, but it seems out of place separated from each
> other. I also realize that the entire point of having something like §A is
> because of people like me who still have problems wrapping their heads
> around this topic. :)
>

I'm 50/50 on this. I can see why it might be nice but it risks overloading
the reader with concepts when the purpose of the section is focus on
representation data. Then again, it might help illustrate the differences
between Digest and Content-Digest. A candidate PR might help the debate.
Tracked in https://github.com/httpwg/http-extensions/issues/1843

§B: I'd like to see examples that are doing things other than REST-style
> resource management. APIs have a real need for having the client digest a
> POSTed JSON object, for example, and getting back a completely different
> JSON object, which may or may not have anything to do with the request body
> in structure or form. For example, in GNAP, you POST a grant request and
> get back a grant response, and neither of these are "resources" or "files"
> in the traditional HTTP or REST model. This is overwhelmingly common and
> I'd like to see more examples of what that looks like here.
>

For me, the important part of the spec is defining clearly how it relates
to content and representation. Implementations will need to apply that to
various permutations of using HTTP. The examples are there to illustrate
different applications of Digest, or specific interactions that might not
be totally obvious.

In appendix B.6 we show a client sending a JSON object and a server
responding with a brotli encoded JSON. What's inside the JSON here isn't
really on note, the important thing is how Digest is calculated. It is not
clear to me how your example of GNAP is much different from B.6 (but
correct me if I'm wrong). From a cursory glance, it's just POST method and
request and response content of JSON.


> §C: These examples should say what the recipient of the message should do,
> even if it's "we don't say what they should do". In §C.2 for example,
> should the client ask again, can it throw its own error, what?
>

Why? These examples illustrate the basic interactions. Anything else is an
implementation decision, so could be misinterpreted as a suggestion.

§C.3: This is an interesting example where the digest on the response (were
> it included) would be of the error message, not the "resource" that was
> requested. Or at least, that's what I'd expect it to be here. Is that true?
>

See B.10 that deals with this.


> Nits:
>
> Throughout document, "checksum" and "digest" are used interchangeably, is
> that intentional? It would probably be better to pick one, and I suggest
> "digest".
>

Tracked in https://github.com/httpwg/http-extensions/issues/1844


> §9: I'd personally list the header field registrations in the order:
> Digest, Content Digest, Want-Digest, Want-Content-Digest, as they show up
> in the document in that order.
>

Easy fix. Tracked in  https://github.com/httpwg/http-extensions/issues/1845


> ---
>
>
> While this is a long review, I honestly do think the document is overall
> in good shape and hopefully these can all be addressed pretty easily.
>

There's many useful points here, thanks for taking the time!

Cheers,
Lucas

[1] - https://github.com/httpwg/http-extensions/issues/1671
[2] - https://www.rfc-editor.org/rfc/rfc8941.html#name-byte-sequences
[3] -
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-semantics-19#section-14.2
[4] -
https://www.ietf.org/archive/id/draft-nottingham-http-structure-retrofit-00.html
Received on Saturday, 11 December 2021 03:12:45 UTC