Re: Paul Wouters' Discuss on draft-ietf-httpbis-digest-headers-12: (with DISCUSS and COMMENT)

Hi Paul,

Thank you for your review. I'll make a first pass at responding to
comments; I suspect we'll need a few more rounds to resolve some of the
bigger ones, especially the DISCUSS item. See responses in-line.

On Wed, May 24, 2023 at 6:17 PM Paul Wouters via Datatracker <
noreply@ietf.org> wrote:

> Paul Wouters has entered the following ballot position for
> draft-ietf-httpbis-digest-headers-12: Discuss
>
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
>
>
> Please refer to
> https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/
> for more information about how to handle DISCUSS and COMMENT positions.
>
>
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-httpbis-digest-headers/
>
>
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
>
> I have a few DISCUSS questions about some of the security aspects of the
> draft,
> that I would like the authors to at least consider.
>
> It seems this new registry is getting pre-populated with a _lot_ of
> "insecure"
> variants. Is there a strong reason why to not only add 1 insecure entry
> (eg crc32c) ?
> New RFCs really should not be adding anything using MD5 or SHA1, even if we
> allow/accept it is used insecure (see below for details).
>

Earlier drafts of the document did not support weaker algorithms like
crc32 or md5. We received feedback from the community that applying such
restrictions would be problematic for some. For instance, they have large
archives where resources already have associated digests that use an
algorithm that would have been prohibited. Recalculating those digests with
new algorithms would be a significant operational overhead that would be
hard to justify, especially given that their threat model already accounts
for the weakness of these algorithms and there hasn't been any other reason
to change.

The intention of this specification is to improve upon the ambiguity of RFC
3230, which presents its own type of security problems when people can't
agree on the input bytes or the HTTP semantic properties. Being overly
conservative about the algorithms we support in a document that replaces
RFC 3230 risks preventing legacy applications from migrating to modern
standards of HTTP. Therefore the WG reached consensus on supporting all of
the existing registered algorithms for RFC 3230 by marking known-insecure
algorithms as "insecure" in order to highlight them and nudge new use cases
in a better direction.

The important part for us is that we can ensure interoperability of
endpoints with respect to the inputs and outputs. Cataloging the algorithm
identifiers and their reference specifications helps that. The usefulness
of HTTP digests depends on applications and their threat model. Additional
application profiles are free to place additional constraints on
algorithms. Roman suggested some new text to make that explicit and we've
got a WIP change to do that.



> The other question on the IANA Registry I have is the format and
> registration
> policy. Recently, most security related IANA Registries try to use a
> RECOMMENDED
> column that can only be set to Y using a registration policy of RFC
> Required.
> Any other registration (eg via specification required or FCFS) cannot get
> RECOMMENDED Y. Also, to change the RECOMMENDED column requires a standards
> track
> RFC (via RFC Required policy). Is there a reason we cannot do the same
> here? I
> also think a Designated Expert should not be able to change what is
> "standard"
> or not, as the word "standard" strongly implies "standard track" or at the
> very
> least "IETF consensus" which is not the same as a DE making a decision on
> their own.
>
> This would also resolve my issue of specifying things as "standard" (when
> it
> didn't come in via standards track RFC) and "insecure" (which really seems
> to
> mean "secure for this type of usage, but not that type of usage), so I
> would
> strongly prefer this indirection to be removed, and state a usage of
> "checksum
> only" and "signed hash".
>

This was debated somewhat. IIRC, one of the pieces of feedback was
about the difficulty of assigning or interpreting what RECOMMENDED means
in the context of digests.

The document is careful to leave application or local policy choices to
implementations, or in other words tries to maintain a neutral position on
use cases. The challenge with RECOMMENDED is that it ascribes a value
judgement for use cases of digests that we do not have a good ability to
assess. In effect, that places _somebody_ in charge of a subjective
determination. And as we found during the process, it's very tough to make
that determination.

The use of labels is intended to remove the subjectivity and present
objective data. I agree that "standard" might be confused with IETF
standards. We say the label "should reflect
standardization status and the broad opinion of relevant interest groups
such as the IETF or security-related SDOs. The "standard" status is not
suitable for an algorithm that is known to be weak, broken, or
experimental." Can you suggest a different label value that could
accommodate this without conflating it with standards or recommendations?

I'm not understanding what the labels "checksum only" and "signed hash"
mean or how they would be used in the context of these fields. Can you
elaborate?


> Finally, I think some more careful writing is needed around the case of
> integrity vs integrity+signature and what it protects against. I don't
> think "unintended or malicious data corruption" should be used as a type.
> Either talk about "unintended" or talk about "malicious" - the two cannot
> be used within a single concept unless you mandate integrity+signature.
>

I agree there's some difference. See more detailed response to your COMMENT
below.


> If multiple hashes are included, what should happen when the most
> preferred hash
> failed to verify? Should it be treated as failed or should it try other
> less
> preferred but accepted hash algorithms? There should be text clarifying the
> behaviour. Personally, I would prefer that only the most preferred hash is
> checked and the other hashes are ignored, but perhaps I'm not fully aware
> of
> common operational issues.
>

Also discussed as part of Roman's review. The decision would be an
application or local policy decision. We have an issue (
https://github.com/httpwg/http-extensions/issues/2557) and PR (
https://github.com/httpwg/http-extensions/pull/2558) to address this and
welcome more input to make it clear what this spec does and doesn't say.
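To make concrete the kind of thing an application profile could specify,
here's a hypothetical policy sketch (names and preference order are mine,
not from the draft) where only the most preferred supported algorithm is
checked and any other members of the field are ignored:

```python
import hashlib

# Hypothetical local policy: supported algorithms in descending
# preference order.
PREFERENCE = ["sha-512", "sha-256"]
HASHERS = {"sha-512": hashlib.sha512, "sha-256": hashlib.sha256}

def validate(content: bytes, digests: dict) -> bool:
    """Check only the most preferred supported algorithm present in the
    received field; other members are ignored under this policy."""
    for alg in PREFERENCE:
        if alg in digests:
            return HASHERS[alg](content).digest() == digests[alg]
    return False  # no supported algorithm present: treat as failure
```

The spec itself stays neutral; a different profile could just as
reasonably require all recognized members to verify.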



>
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
>
>         In between connections there is a possibility for unintended
>         or malicious data corruption. An HTTP integrity mechanism can
>         provide the means for endpoints, or applications using HTTP,
>         to detect data corruption and make a choice about how to act on
>         it. An example use case is to aid fault detection and diagnosis
>         across system boundaries.
>
> I think there is too much subtlety here between starting with "unintended
> or malicious data corruption" and then follow that with "detect
> data corruption".  It is clear that a hash can detect "unintended"
> data corruption, but it is less clear it can detect "malicious" data
> corruption, as a malicious actor will just update the digest value. And
> the text is inconsistent with the Security Consideration in Section 6.1:
>
>         Integrity fields are not intended to be a general protection
>         against malicious tampering with HTTP messages.
>

Agree. I think the simplest thing to do here is just to remove the
"unintended or malicious" qualifier. Corruption is corruption. Other
techniques can be used alongside or on top of digests to try to identify
the cause.


>
>         Integrity fields do not provide integrity for [...] fields.
>

> Perhaps "integrity field" then deserves a better name? I would propose
> "digest field" but I guess RFC 3230 took that one and this document is
> trying to obsolete that. Why not "checksum field" or "hash field" ?
>

I'm happy with the current terminology and it has consensus in the HTTP WG.


> The short fix for the sentence above could be:
>
>         Integrity fields do not provide authenticated integrity for [...]
>

HTTP messages are composed of fields (metadata) and content. The intro
section states that Content-Digest and Repr-Digest apply to content or
representations. The sentence is just making it abundantly clear that they
don't apply to metadata or a message as a whole. Adding the word
"authenticated" into this sentence would be confusing. What inspired the
suggestion? Maybe there is something else we can address in a
different way.


>
>
>         Requests to update or change the fields in an existing
>         registration are permitted. For example, this could allow for the
>         transition of an algorithm status from "standard" to "insecure"
>         as the security environment evolves.
>
> As stated in my DISCUSS, it feels odd that a DE can change what "standard"
> means.
> And as stated in my DISCUSS, the term "insecure" feels weird to me.
>

Hope to resolve that as part of DISCUSS remediation.


>         Integrity fields do not provide any mitigations for downgrade
>         or substitution attacks (see Section 1 of [RFC6211]) of the
>         hashing algorithm.
>
> See also my DISCUSS. While one hash field does not, sending multiple
> ones could defend it a bit.  If a sha: and sha-256: digest is sent,
> and the receiver support sha-256, and the sha-256 hash is wrong but the
> sha hash is fine, it should consider the integrity broken (assuming it
> deems sha-256 more secure than sha-1).  However, such behaviour should
> be defined in this document (eg what to do in the case of multiple
> hashes. eg. Only pick up the most preferred one and ignore the rest,
> so in the above example one wouldn't even look at the sha: hash at all
> if the sha-256 hash failed).
>

Related to my earlier response, the fields themselves just carry a set of
digests. They can't provide much protection themselves. That would need to
be done via other means, such as an application profile that describes
rules for validation. If it would help, we can also add a sentence here to
point that out explicitly (along the lines of
https://github.com/httpwg/http-extensions/pull/2558).


>
> Section 7.2:
>
> Can we omit all the "insecure" entries from the registry ?  This draft is
> something new, and whomever implements this should have at this point
> sha-256 support available. Is there a performance issue? If there is a need
> for something insecure and quick, can we limit this to 1 entry (eg crc32c)
> and especially avoid MD5/SHA1 due to complications of those functions
> getting blocked in crypto libraries and system wide crypto policies ?
>

See earlier response.


>
>         Also, the Content-MD5 is limited to one specific digest algorithm;
>         other algorithms, such as SHA-1 (Secure Hash Standard), may be
> more appropriate
>
> Is there a reason not to use SHA-256 instead of SHA-1 in this example?
> The use of SHA-1 (or MD5) would be good enough for the purpose of this
> draft (ignoring signed hashes) but the MD5 or SHA-1 hash function might
> be disallowed or removed from crypto library implementations. If the
> difference in performance between SHA-1 and SHA-256 is not an issue, I
> would like to see SHA-256 mentioned instead of SHA-1. (and as separate
> issue, I would prefer not recommend or even define MD5 or SHA-1 in this
> new Registry at all)
>

I'm confused. The text you've quoted only appears in RFC 3230. All examples
of HTTP messages use either sha-256 or sha-512.


>         whereas hashing algorithm keys are quoted
>
> "hashing algorithm keys" is a strange term to me. It becomes clear a
> few sentences down when we are talking about key/value pairs. Maybe
> use "key value" and "hash value" (or qvalue as is used now?) when talking
> about "keys" and "values"?
>

Noted. I can make this text clearer by tweaking the notational conventions
a little; that will be a good improvement. However, the suggestion risks
confusion with Structured Fields terminology, so I'll take a different
approach.


> Section 4:
>
> Why does Want-Repr-Digest: and Want-Content-Digest: need to have weights?
> Are weight values used for anything else but a preference list? Eg why not
> define Want-Repr-Digest: to have its most preferred algo listed first
> (left).
>
> So instead of:
>
> Want-Repr-Digest: sha-512=3, sha-256=10, unixsum=0
>
> Why not:
>
> Want-Repr-Digest: sha-512, sha-256, unixsum
> (or even leave out unixsum is not supported or wanted)
>
> The only argument I can see for weights is if you want to define multiple
> hash algorithms with the exact same weight, which a left to right notation
> wouldn't allow you do. But is that a feature that is really needed?
>

We're maintaining the semantics of the Want-Digest field from RFC 3230.
Also weights are a common approach in HTTP. I don't see any need to change
the design of these fields at this late stage.
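For what it's worth, weights are easy to consume on the receiving side.
Here's a rough sketch (mine, and deliberately not a full Structured Fields
parser) of picking the highest-weighted acceptable algorithm from a
Want-Repr-Digest-style value, where weight 0 means "not acceptable":

```python
def preferred_algorithm(want: str):
    """Pick the highest-weighted algorithm from a value such as
    'sha-512=3, sha-256=10, unixsum=0', or None if none is acceptable."""
    prefs = {}
    for member in want.split(","):
        key, _, weight = member.strip().partition("=")
        # A bare key (no '=') is treated as acceptable in this sketch.
        prefs[key] = int(weight) if weight else 1
    acceptable = {k: v for k, v in prefs.items() if v > 0}
    return max(acceptable, key=acceptable.get) if acceptable else None

print(preferred_algorithm("sha-512=3, sha-256=10, unixsum=0"))  # sha-256
```

Equal weights (which a purely positional notation couldn't express) fall
out naturally: the sender just lists several keys with the same value.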

Once again, thanks for the review.

Cheers,
Lucas

Received on Wednesday, 24 May 2023 19:02:43 UTC