Re: Working Group Last Call: draft-ietf-httpbis-message-signatures-13 from Julian Reschke on 2022-10-28 (ietf-http-wg@w3.org from October to December 2022)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 28 Oct 2022 16:17:58 +0200
To: ietf-http-wg@w3.org
Message-ID: <94045f6e-9286-059e-f5f9-734bf3a9c419@gmx.de>

On 27.10.2022 16:33, Justin Richer wrote:
> Hi Julian,
>
> Thanks for bringing this up. When the authors discussed both the @path and @query derived components, we struggled a bit to come up with the best way to define a canonicalized form, since both can use percent-encoding in the wild. While for the vast majority of applications it’s going to be “take whatever string is handed to me by calling getPath() and getQuery()”, and that’s going to work, we obviously need to have something more precise here for all the corner cases.
>
> At the time of first discussion, the best advice seemed to be to account for the percent-encoding on both of these, since that’s what most libraries seemed to do automatically behind the scenes. The language that we have was intended to reflect that, but we will absolutely defer to others in the group if there’s a better way to describe this.
>
> I agree that we should have some examples that reflect any allowable transformations, too. We’ve got an appendix for showing these kinds of things that would be a great place to showcase this, with a forward reference from the @path and @query sections.
>
> I’m interested to hear what others think the best approach would be here.
>
> Thanks,
>   — Justin

Well, this is tricky.

If you normalize by unescaping, you assume that any usage of
percent-encoding can be removed without losing information. IMHO that is
not the case.

For the path example: a path segment "a%2fb" is in general different
from "a/b"; at a minimum, the two differ in how they behave in a
conforming URI implementation doing reference resolution.
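
To make the reference-resolution point concrete, here is a small sketch
using Python's standard urllib.parse. The host name, the base path and
the relative reference "c" are made up for illustration; nothing here
comes from the draft.

```python
from urllib.parse import urljoin, unquote

# Hypothetical base URI whose last path segment is "a%2fb".
encoded_base = "http://example.com/x/a%2fb"

# Blindly unescaping turns that single segment into two segments, "a" and "b".
decoded_base = unquote(encoded_base)

# Resolve the same relative reference against both forms.
from_encoded = urljoin(encoded_base, "c")
from_decoded = urljoin(decoded_base, "c")

print(decoded_base)   # http://example.com/x/a/b
print(from_encoded)   # http://example.com/x/c
print(from_decoded)   # http://example.com/x/a/c
```

The same relative reference resolves to two different targets, so the
unescaped form is not equivalent to the original.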

Maybe the problematic part here is:

"The value is normalized according to the rules in [HTTP], Section
4.2.3. Namely, an empty path string is normalized as a single slash /
character, and path components are represented by their values after
decoding any percent-encoded octets."

But that's not what HTTP, Section 4.2.3 says:

"Characters other than those in the "reserved" set are equivalent to
their percent-encoded octets: the normal form is to not encode them (see
Sections 2.1 and 2.2 of [URI])."

So percent-unescaping is only "safe" when restricted to characters that
are not in the "reserved" set (and "/" is in the reserved set).
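
A minimal sketch of what such a restricted normalization could look
like, in Python. It decodes a percent-encoded octet only when it maps to
a character in the "unreserved" set of RFC 3986, Section 2.3, and leaves
everything else encoded. The function name is hypothetical, and
uppercasing the remaining hex digits is an additional choice made here
(following RFC 3986, Section 6.2.2.1), not something the draft text
requires.

```python
import re

# "unreserved" characters per RFC 3986, Section 2.3:
# ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved_only(component: str) -> str:
    """Hypothetical helper: unescape only percent-encoded unreserved characters."""

    def _replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            # Safe: the encoded and literal forms are equivalent.
            return char
        # Reserved or other octet: keep it encoded, with uppercase hex digits.
        return "%" + match.group(1).upper()

    return _PCT_TRIPLET.sub(_replace, component)


print(decode_unreserved_only("/a%2fb/%7Euser"))  # /a%2Fb/~user
```

With this approach "%7E" becomes "~", while "%2f" stays encoded (as
"%2F"), so the "a%2fb" segment is never merged with "a/b".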

Best regards, Julian

Received on Friday, 28 October 2022 14:18:14 UTC