Dnsdir telechat review of draft-ietf-httpbis-rfc6265bis-19 from Petr Špaček via Datatracker on 2025-02-17 (ietf-http-wg@w3.org from January to March 2025)

From: Petr Špaček via Datatracker <noreply@ietf.org>
Date: Mon, 17 Feb 2025 08:12:56 -0800
To: <dnsdir@ietf.org>
Cc: draft-ietf-httpbis-rfc6265bis.all@ietf.org, ietf-http-wg@w3.org, last-call@ietf.org, pspacek@isc.org
Message-ID: <173980877687.1530167.1520003813683583144@dt-datatracker-75c44cbbdf-pxnd6>
Reviewer: Petr Špaček
Review result: Almost Ready

I was assigned as the dnsdir reviewer for draft-ietf-httpbis-rfc6265bis-19.
For more information about the DNS Directorate, please see
https://wiki.ietf.org/en/group/dnsdir

The primary problem of this draft is inconsistent handling of host/name
identifiers, combined with implicit assumptions. Read between the lines, the
instructions for handling domain names assume the underlying technology is DNS.
Consequently, the protocol as specified does not work for alternative naming
systems like mDNS (RFC 6762) for reasons described in RFC 8222. Please note:
mDNS is just an example. RFC 6055 page 7 gives more examples.

The same underlying problem seems to be present in the wider HTTP spec ecosystem:
- [FETCH] spec section 2.5 states DNS is only one example of a naming system
- [ORIGIN] talks only about DNS
- [URL] hardcodes rules for DNS and IDNA "compatibility" as currently employed
by DNS (but not mDNS). A trivial test which demonstrates this problem in
practice with "curl" was already posted to
https://lists.w3.org/Archives/Public/ietf-http-wg/2025JanMar/0136.html

Considering the prevalence of this problem in the HTTP specs, I'm not against
keeping the status quo if the authors decide to do so, but I think it should be
acknowledged at the beginning of the document.

Suggested course of action: Remove most of the name specifications and replace
them with a reference to an established spec, e.g. [URL], to avoid yet another
fracture in the ecosystem.

More specific comments follow.

[FETCH] https://fetch.spec.whatwg.org/#resolving-domains
[ORIGIN] RFC 6454 sec 8.1
[URL] https://url.spec.whatwg.org/#host-representation ;
https://url.spec.whatwg.org/#host-writing

> 2.3. Terminology
> Whenever possible, user agents SHOULD use an up-to-date public suffix list,
> such as the one maintained by the Mozilla project at [PSL].

A normative SHOULD in the Terminology section seems odd to me. I recommend
moving this into Security Considerations, section 8.7, possibly with a note
that failure to update might have security implications.

> 4.1.1. Syntax
> The domain-value is a subdomain as defined by [RFC1034], Section 3.5, and as
> enhanced by [RFC1123], Section 2.1. Thus, domain-value is a string of [USASCII]
> characters, such as an "A-label" as defined in Section 2.3.2.1 of [RFC5890].

This might work if we assume the underlying naming system is DNS. It would not
work for mDNS (with bare UTF-8) or possibly others. But see above, it might be
good enough.
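
To make the difference concrete, here is a minimal sketch. The host name
"café.local" is a made-up example, not taken from the draft. Python's built-in
"idna" codec implements the older IDNA 2003 rules, so it is used here only to
show the general shape of an A-label, not as a recommended conversion.

```python
# Hypothetical host name used only for illustration.
name = "caf\u00e9.local"  # "café.local"

# DNS-oriented form: every label becomes ASCII (A-labels where needed).
# Caveat: the built-in "idna" codec follows IDNA 2003, not IDNA 2008.
dns_form = name.encode("idna")

# mDNS-style form: labels carry the bare UTF-8 bytes.
mdns_form = name.encode("utf-8")

print(dns_form)              # b'xn--caf-dma.local'
print(mdns_form)             # b'caf\xc3\xa9.local'
print(dns_form.isascii())    # True
print(mdns_form.isascii())   # False
```

Only the first form satisfies the "string of [USASCII] characters" requirement
quoted above.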

> 4.1.2.3. The Domain Attribute
> (Note that a leading %x2E ("."), if present, is ignored even though that
> character is not permitted.)

Should this be mentioned in 4.1.1. Syntax? This inconsistency makes me
wince.

> 5.1.2. Canonicalized Host Names
> A canonicalized host name is the string generated by the following algorithm:
>
> 1. Convert the host name to a sequence of individual domain name labels.
>
> 2. Convert each label that is not a Non-Reserved LDH (NR-LDH) label, to an
> A-label (see Section 2.3.2.1 of [RFC5890] for the former and latter).
>
> 3. Concatenate the resulting labels, separated by a %x2E (".") character.

This algorithm does not handle all possible inputs. I suggest removing it from
the specification altogether and referencing e.g. [URL] spec section 3.4, which
goes into more detail on how to transform names, or some other suitable place
in the [URL] spec.

In case a reference to the [URL] spec is not viable, here are the problems I
can see. Using terminology from RFC 5890 sec. 2.3.1: DNS name (RFC 1035) > LDH
host name (RFC 1123) > R-LDH Label (RFC5890) > XN-label > Fake A-label vs.
A-label

According to the diagram in RFC 5890 page 10, this algorithm underspecifies
what to do with Non-LDH labels, with R-LDH labels which are not XN-labels, and
with Unicode labels which cannot be converted to A-labels. Also, nobody knows
what semantics may be assigned to future non-XN labels, so I would be very
cautious, because the output of this algorithm is used for security decisions
later on.

Having said that, I think replacing the content of this section with a
reference to the [URL] spec, plus adding hard failure cases for when something
unexpected happens, is better.
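
The label categories mentioned above can be sketched as follows. This is a
coarse illustration under my own assumptions, not an algorithm from the draft,
RFC 5890, or the [URL] spec: the function names and category strings are
hypothetical, and an "XN-label" result says nothing about whether the label is
a valid A-label or a Fake A-label.

```python
import re

# LDH label: ASCII letters, digits, hyphen; no leading or trailing hyphen;
# 1 to 63 characters.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_label(label):
    """Return a coarse category for one label (hypothetical helper)."""
    if not label.isascii():
        return "non-ASCII"      # might be a U-label, might not convert at all
    if not _LDH.fullmatch(label):
        return "non-LDH"        # e.g. "_srv", "~bla!", or an empty label
    if label[2:4] == "--":
        if label[:2].lower() == "xn":
            return "XN-label"   # A-label or Fake A-label; needs validation
        return "R-LDH, not XN"  # reserved; no semantics assigned today
    return "NR-LDH"


def classify_host(host):
    """Classify every label of a dot-separated host name."""
    return [(label, classify_label(label)) for label in host.split(".")]


for host in ("www.example.com", "xn--caf-dma.example", "ab--cd.example",
             "_srv.example", "caf\u00e9.example"):
    print(host, classify_host(host))
```

Step 2 of the quoted algorithm says what to do with only the "NR-LDH" outcome
(leave it alone) and, implicitly, with convertible Unicode labels. The other
outcomes are the ones I would like to see turned into explicit failures.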

> 5.1.3. Domain Matching

a. I guess it's missing a preamble like: This operation MUST be done only on
canonicalized host names (or something to that effect).

b. Given that the algorithm in 5.1.2 requires splitting the name into labels,
and the [URL] spec does the same by referencing Unicode TR#46, I suggest
reworking this algorithm to compare _individual labels_ from right to left
instead of reassembling a string and doing string comparisons. The lesson from
the DNS world is that domain names are anything but strings, and manipulating
them like strings is often unreliable in weird ways. Alternatively, reference
some other HTTP spec which deals with this (if there is any, I know cookies are
a bit special).
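
A label-wise comparison along these lines might look like the sketch below.
It is only an illustration under stated assumptions: both inputs are taken to
be already canonicalized host names, IP addresses are out of scope, names with
empty labels (including a trailing dot) are rejected, and the function names
are hypothetical.

```python
def split_labels(name):
    """Split a canonicalized name into lowercase labels."""
    return name.lower().split(".")


def domain_matches_by_label(host, cookie_domain):
    """Return True if host equals cookie_domain or is a subdomain of it."""
    host_labels = split_labels(host)
    domain_labels = split_labels(cookie_domain)

    # Assumption: an empty label (e.g. from a trailing dot) is a hard failure.
    if not all(host_labels) or not all(domain_labels):
        return False

    # The cookie domain cannot have more labels than the host.
    if len(domain_labels) > len(host_labels):
        return False

    # Compare whole labels, starting from the rightmost one.
    return all(h == d for h, d in
               zip(reversed(host_labels), reversed(domain_labels)))


print(domain_matches_by_label("www.example.com", "example.com"))  # True
print(domain_matches_by_label("badexample.com", "example.com"))   # False
```

Because whole labels are compared, "badexample.com" cannot match
"example.com" by accident, with no need for a separate "preceded by a dot"
check on the string.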

> 5.6.3. The Domain Attribute
> If cookie-domain starts with %x2E ("."), let cookie-domain be cookie-domain
> without its leading %x2E (".").

Input handling in section 5.6.3 makes me uneasy. The preamble of section 5.6
explicitly states weird inputs are to be expected (at least) via non-HTTP APIs,
and the generic algorithm removes only the control characters. E.g. "!" or "~"
are allowed in 5.6 and these characters also pass the filter in 5.7 (because
they are part of USASCII).

Perhaps this would be a good place to invoke the [URL] host parser to validate
what we have received from the network?

> 5.7. Storage Model
> Step 8: If the domain-attribute contains a character that is not in the range
> of [USASCII] characters, abort these steps and ignore the cookie entirely.

This allows weird stuff to get in, like "~bla!.example.com" which is all within
ASCII range. Good basic sanity check, but insufficient by itself. See note
above in 5.6.3 and [URL] parser.
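
A minimal sketch of the gap follows. The LDH check is only a stand-in of my
own for a stricter validation; it is not the [URL] host parser and not
something the draft specifies, and both function names are hypothetical.

```python
import re

# LDH label: ASCII letters, digits, hyphen; no leading or trailing hyphen;
# 1 to 63 characters.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def passes_ascii_check(domain):
    """The step 8 style check: every character is in the ASCII range."""
    return domain.isascii()


def passes_ldh_check(domain):
    """Stricter stand-in check: every label is a well-formed LDH label."""
    return all(_LDH_LABEL.fullmatch(label) for label in domain.split("."))


sample = "~bla!.example.com"
print(passes_ascii_check(sample))  # True
print(passes_ldh_check(sample))    # False
```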

Step 9: I guess PSL should be checked only after canonicalization and other
sanity checks. Suggestion: Exchange steps 9 and 10 (with appropriate
modifications).

Step 10: The request-host value is canonicalized, but the domain-attribute
value is NOT canonicalized here. Is that intentional? I gather the
domain-attribute value should have been canonicalized by the sender, but I
think it would be more robust to canonicalize both before manipulating them
further.

> If the canonicalized request-host does not domain-match the domain-attribute:

I would add a reference to the "domain-match" definition in sec. 5.1.3.

> 5.8.3. Retrieval Algorithm
> Let cookie-list be the set of cookies from the cookie store that meets all of
> the following requirements:
>
> Either:
>
> The cookie's host-only-flag is true and the canonicalized host of the
> retrieval's URI is identical to the cookie's domain.
>
> Or:
>
> The cookie's host-only-flag is false and the canonicalized host of the
> retrieval's URI domain-matches the cookie's domain.

Sections 5.7 Storage Model and 5.8 Retrieval Model sort of ignore the role of
the 'generator', i.e. the server which needs to properly form cookies. Perhaps
it is okay, but it has surprised me. In DNS specs we often have 'server' and
'client' parts in the spec, but here we seem to have only 'client'.

> 8.7. Reliance on DNS

This is the first and only mention of 'DNS' in the text and it is inaccurate.
This draft relies on the security of generic name resolution services: it might
as well be a Tor onion name or anything else (with especially bad consequences
for Tor users). I suggest generalizing this section. If examples are needed to
clarify 'name resolution services' then a reference to RFC 6055 page 7 might be
handy.

--
Petr Špaček
Received on Monday, 17 February 2025 16:13:02 UTC