- From: Petr Špaček via Datatracker <noreply@ietf.org>
- Date: Mon, 17 Feb 2025 08:12:56 -0800
- To: <dnsdir@ietf.org>
- Cc: draft-ietf-httpbis-rfc6265bis.all@ietf.org, ietf-http-wg@w3.org, last-call@ietf.org, pspacek@isc.org
Reviewer: Petr Špaček
Review result: Almost Ready

I was assigned as the dnsdir reviewer for draft-ietf-httpbis-rfc6265bis-19. For more information about the DNS Directorate, please see https://wiki.ietf.org/en/group/dnsdir

The primary problem of this draft is inconsistent handling of host/name identifiers, with implicit assumptions. Instructions for handling domain names assume, between the lines, that the underlying technology is DNS. Consequently, the protocol as specified does not work for alternative naming systems like mDNS (RFC 6762) for reasons described in RFC 8222. Please note: mDNS is just an example. RFC 6055 page 7 gives more examples.

The same underlying problem seems to be present in the wider HTTP spec ecosystem:
- [FETCH] spec section 2.5 states DNS is only one example of a naming system
- [ORIGIN] talks only about DNS
- [URL] hardcodes rules for DNS and IDNA "compatibility" as currently employed by DNS (but not mDNS).

A trivial test which demonstrates this problem in practice with "curl" was already posted to https://lists.w3.org/Archives/Public/ietf-http-wg/2025JanMar/0136.html

Considering the prevalence of this problem in the HTTP specs, I'm not against keeping the status quo if the authors decide to do so, but I think it should be acknowledged at the beginning of the document.

Suggested course of action: Remove most of the name specifications and replace them with an established spec, e.g. [URL], to avoid yet another fracture in the ecosystem.

More specific comments follow.

[FETCH] https://fetch.spec.whatwg.org/#resolving-domains
[ORIGIN] RFC 6454 sec 8.1
[URL] https://url.spec.whatwg.org/#host-representation ; https://url.spec.whatwg.org/#host-writing

> 2.3. Terminology
> Whenever possible, user agents SHOULD use an up-to-date public suffix list, such as the one maintained by the Mozilla project at [PSL].

A normative SHOULD in the Terminology section seems odd to me. I recommend moving this into Security Considerations, section 8.7.
Possibly with a note that failure to update might have security implications.

> 4.1.1. Syntax
> The domain-value is a subdomain as defined by [RFC1034], Section 3.5, and as enhanced by [RFC1123], Section 2.1. Thus, domain-value is a string of [USASCII] characters, such as an "A-label" as defined in Section 2.3.2.1 of [RFC5890].

This might work if we assume the underlying naming system is DNS. It would not work for mDNS (with bare UTF-8) or possibly others. But see above, it might be good enough.

> 4.1.2.3. The Domain Attribute
> (Note that a leading %x2E ("."), if present, is ignored even though that character is not permitted.)

Should this be mentioned in 4.1.1. Syntax? This inconsistency makes me wince.

> 5.1.2. Canonicalized Host Names
> A canonicalized host name is the string generated by the following algorithm:
>
> 1. Convert the host name to a sequence of individual domain name labels.
>
> 2. Convert each label that is not a Non-Reserved LDH (NR-LDH) label, to an A-label (see Section 2.3.2.1 of [RFC5890] for the former and latter).
>
> 3. Concatenate the resulting labels, separated by a %x2E (".") character.

This algorithm does not handle all possible inputs. I suggest removing it from the specification altogether and referencing e.g. [URL] spec section 3.4, which goes into more detail on how to transform names, or some other suitable place in the [URL] spec.

In case a reference to the [URL] spec is not viable, here are the problems I can see.

Using terminology from RFC 5890 sec. 2.3.1:
DNS name (RFC 1035) > LDH host name (RFC 1123) > R-LDH Label (RFC5890) > XN-label > Fake A-label vs. A-label

According to the diagram in RFC 5890 page 10, this algorithm underspecifies what to do with Non-LDH labels and also with R-LDH labels which are not XN-labels, or Unicode labels which cannot be converted to A-labels.
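
To make the label classes above concrete, here is a minimal sketch that sorts a single label into the RFC 5890 sec. 2.3.1 categories. The function name `classify_label` and the category strings are hypothetical, chosen only for illustration. The sketch does not try to tell an A-label from a Fake A-label, since that needs full IDNA validation, and it is not taken from the draft or from the [URL] spec.

```python
import re

# An LDH label: ASCII letters, digits and hyphens, 1-63 characters,
# with no hyphen in the first or last position.
_LDH = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def classify_label(label: str) -> str:
    """Return a rough RFC 5890 category for one label (illustrative names)."""
    if not label.isascii():
        # Might be a U-label; deciding that needs IDNA processing.
        return "non-ascii"
    if not _LDH.fullmatch(label):
        # For example "_dmarc" or "-abc"; this also catches the empty label.
        return "non-ldh"
    if label[2:4] == "--":
        # Reserved LDH label: "--" in the third and fourth positions.
        if label[:2].lower() == "xn":
            return "xn-label"      # A-label or Fake A-label, not decided here
        return "r-ldh-other"       # reserved, semantics currently undefined
    return "nr-ldh"


for sample in ("example", "xn--bcher-kva", "ab--cd", "_dmarc", "bücher"):
    print(sample, "->", classify_label(sample))
```

Only the "nr-ldh" class, plus labels that convert cleanly to A-labels, is covered by the quoted algorithm; the other classes are the inputs the comment above refers to.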
Also, nobody knows what semantics can be assigned to future non-XN labels, so I would be very cautious, because the output of this algorithm is used for security decisions later on.

Having said that, I think replacing the content of this section with a reference to the [URL] spec + adding hard failure cases when something unexpected happens is better.

> 5.1.3. Domain Matching

a. I guess it's missing a preamble like: This operation MUST be done only on canonicalized host names (or something to that effect).

b. Given that the algorithm in 5.1.2 requires splitting the name into labels, and the [URL] spec does the same by referencing Unicode TR#46, I suggest reworking this algorithm to compare _individual labels_ from right to left instead of reassembling a string and doing string comparisons. The lesson from the DNS world is that domains are anything but strings, and manipulating them like strings is often unreliable in weird ways. Alternatively, reference some other HTTP spec which deals with this (if there is any, I know cookies are a bit special).

> 5.6.3. The Domain Attribute
> If cookie-domain starts with %x2E ("."), let cookie-domain be cookie-domain without its leading %x2E (".").

Input handling in section 5.6.3 makes me uneasy. The preamble of section 5.6 explicitly states weird inputs are to be expected (at least) via non-HTTP APIs, and the generic algorithm removes only the control characters. E.g. "!" or "~" are allowed in 5.6 and these characters also pass the filter in 5.7 (because they are part of USASCII). Perhaps this would be a good place to invoke the [URL] host parser to validate what we have received from the network?

> 5.7. Storage Model
> Step 8: If the domain-attribute contains a character that is not in the range of [USASCII] characters, abort these steps and ignore the cookie entirely.

This allows weird stuff to get in, like "~bla!.example.com", which is all within the ASCII range. Good basic sanity check, but insufficient by itself. See the note above on 5.6.3 and the [URL] parser.
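
The gap in the Step 8 check can be shown with a short sketch. The names `ascii_only` and `ldh_only` are hypothetical. The stricter LDH rule is an assumption picked for illustration; it is not what the draft requires and it does not reproduce the [URL] host parser.

```python
import re

# One LDH label: letters, digits, hyphens; no leading or trailing hyphen;
# 1-63 characters.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def ascii_only(domain: str) -> bool:
    """Step 8 style check: every character is in the US-ASCII range."""
    return domain.isascii()


def ldh_only(domain: str) -> bool:
    """Hypothetical stricter check: every dot-separated label is LDH."""
    return all(LDH_LABEL.fullmatch(label) is not None
               for label in domain.split("."))


for candidate in ("www.example.com", "~bla!.example.com"):
    print(candidate, ascii_only(candidate), ldh_only(candidate))
```

Both names pass the ASCII-only check, while only "www.example.com" passes the per-label check, which is the kind of additional validation the comment asks for.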
Step 9: I guess the PSL should be checked only after canonicalization and other sanity checks. Suggestion: Exchange steps 9 and 10 (with appropriate modifications).

Step 10: The request-host value is canonicalized, but the domain-attribute value is NOT canonicalized here. Is that intentional? I gather the domain-attribute value should have been canonicalized by the sender, but I think it would be more robust to canonicalize both before manipulating them further.

> If the canonicalized request-host does not domain-match the domain-attribute:

I would add a reference to the "domain-match" definition in sec. 5.1.3.

> 5.8.3. Retrieval Algorithm
> Let cookie-list be the set of cookies from the cookie store that meets all of the following requirements:
>
> Either:
>
> The cookie's host-only-flag is true and the canonicalized host of the retrieval's URI is identical to the cookie's domain.
> Or:
>
> The cookie's host-only-flag is false and the canonicalized host of the retrieval's URI domain-matches the cookie's domain.

Sections 5.7 Storage Model and 5.8 Retrieval Model sort of ignore the role of the 'generator', i.e. the server which needs to properly form cookies. Perhaps it is okay, but it has surprised me. In DNS specs we often have 'server' and 'client' parts, but here we seem to have only 'client'.

> 8.7. Reliance on DNS

This is the first and only mention of 'DNS' in the text and it is inaccurate. This draft relies on the security of generic name resolution services - it might as well be a Tor onion name or anything else (with especially bad consequences for Tor users). I suggest generalizing this section. If examples are needed to clarify 'name resolution services', then a reference to RFC 6055 page 7 might be handy.

--
Petr Špaček
Received on Monday, 17 February 2025 16:13:02 UTC