- From: Steven Bingler <bingler@chromium.org>
- Date: Fri, 14 Nov 2025 12:17:25 -0500
- To: Petr Špaček <pspacek@isc.org>
- Cc: dnsdir@ietf.org, draft-ietf-httpbis-rfc6265bis.all@ietf.org, ietf-http-wg@w3.org, last-call@ietf.org
Hi Petr,
Thanks for taking a look. I'll follow up once the PR is merged and,
unless you see another issue in the meantime, I'll consider this
review completed.
> I'm saying this just to make people aware there would be dragons if
> the "Canonicalized Host Names" section were less strict.
Just to be clear, the current "Canonicalized Host Names" algorithm
doesn't have any issues?
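
To make the concern concrete, here is a minimal Python sketch. It is
only an illustration, not text from the draft. `string_domain_match`
approximates the string-based suffix comparison from "Domain Matching"
(the IP-address exclusion is omitted), `labelwise_match` mimics
DNS-style label-by-label comparison, and both function names are
hypothetical.

```python
from __future__ import annotations


def string_domain_match(host: str, domain: str) -> bool:
    """String-based match: identical, or host ends with "." + domain.

    Simplified sketch: the check that host is not an IP address is omitted.
    """
    host, domain = host.lower(), domain.lower()
    return host == domain or host.endswith("." + domain)


def labelwise_match(host_labels: list[bytes], domain_labels: list[bytes]) -> bool:
    """DNS-style match: compare whole labels, most significant (rightmost) first."""
    if len(domain_labels) > len(host_labels):
        return False
    pairs = zip(reversed(host_labels), reversed(domain_labels))
    return all(h.lower() == d.lower() for h, d in pairs)


# Ordinary names: both approaches agree.
assert string_domain_match("www.example.com", "example.com")
assert labelwise_match([b"www", b"example", b"com"], [b"example", b"com"])

# A two-label name whose first label contains a literal 0x2e octet.
# Flattened into one string it is indistinguishable from "evil.example.com".
odd_labels = [b"evil.example", b"com"]
flattened = ".".join(label.decode("ascii") for label in odd_labels)

print(string_domain_match(flattened, "example.com"))       # True
print(labelwise_match(odd_labels, [b"example", b"com"]))   # False
```

The two functions only disagree when a label itself contains ".",
which is the kind of input the stricter canonicalization rejects.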
Thanks,
- Steven
On Fri, Nov 14, 2025 at 5:53 AM Petr Špaček <pspacek@isc.org> wrote:
>
> Hi Steven,
>
> Thank you for the changes. I went through
> https://github.com/httpwg/http-extensions/pull/3327
> and it seems good to me. I've submitted a couple of wording nits to the PR.
>
> I could not break it, assuming that only permitted names are encoded
> into ASCII-only form and that the new restrictions added to the
> "Canonicalized Host Names" section apply.
>
>
> Side note:
>
> The algorithm in
> ### Domain Matching
> works in exactly the opposite way to normal DNS name matching.
>
> DNS software first breaks names into labels and then operates on
> individual labels, comparing them from most significant to least
> significant.
>
> The algorithm in the document instead constructs one string per full
> name and then compares the strings from right to left, which relies on
> lexical properties of allowed algorithm inputs to make it work.
>
> In the case of this document this seems to work because the
> "Canonicalized Host Names" section limits the inputs and removes the
> possibility of having "." as part of a label.
>
>
> FTR a name like this would be allowed by DNS itself:
>
> label\.with\.three\.dots.example.com
>
> The text representation above is written using RFC 1035 escaping rules.
> On the wire it is simply a 0x2e octet in the label octet sequence. Each
> label is prefixed by a label-length field, i.e. labels can contain
> arbitrary binary garbage.
>
> I'm saying this just to make people aware there would be dragons if
> the "Canonicalized Host Names" section were less strict.
>
> With that - I wish you smooth progress through the publication process!
>
> Petr Špaček
>
>
>
>
> On 13. 11. 25 19:15, Steven Bingler wrote:
> > Hi Petr,
> >
> > Thanks for your responses. My follow-ups are immediately below,
> > followed by further comments on your original review.
> >
> >> With my software developer hat on...
> >
> > After taking another look, I don't think Section 4.1.2.3 needs this note.
> > Section 4 is defining the well behaved server syntax so including a
> > note that bad behavior is accepted seems counterproductive.
> >
> > We've had a number of discussions in the past regarding the messiness
> > of the Section 4 (Server) and Section 5 (User Agent) behavior
> > differences and what we have now is the result of trying to clean up
> > what we can. It's an idiosyncrasy to be sure but likely one that is
> > going to take much more work in both the spec space and real world in
> > order to resolve.
> >
> >> Apologies, that was a typo. I meant RFC 5890 page 10
> >
> > Thanks for clarifying.
> >
> >> Perhaps add a sentence like "weird inputs will be rejected because they
> >> will not match" or something?
> >
> > I'm disinclined to add a note since 5.6.3 is only concerned with
> > getting the value of the `Domain=` attribute into the expected form.
> > It doesn't know or otherwise care about the value itself. Section 5.7
> > Step 10 is what actually examines the value:
> > ```
> > 10. If the domain-attribute is non-empty:
> >     1. If the canonicalized request-host does not domain-match the
> >        domain-attribute:
> >        1. Abort this algorithm and ignore the cookie entirely.
> > ```
> >
> >> Step 10: request-host value is canonicalized, but the domain-attribute value is
> >> NOT canonicalized here. Is that intentional?
> >
> > The producing server is expected to provide a domain value that
> > matches the request url as processed by the user agent (i.e.: Yes).
> >
> > While I wasn't around for the original decision I suspect that Unicode
> > was not considered at the time, meaning that the `Domain=` value
> > simply needed to match the site's domain.
> >
> >> See above about the difference between producer and consumer grammar.
> >
> > Ah. Yes, in short the spec advises a well behaved syntax that it
> > recommends servers adhere to but, in the interest of wider
> > compatibility, the spec was resigned to include non-ideal behavior
> > that we've seen in the wild. E.g.: historical behavior, common
> > mistakes, etc.
> >
> > --- Original review comments below ---
> >
> >> Considering the prevalence of this problem in the HTTP specs, I'm not against
> >> keeping the status quo if authors decide to do so, but I think it should be
> >> acknowledged at the beginning of the document.
> >
> > I've added a new section at the beginning for how name resolution
> > should be handled. Namely, only ASCII and ACE are supported and the
> > document is written from the perspective of DNS.
> >
> >>> 2.3. Terminology
> >>> Whenever possible, user agents SHOULD use an up-to-date public suffix list,
> >>> such as the one maintained by the Mozilla project at [PSL].
> >
> > Agreed, created a new security section and referenced it from 2.3 instead.
> >
> >>> 4.1.1. Syntax
> >>> The domain-value is a subdomain as defined by [RFC1034], Section 3.5, and as
> >>> enhanced by [RFC1123], Section 2.1. Thus, domain-value is a string of [USASCII]
> >>> characters, such as an "A-label" as defined in Section 2.3.2.1 of [RFC5890].
> >
> >> This might work if we assume the underlying naming system is DNS...
> >
> > Should be covered by the new name resolution section.
> >
> >>> 5.1.2. Canonicalized Host Names
> >>> A canonicalized host name is the string generated by the following algorithm:
> >>>
> >>> 1. Convert the host name to a sequence of individual domain name labels.
> >>>
> >>> 2. Convert each label that is not a Non-Reserved LDH (NR-LDH) label, to an
> >>> A-label (see Section 2.3.2.1 of [RFC5890] for the former and latter).
> >>>
> >>> 3. Concatenate the resulting labels, separated by a %x2E (".") character.
> >>
> >> This algorithm does not handle all possible inputs...
> >
> > I've edited the algorithm to explicitly work on U-labels, XN-labels,
> > and NR-LDH labels and to fail for all other inputs as well as fake
> > A-label outputs.
> >
> >>> 5.1.3. Domain Matching
> >
> > I've added a note to the top of the algorithm.
> >
> >>> If the canonicalized request-host does not domain-match the domain-attribute:
> >>
> >> I would add reference for "domain-match" definition in sec. 5.1.3.
> >
> > Done.
> >
> >>> 8.7. Reliance on DNS
> >>
> >> This is first and only mention of 'DNS' in the text...
> >
> > This concern should be handled with the new name resolution section.
> >
> > Finally, please take a look at my proposed changes and let me know if
> > you have any comments. You can find the PR here:
> > https://github.com/httpwg/http-extensions/pull/3327
> >
> > Thanks,
> > - Steven
> >
> >
> >
> > On Sat, Nov 8, 2025 at 6:54 AM Petr Špaček <pspacek@isc.org> wrote:
> >>
> >> On 13. 10. 25 16:12, Steven Bingler wrote:
> >>> Thank you for your thorough review. My apologies for the long delayed
> >>> response, I had to take a hiatus.
> >>
> >> Hello,
> >>
> >> and I apologize for the delay as well, the last few weeks were
> >> turbulent. Too bad we could not meet at the IETF venue, pen and paper
> >> might be useful for some of these :-)
> >>
> >>> I'm still working through the issues that you've highlighted. I'm not
> >>> as familiar as I'd like to be with name resolution systems so I have
> >>> some further discussions about the issues.
> >>
> >> Happy to answer any questions (if I know the answers...)!
> >>
> >> Perhaps this older e-mail could serve as an illustration of the problem
> >> at hand with different naming systems and their different encoding rules
> >> for names:
> >>
> >> https://mailarchive.ietf.org/arch/msg/last-call/bruydK32zq7pIep1VprdRwujcEo/
> >>
> >>>>> (Note that a leading %x2E ("."), if present, is ignored even though that
> >>>>> character is not permitted.)
> >>>> Should this be mentioned in the 4.1.1. Syntax? This inconsistency makes me
> >>>> wince.
> >>>
> >>> It's my understanding that the `domain-value = <subdomain>`
> >>> syntax already disallows the leading '.', but that for historical
> >>> reasons some servers will still produce it, hence the note.
> >>
> >> With my software developer hat on, it makes me mad that the document
> >> lays out a formal grammar and then free-form text elsewhere says "ya
> >> know, ignore the grammar and do this". It kind of defeats the purpose
> >> of a formal grammar!
> >>
> >> I think it would lower the risk of misunderstandings if the grammar
> >> itself were absolutely clear. Something along these lines:
> >>
> >> GENERATOR syntax:
> >> domain-av = "Domain" BWS "=" BWS domain-value
> >>
> >> CONSUMER syntax:
> >> domain-av = "Domain" BWS "=" BWS [.]domain-value
> >>
> >> (or some other suitable form)
> >>
> >> Or just rename the section to 'Syntax for generators' (or producers or
> >> whatever term you find descriptive) to make it clear it does not apply
> >> to consumers.
> >>
> >> This ties to my complaint at the end of your reaction - difference
> >> between allowed behavior of generator vs. consumer.
> >>
> >>
> >>>>> 5.1.2. Canonicalized Host Names
> >>>> This algorithm does not handle all possible inputs.
> >>>> Using terminology from RFC 5890 sec. 2.3.1: DNS name (RFC 1035) > LDH host
> >>>> name (RFC 1123) > R-LDH Label (RFC5890) > XN-label > Fake A-label vs. A-label
> >>>
> >>> Is the issue here that the current algorithm will, incorrectly,
> >>> instruct to convert a reserved LDH label into an (fake) A-label which
> >>> is invalid?
> >>>
> >>>> According to diagram in RFC 5860 page 10,
> >>> I can't find the diagram you're referring to.
> >>
> >> Apologies, that was a typo. I meant RFC 5890 page 10 (the same number as
> >> in previous paragraph).
> >>
> >>
> >>>>> 5.6.3. The Domain Attribute
> >>>> The preamble of section 5.6 explicitly states weird inputs are to be
> >>>> expected
> >>>>> 5.7. Storage Model
> >>>
> >>> What this algorithm is relying on is that this domain attribute's
> >>> value must match up with the request url which would mean that any
> >>> "weird" character inputs, "~bla!.example.com", would cause that
> >>> matching to fail and the cookie to be discarded.
> >>
> >> Perhaps add a sentence like "weird inputs will be rejected because they
> >> will not match" or something?
> >>
> >>
> >>>>> 5.8.3. Retrieval Algorithm
> >>>> Sections 5.7 Storage Model and 5.8 Retrieval Model sort of ignore the role of
> >>>> 'generator', i.e. the server which needs to properly form cookies. Perhaps it
> >>>> is okay, but it has surprised me. In DNS spec we often have 'server' and
> >>>> 'client' parts in the spec, but here we seem to have only 'client'.
> >>>
> >>> Sorry, I don't follow. Could you rephrase the issue?
> >>
> >> See above about the difference between producer and consumer grammar.
> >> It's an illustration of the problem I had in mind, I guess, but it was
> >> half a year ago so I might be misremembering things.
> >>
> >> --
> >> Petr Špaček
>
>
> --
> Petr Špaček
Received on Friday, 14 November 2025 17:17:41 UTC