Genart last call review of draft-ietf-httpbis-rfc6265bis-19 from Dale Worley via Datatracker on 2025-02-10 (ietf-http-wg@w3.org from January to March 2025)

From: Dale Worley via Datatracker <noreply@ietf.org>
Date: Mon, 10 Feb 2025 10:33:41 -0800
To: <gen-art@ietf.org>
Cc: draft-ietf-httpbis-rfc6265bis.all@ietf.org, ietf-http-wg@w3.org, last-call@ietf.org
Message-ID: <173921242133.442654.4933653779122217675@dt-datatracker-75c44cbbdf-pxnd6>
Reviewer: Dale Worley
Review result: Ready with Issues

I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair.  Please treat these comments just
like any other last call comments.

For more information, please see the FAQ at

<https://wiki.ietf.org/en/group/gen/GenArtFAQ>.

Document:  draft-ietf-httpbis-rfc6265bis
Reviewer:  Dale R. Worley
Review Date:  
IETF LC End Date:  2025-02-11
IESG Telechat date:  2025-02-20

Summary:

    This draft is on the right track but has open issues, described in
    the review.

Technical issues:

* Whitespace at the end of a cookie value when parsing a cookie
string

Sec. 4.1.1 says:

   set-cookie-string = BWS cookie-pair *( BWS ";" OWS cookie-av )
   cookie-av         = expires-av / max-age-av / domain-av /
                       path-av / secure-av / httponly-av /
                       samesite-av / extension-av
   extension-av      = *av-octet
   av-octet          = %x20-3A / %x3C-7E
                         ; any CHAR except CTLs or ";"

This is ambiguous for parsing extension-av.  E.g.

  Set-Cookie: name=value;attr1= v a l u e ;attr2=x

Does the value of attr1 start with "v" or with " "?  Does it end with
"e" or with " "?

RFC 9110 sec.  5.6.3 says that BWS can be deleted by intermediate
processes, so presence of this trailing whitespace can't be relied
upon, which suggests that attribute values should never end with
whitespace.

Looking at sec. 5.6 step 5

   5.  Remove any leading or trailing WSP characters from the attribute-
       name string and the attribute-value string.

shows that the intention is that the attribute name and value should
have no leading or trailing whitespace.  That would be better handled by
being explicit in the grammar:

   extension-av      = ext-av-name / ext-av-name "=" ext-av-value
   ext-av-name       = null / av-name-non-wsp /
            av-name-non-wsp *av-name-octet av-name-non-wsp
   av-name-non-wsp   = %x21-3A / %x3C / %x3E-7E
                         ; any CHAR except CTLs, " ", ";", or "="
   av-name-octet     = %x20-3A / %x3C / %x3E-7E
                         ; any CHAR except CTLs, ";", or "="
   ext-av-value      = null / av-value-non-wsp /
            av-value-non-wsp *av-value-octet av-value-non-wsp
   av-value-non-wsp   = %x21-3A / %x3C-7E
                         ; any CHAR except CTLs, " ", or ";"
   av-value-octet     = %x20-3A / %x3C-7E
                         ; any CHAR except CTLs or ";"

or perhaps by escaping into English:

   extension-av      = ext-av-name / ext-av-name "=" ext-av-value
   ext-av-name       = *av-name-octet
                         ; but neither the first nor the last is " "
   av-name-octet     = %x20-3A / %x3C / %x3E-7E
                         ; any CHAR except CTLs, ";", or "="
   ext-av-value      = *av-value-octet
                         ; but neither the first nor the last is " "
   av-value-octet     = %x20-3A / %x3C-7E
                         ; any CHAR except CTLs or ";"

Then again, since this grammar is for generating set-cookie-string's
by well-behaved servers, perhaps whitespace in ext-av-name's should be
omitted:

   ext-av-name       = *av-name-octet
   av-name-octet     = %x21-3A / %x3C / %x3E-7E
                         ; any CHAR except CTLs, " ", ";", or "="

* Description of worker-based requests

5.2.2.  Worker-based requests

   Note: The descriptions below assume that workers must be same-origin
   with the documents that instantiate them.  If this invariant changes,
   we'll need to take the worker's script's URI into account when
   determining their status.

The import of this is not clear to me.  One possibility is that
"Currently, all implementations require that workers must be
same-origin with the documents that instantiate them."  Another
possibility is "We only discuss the case when a worker is same-origin
with the document that instantiates it."  In either case, this should
include/point to a discussion of what to do when this is *not* the
case, or a statement that it is out of scope for this document.

* What is the universe of characters?

RFC 9110 prescribes that HTTP headers be in US-ASCII.  However the
existence of non-HTTP APIs for the cookie store raises the possibility
that a larger set of characters might be supported.  As far as I can
tell, this document does not specify the universe of characters.

Unfortunately, this question is significant.  For example in sec. 5.6 is:

   1.  If the set-cookie-string contains a %x00-08 / %x0A-1F / %x7F
       character (CTL characters excluding HTAB): Abort these steps and
       ignore the set-cookie-string entirely.

The practical effect of this depends on the universe of characters.
And in any case, it would be clearer to phrase this as follows (as it
makes clearer what assertion about the string holds after this step is
executed):

   1.  If the set-cookie-string contains one or more characters not in
       the set %x09 / %x20-7E (HTAB, SPC, and VCHAR): Abort these steps and
       ignore the set-cookie-string entirely.

An overall policy/philosophy regarding the universe of characters
should be stated toward the beginning of the document, and the various
algorithms made consistent with it.  For example in sec. 5.7, there
are explicit limitations on the characters in cookie names, cookie
values, and domain attributes, but not for some other parts of the
value.

Nits/editorial comments:

As a global issue, the capitalization of "Note:" vs. "NOTE:" is
inconsistent.

2.1.  Conformance Criteria

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

This should be replaced with the RFC 8174 version:

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.3.  Terminology

   The terms "active browsing context", "active document", "ancestor
   navigables", "container document", "content navigable", "dedicated
   worker", "Document", "inclusive ancestor navigables", "navigable",
   "opaque origin", "sandboxed origin browsing context flag", "shared
   worker", "the worker's Documents", "top-level traversable", and
   "WorkerGlobalScope" are defined in [HTML].

"navigate" should be added to this list because of the later paragraph:

   The term "top-level navigation" refers to a navigation of a top-level
   traversable.

Though I notice that [HTML] is not well-indexed, as it has no
index/glossary of specialized terminology used in itself, so it's hard
to find the definition of "navigate".

4.1.1.  Syntax

   Servers SHOULD NOT send Set-Cookie header
   fields that fail to conform to the following grammar:

Is it established that a "header field" is the value part of a
"header" (which is what the following grammar describes)?  RFC 9110
sec. 6.3 seems to indicate that the header name is part of a
"header field":

   Fields (Section 5) that are sent or received before the content are
   referred to as "header fields" (or just "headers", colloquially).

RFC 9110 sec. 5.1 suggests that the usual term is "field line value",
or "field value" for the concatenation of the field line values of
multiple headers.

Similar considerations apply to sec. 4.3.1 of this document:

   the user agent will send a Cookie header field that
   conforms to the following grammar:

and probably other places.

4.2.1.  Syntax

   many popular implementations have default limits of 8K.

This should probably be "8K octets", or better, "8192 octets".
Compare with other length limit specifications in this document.

5.1.1.  Dates

   5.  Abort these steps and fail to parse the cookie-date if:

I would say "Abort this algorithm".  There are other instances of this
usage, too.

5.4.  Cookie Name Prefixes

   To prevent these issues, UAs MUST match cookie name prefixes case-
   insensitive.

should be "insensitively".

5.6.  The Set-Cookie Header Field

       Let the cookie-av string be the characters consumed in this step.

To this should be added, "Update the unparsed-attributes to be the
part of unparsed-attributes that is not consumed."  As written, the
text depends on the verb "consumed" to imply this action.

5.7.  Storage Model

   3.   If the cookie-name or the cookie-value contains a %x00-08 /
        %x0A-1F / %x7F character (CTL characters excluding HTAB), abort
        these steps and ignore the cookie entirely.

Of course, such a set-cookie should have been ignored when
sec. 5.6 processing was applied.  But having this text is useful if
the cookie store can be accessed by an API that does not go through
set-cookie-string parsing.

Note the technical issue about the universe of characters applies
here as well.

   8.   If the domain-attribute contains a character that is not in the
        range of [USASCII] characters, abort these steps and ignore the
        cookie entirely.

"[USASCII]" doesn't seem to be strictly defined.  E.g. the term is
used descriptively in sec 2.2 but a definition is not explicitly
imported from RFC 5234, and 5234 doesn't use the term.  Probably what
is meant is "is not in CHAR", which is explicitly defined.  But again,
a set-cookie string that contained such characters would already have
been ignored.  And that a character set restriction is given
specifically for the domain-attribute is odd; it seems like it should
be subsumed in an overall character set policy.

5.8.1.  The Cookie Header Field

   For
   example, the user agent might wish to block sending cookies during
   "third-party" requests from setting cookies (see Section 7.1).

I can't understand this sentence.  It seems to speak of "requests from
setting cookies" but that makes no sense.

   While this specification requires that a single cookie-
   string be produced, some user agents may split that string across
   multiple cookie header fields.

Does this mean "user agents MAY split that string"?  Or perhaps, "user
agents SHOULD NOT split the sting"?  If so, it should say so
explicitly rather than the current wording, which could possibly be a
warning that some user agents have not-strictly-conformant behavior.

5.8.2.  Non-HTTP APIs

   A user agent MAY return an empty cookie-string in certain contexts,
   such as when a retrieval occurs within a third-party context (see
   Section 7.1).

Shouldn't this be in sec 5.8?  And for that matter, wouldn't the
return also be empty if there simply were no matching cookies in the
store?

5.8.3.  Retrieval Algorithm

This sentence

          If this change results in a cookie's
          domain becoming a public suffix then that cookie is considered
          invalid as it would have been rejected during creation

would be better as:

          If this change results in a cookie's
          domain becoming a public suffix then that cookie is considered
          invalid as it would have been rejected during creation
   had it been created now

That change being within this change:

          NOTE: (For user agents configured to reject "public suffixes")
          It's possible that the public suffix list was changed since a
          cookie was created.  If this change results in a cookie's
          domain becoming a public suffix then that cookie is considered
          invalid as it would have been rejected during creation (See
          Section 5.7 step 9).  User agents should be careful to avoid
          retrieving these invalid cookies even if they domain-match the
          host of the retrieval's URI.

should be made an explicit processing step:

       *  For user agents configured to reject "public suffixes", the
          cookie's domain is not a public suffix.  (It's possible that
          the public suffix list was changed since the cookie was
          created.  If this change results in the cookie's domain
          becoming a public suffix then that cookie is considered
          invalid as it would have been rejected during creation if it
          had been created now (See Section 5.7 step 9).)

--

Step 4/3 says:

       3.  If there is an unprocessed cookie in the cookie-list, output
           the characters %x3B and %x20 ("; ").

For quite a while I mis-read what this meant.  I think it would be
much clearer if it was phrased:

       3.  If the cookie is not the last cookie in the cookie-list,
           output the characters %x3B and %x20 ("; ").

6.2.  Application Programming Interfaces

   One reason the Cookie and Set-Cookie header fields use such esoteric
   syntax is that many platforms (both in servers and user agents)
   provide a string-based application programming interface (API) to
   cookies, requiring application-layer programmers to generate and
   parse the syntax used by the Cookie and Set-Cookie header fields,
   which many programmers have done incorrectly, resulting in
   interoperability problems.

Is "esoteric" the correct word?  It seems that "complex" is more
correct.  Actually, it's not so much that the *syntax* is complex but
that the semantic processing is rather baroque, with the reason for
that being historical compatibility considerations.

9.2.  Set-Cookie

   The permanent message header field registry (see
   [HttpFieldNameRegistry]) ...

probably would be better copying 9.1 as:

   The HTTP Field Name Registry (see [HttpFieldNameRegistry]) ...

9.3.1.  Procedure

   Note that attribute
   names are generally defined in CamelCase, but technically accepted
   case-insensitively.

What does "technically" mean here?  Given that this section is pointed
to from outside the RFC, the sentence should not require further
context to understand.  I think what is meant is that names are
accepted case-insensitively.  But really, you want to replace this
sentence with something normative like:

   Note that attribute names are generally defined in CamelCase but
   MUST be recognized case-insensitively.  Two attribute names that
   are case-insensitively the same MUST NOT be registered.

10.1.  Normative References

   [HTML]     Hickson, I., Pieters, S., van Kesteren, A., Jägenstedt,
              P., and D. Denicola, "HTML", n.d.,
              <https://html.spec.whatwg.org/>.

This normative reference is impossible given that the URL fetches a
resource that is described as "living", meaning "changing over time",
but this reference doesn't specify the version as of a specific time.
Hence, it is impossible to *normatively* say that [HTML] does nor does
not specify any particular thing.

[END]
Received on Monday, 10 February 2025 18:33:46 UTC