- From: Dale Worley via Datatracker <noreply@ietf.org>
- Date: Mon, 10 Feb 2025 10:33:41 -0800
- To: <gen-art@ietf.org>
- Cc: draft-ietf-httpbis-rfc6265bis.all@ietf.org, ietf-http-wg@w3.org, last-call@ietf.org
Reviewer: Dale Worley
Review result: Ready with Issues
I am the assigned Gen-ART reviewer for this draft. The General Area
Review Team (Gen-ART) reviews all IETF documents being processed
by the IESG for the IETF Chair. Please treat these comments just
like any other last call comments.
For more information, please see the FAQ at
<https://wiki.ietf.org/en/group/gen/GenArtFAQ>.
Document: draft-ietf-httpbis-rfc6265bis
Reviewer: Dale R. Worley
Review Date:
IETF LC End Date: 2025-02-11
IESG Telechat date: 2025-02-20
Summary:
This draft is on the right track but has open issues, described in
the review.
Technical issues:
* Whitespace at the end of a cookie value when parsing a cookie
string
Sec. 4.1.1 says:
set-cookie-string = BWS cookie-pair *( BWS ";" OWS cookie-av )
cookie-av = expires-av / max-age-av / domain-av /
path-av / secure-av / httponly-av /
samesite-av / extension-av
extension-av = *av-octet
av-octet = %x20-3A / %x3C-7E
; any CHAR except CTLs or ";"
This is ambiguous for parsing extension-av. E.g.
Set-Cookie: name=value;attr1= v a l u e ;attr2=x
Does the value of attr1 start with "v" or with " "? Does it end with
"e" or with " "?
RFC 9110 sec. 5.6.3 says that BWS can be deleted by intermediate
processes, so presence of this trailing whitespace can't be relied
upon, which suggests that attribute values should never end with
whitespace.
Looking at sec. 5.6 step 5
5. Remove any leading or trailing WSP characters from the attribute-
name string and the attribute-value string.
shows that the intention is that the attribute name and value should
have no leading or trailing whitespace. That would be better handled by
being explicit in the grammar:
extension-av = ext-av-name / ext-av-name "=" ext-av-value
ext-av-name = null / av-name-non-wsp /
av-name-non-wsp *av-name-octet av-name-non-wsp
av-name-non-wsp = %x21-3A / %x3C / %x3E-7E
; any CHAR except CTLs, " ", ";", or "="
av-name-octet = %x20-3A / %x3C / %x3E-7E
; any CHAR except CTLs, ";", or "="
ext-av-value = null / av-value-non-wsp /
av-value-non-wsp *av-value-octet av-value-non-wsp
av-value-non-wsp = %x21-3A / %x3C-7E
; any CHAR except CTLs, " ", or ";"
av-value-octet = %x20-3A / %x3C-7E
; any CHAR except CTLs or ";"
or perhaps by escaping into English:
extension-av = ext-av-name / ext-av-name "=" ext-av-value
ext-av-name = *av-name-octet
; but neither the first nor the last is " "
av-name-octet = %x20-3A / %x3C / %x3E-7E
; any CHAR except CTLs, ";", or "="
ext-av-value = *av-value-octet
; but neither the first nor the last is " "
av-value-octet = %x20-3A / %x3C-7E
; any CHAR except CTLs or ";"
Then again, since this grammar is for generating set-cookie-string's
by well-behaved servers, perhaps whitespace in ext-av-name's should be
omitted:
ext-av-name = *av-name-octet
av-name-octet = %x21-3A / %x3C / %x3E-7E
; any CHAR except CTLs, " ", ";", or "="
* Description of worker-based requests
5.2.2. Worker-based requests
Note: The descriptions below assume that workers must be same-origin
with the documents that instantiate them. If this invariant changes,
we'll need to take the worker's script's URI into account when
determining their status.
The import of this is not clear to me. One possibility is that
"Currently, all implementations require that workers must be
same-origin with the documents that instantiate them." Another
possibility is "We only discuss the case when a worker is same-origin
with the document that instantiates it." In either case, this should
include/point to a discussion of what to do when this is *not* the
case, or a statement that it is out of scope for this document.
* What is the universe of characters?
RFC 9110 prescribes that HTTP headers be in US-ASCII. However the
existence of non-HTTP APIs for the cookie store raises the possibility
that a larger set of characters might be supported. As far as I can
tell, this document does not specify the universe of characters.
Unfortunately, this question is significant. For example in sec. 5.6 is:
1. If the set-cookie-string contains a %x00-08 / %x0A-1F / %x7F
character (CTL characters excluding HTAB): Abort these steps and
ignore the set-cookie-string entirely.
The practical effect of this depends on the universe of characters.
And in any case, it would be clearer to phrase this as follows (as it
makes clearer what assertion about the string holds after this step is
executed):
1. If the set-cookie-string contains one or more characters not in
the set %x09 / %x20-7E (HTAB, SPC, and VCHAR): Abort these steps and
ignore the set-cookie-string entirely.
An overall policy/philosophy regarding the universe of characters
should be stated toward the beginning of the document, and the various
algorithms made consistent with it. For example in sec. 5.7, there
are explicit limitations on the characters in cookie names, cookie
values, and domain attributes, but not for some other parts of the
value.
Nits/editorial comments:
As a global issue, the capitalization of "Note:" vs. "NOTE:" is
inconsistent.
2.1. Conformance Criteria
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
This should be replaced with the RFC 8174 version:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
capitals, as shown here.
2.3. Terminology
The terms "active browsing context", "active document", "ancestor
navigables", "container document", "content navigable", "dedicated
worker", "Document", "inclusive ancestor navigables", "navigable",
"opaque origin", "sandboxed origin browsing context flag", "shared
worker", "the worker's Documents", "top-level traversable", and
"WorkerGlobalScope" are defined in [HTML].
"navigate" should be added to this list because of the later paragraph:
The term "top-level navigation" refers to a navigation of a top-level
traversable.
Though I notice that [HTML] is not well-indexed, as it has no
index/glossary of specialized terminology used in itself, so it's hard
to find the definition of "navigate".
4.1.1. Syntax
Servers SHOULD NOT send Set-Cookie header
fields that fail to conform to the following grammar:
Is it established that a "header field" is the value part of a
"header" (which is what the following grammar describes)? RFC 9110
sec. 6.3 seems to indicate that the header name is part of a
"header field":
Fields (Section 5) that are sent or received before the content are
referred to as "header fields" (or just "headers", colloquially).
RFC 9110 sec. 5.1 suggests that the usual term is "field line value",
or "field value" for the concatenation of the field line values of
multiple headers.
Similar considerations apply to sec. 4.3.1 of this document:
the user agent will send a Cookie header field that
conforms to the following grammar:
and probably other places.
4.2.1. Syntax
many popular implementations have default limits of 8K.
This should probably be "8K octets", or better, "8192 octets".
Compare with other length limit specifications in this document.
5.1.1. Dates
5. Abort these steps and fail to parse the cookie-date if:
I would say "Abort this algorithm". There are other instances of this
usage, too.
5.4. Cookie Name Prefixes
To prevent these issues, UAs MUST match cookie name prefixes case-
insensitive.
should be "insensitively".
5.6. The Set-Cookie Header Field
Let the cookie-av string be the characters consumed in this step.
To this should be added, "Update the unparsed-attributes to be the
part of unparsed-attributes that is not consumed." As written, the
text depends on the verb "consumed" to imply this action.
5.7. Storage Model
3. If the cookie-name or the cookie-value contains a %x00-08 /
%x0A-1F / %x7F character (CTL characters excluding HTAB), abort
these steps and ignore the cookie entirely.
Of course, such a set-cookie should have been ignored when
sec. 5.6 processing was applied. But having this text is useful if
the cookie store can be accessed by an API that does not go through
set-cookie-string parsing.
Note the technical issue about the universe of characters applies
here as well.
8. If the domain-attribute contains a character that is not in the
range of [USASCII] characters, abort these steps and ignore the
cookie entirely.
"[USASCII]" doesn't seem to be strictly defined. E.g. the term is
used descriptively in sec 2.2 but a definition is not explicitly
imported from RFC 5234, and 5234 doesn't use the term. Probably what
is meant is "is not in CHAR", which is explicitly defined. But again,
a set-cookie string that contained such characters would already have
been ignored. And that a character set restriction is given
specifically for the domain-attribute is odd; it seems like it should
be subsumed in an overall character set policy.
5.8.1. The Cookie Header Field
For
example, the user agent might wish to block sending cookies during
"third-party" requests from setting cookies (see Section 7.1).
I can't understand this sentence. It seems to speak of "requests from
setting cookies" but that makes no sense.
While this specification requires that a single cookie-
string be produced, some user agents may split that string across
multiple cookie header fields.
Does this mean "user agents MAY split that string"? Or perhaps, "user
agents SHOULD NOT split the sting"? If so, it should say so
explicitly rather than the current wording, which could possibly be a
warning that some user agents have not-strictly-conformant behavior.
5.8.2. Non-HTTP APIs
A user agent MAY return an empty cookie-string in certain contexts,
such as when a retrieval occurs within a third-party context (see
Section 7.1).
Shouldn't this be in sec 5.8? And for that matter, wouldn't the
return also be empty if there simply were no matching cookies in the
store?
5.8.3. Retrieval Algorithm
This sentence
If this change results in a cookie's
domain becoming a public suffix then that cookie is considered
invalid as it would have been rejected during creation
would be better as:
If this change results in a cookie's
domain becoming a public suffix then that cookie is considered
invalid as it would have been rejected during creation
had it been created now
That change being within this change:
NOTE: (For user agents configured to reject "public suffixes")
It's possible that the public suffix list was changed since a
cookie was created. If this change results in a cookie's
domain becoming a public suffix then that cookie is considered
invalid as it would have been rejected during creation (See
Section 5.7 step 9). User agents should be careful to avoid
retrieving these invalid cookies even if they domain-match the
host of the retrieval's URI.
should be made an explicit processing step:
* For user agents configured to reject "public suffixes", the
cookie's domain is not a public suffix. (It's possible that
the public suffix list was changed since the cookie was
created. If this change results in the cookie's domain
becoming a public suffix then that cookie is considered
invalid as it would have been rejected during creation if it
had been created now (See Section 5.7 step 9).)
--
Step 4/3 says:
3. If there is an unprocessed cookie in the cookie-list, output
the characters %x3B and %x20 ("; ").
For quite a while I mis-read what this meant. I think it would be
much clearer if it was phrased:
3. If the cookie is not the last cookie in the cookie-list,
output the characters %x3B and %x20 ("; ").
6.2. Application Programming Interfaces
One reason the Cookie and Set-Cookie header fields use such esoteric
syntax is that many platforms (both in servers and user agents)
provide a string-based application programming interface (API) to
cookies, requiring application-layer programmers to generate and
parse the syntax used by the Cookie and Set-Cookie header fields,
which many programmers have done incorrectly, resulting in
interoperability problems.
Is "esoteric" the correct word? It seems that "complex" is more
correct. Actually, it's not so much that the *syntax* is complex but
that the semantic processing is rather baroque, with the reason for
that being historical compatibility considerations.
9.2. Set-Cookie
The permanent message header field registry (see
[HttpFieldNameRegistry]) ...
probably would be better copying 9.1 as:
The HTTP Field Name Registry (see [HttpFieldNameRegistry]) ...
9.3.1. Procedure
Note that attribute
names are generally defined in CamelCase, but technically accepted
case-insensitively.
What does "technically" mean here? Given that this section is pointed
to from outside the RFC, the sentence should not require further
context to understand. I think what is meant is that names are
accepted case-insensitively. But really, you want to replace this
sentence with something normative like:
Note that attribute names are generally defined in CamelCase but
MUST be recognized case-insensitively. Two attribute names that
are case-insensitively the same MUST NOT be registered.
10.1. Normative References
[HTML] Hickson, I., Pieters, S., van Kesteren, A., Jägenstedt,
P., and D. Denicola, "HTML", n.d.,
<https://html.spec.whatwg.org/>.
This normative reference is impossible given that the URL fetches a
resource that is described as "living", meaning "changing over time",
but this reference doesn't specify the version as of a specific time.
Hence, it is impossible to *normatively* say that [HTML] does nor does
not specify any particular thing.
[END]
Received on Monday, 10 February 2025 18:33:46 UTC