- From: Barry Leiba <barryleiba@computer.org>
- Date: Fri, 5 Jun 2020 22:53:58 -0400
- To: Benjamin Kaduk <kaduk@mit.edu>
- Cc: The IESG <iesg@ietf.org>, draft-ietf-httpbis-header-structure@ietf.org, httpbis-chairs@ietf.org, ietf-http-wg@w3.org, tpauly@apple.com
- Message-ID: <CALaySJ+2xHThd6PxDH_jr2P0xu+JRggPtrh2gRbgsZwwB2dFQw@mail.gmail.com>
Hi, Ben. Can you check version -19 and see if it resolves your DISCUSS? Thanks, Barry On Mon, May 18, 2020 at 10:53 PM Benjamin Kaduk via Datatracker < noreply@ietf.org> wrote: > Benjamin Kaduk has entered the following ballot position for > draft-ietf-httpbis-header-structure-18: Discuss > > When responding, please keep the subject line intact and reply to all > email addresses included in the To and CC lines. (Feel free to cut this > introductory paragraph, however.) > > > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html > for more information about IESG DISCUSS and COMMENT positions. > > > The document, along with other ballot positions, can be found here: > https://datatracker.ietf.org/doc/draft-ietf-httpbis-header-structure/ > > > > ---------------------------------------------------------------------- > DISCUSS: > ---------------------------------------------------------------------- > > Thanks for this document; it will be very nice to have this > more-structured mechanism available for future HTTP header (and trailer) > fields. > > (0) There seem to still be a few lingering internal inconsistencies that > merit further discussion. > > Most notably, there is the inherent risk of skew when both prose > algorithms and ABNF constructions are provided for the same structures. > While Section 1.2 is careful to disclaim that the prose algorithm takes > precedence over the ABNF for parsing, to my reading the coverage in the > following paragraph of serialization procedures imply that it is the > ABNF that is authoritative. In particular, "[i]mplementations MAY vary > from the specified behavior so long as the output still matches the > ABNF" seems to admit deviations from the prose algorithms but require > compliance with the ABNF, in effect making the ABNF take precedence over > the prose algorithm. Having a different description of the procedure > normative for generation vs. consumption invites > interoperability-affecting feature skew, such as the handling of empty > lists as Julian noted on the list. > > Similarly, Section 3.1.1's prose says that inner lists are delimited "by > a single space", but the ABNF uses (1*SP), allowing for more than > one space. > > Additionally, when Section 4.2.3.2 discusses parsing parameter map keys, > the handling for duplicate map key names is specified as overwriting the > previous value, in contrast to the prose description (Section 3.1.2) > that describes these keys as "unique within the scope [of] the > Parameters they occur within". (While dictionary key names might be > expected to have a similar behavior, I did not find conflicting text for > that behavior.) > > Finally, at a prose level we employ needlessly different descriptions in > several places for what is effectively the same procedure; while I do > not think any of these affect interoperability (and thus the full > details are in the COMMENT section), it does seem to be continuing the > theme. (These are things like how we describe the algorithm to skip > implicit-true for booleans, whether we initialize the output string to > the empty string only to immediately add a constant literal character to > it vs. initializing the output string to that literal character, etc.) > > A couple other points that may need further discussion: > > (1) What aspect(s) of structured field processing are case (in)sensitive? > The only mentions I see of case sensitivity are in Section 4.2 discussing > header field names and (implicitly) Section 4.2.2 discussing a > "character-for-character" comparison of dictionary key names, but of > course we cite RFC 5234 for ABNF, which uses case-insensitive matching. > On the other hand, base64 encoding requires case sensitivity for > successful round-tripping of arbitrary binary data. > > (2) One of the stated goals of this document is to define intentionally > strict processing rules, but there are several places where we could > have removed additional optionality but elected to not do so. What is > the criterion for "too far" towards strictness? For example, Section > 4.2.7 gives optionality with respect to base64 padding (see COMMENT). > > > ---------------------------------------------------------------------- > COMMENT: > ---------------------------------------------------------------------- > > Is the document toolchain used for this document responsible for the > rendering of em dashes as a single hyphen, as opposed to normal RFC > style which uses two hyphens for an em dash? > > What's the motivation for "MUST omit values of boolean true" (in > parameters and dictionaries)? It seems to make the output rules more > complicated without a significant gain in encoding size. > > Section 1.2 > > When parsing from HTTP fields, implementations MUST follow the > algorithms, but MAY vary in implementation so as the behaviors are > indistinguishable from specified behavior. If there is disagreement > > nit: was this intended to be "so long as"? > > Section 2 > > o Define the semantics of those structures. > > nit: it's not clear to me what the referent for "those" is intended to > be (as while the previous bullet point does give a list of types of > structures, it seems to be saying that any given field has to pick > exactly one, which is inconsistent with the plural "those"). > > o Specify any additional constraints upon the structures used, as > well as the consequences when those constraints are violated. > > nit: similarly, the plural "structures" may not be needed here. > > Typically, this means that a field definition will specify the top- > level type - List, Dictionary or Item - and then define its allowable > types, and constraints upon them. For example, a header defined as a > > It's a little unfortunate that we task the word "type" with two > different meanings (the top-level List/Dictionary/Item and also the more > fine-grained Integer/String/etc.). We also haven't mentioned the latter > type of "type" yet, though perhaps that's not an issue. > > List might have all Integer members, or a mix of types; a header > defined as an Item might allow only Strings, and additionally only > strings beginning with the letter "Q". Likewise, Inner Lists are > only valid when a field definition explicitly allows them. > > (We haven't mentioned Inner Lists yet, so the reader might not know what > to expect from them.) > > When parsing fails, the entire field is ignored (see Section 4.2); in > most situations, violating field-specific constraints should have the > same effect. Thus, if a header is defined as an Item and required to > be an Integer, but a String is received, the field will by default be > ignored. If the field requires different error handling, this should > be explicitly specified. > > Explicitly specified in the specification that defines the structured > field, I trust? > > Also, I worry that just leaving this as "in most cases" is a bit > antithetical to the goals of structured headers -- don't we want to nail > things down as tightly as we can? If someone wants an exception, they > don't have to use structured headers... > > To further assure that this extensibility is available in the future, > and to encourage consumers to use a complete parser implementation, a > field definition can specify that "grease" Parameters be added by > > Do you want to reference RFC 8701 here? > > An extension to a structured field can then require that an entire > field value be ignored by a recipient that understands the extension > if constraints on the value it defines are not met. > > This seems to set us up for a situation in which an implementation that > does not understand the extension will happily process the header field > and ignore the extension, but implementations that do understand the > extension will ignore the header field entirely. Is that really what's > intended? (When might this be desirable?) > > Section 3 > > This section defines the abstract value types that can be composed > into Structured Fields. The ABNF provided represents the on-wire > > nit: I don't understand the sense in which "composed into" is used -- > don't individual Structured Fields need to be exactly one of these three > types, irregardless of whether there is additional substructure? > > Section 3.1 > > An empty List is denoted by not serializing the field at all. > > So any structured field definition that allows for an empty list has to > define its semantics to be equivalent to the semantics used for when the > peer does not support that structured field at all? > > Section 3.1.1 > > Inner Lists are denoted by surrounding parenthesis, and have their > values delimited by a single space. A field whose value is defined > > The ABNF does not seem to support the "single" in "single space". > > A header field whose value is defined as a List of Inner Lists with > Parameters at both levels could look like: > > Is this also an inner list of strings? > > Section 3.1.2 > > Parameters are an ordered map of key-values pairs that are associated > > nit: is the plural "values" intentional? > > parameters = *( ";" *SP parameter ) > parameter = param-name [ "=" param-value ] > param-name = key > key = ( lcalpha / "*" ) > *( lcalpha / DIGIT / "_" / "-" / "." / "*" ) > > Huh, so I could have a key that was "*" or "*****"? (I assume that no > special "wildcard" semantics are afforded to the asterisk in either > usage...) > > Example-ParamListHeader: abc;a=1;b=2; cde_456, (ghi;jk=4 l);q="9";r=w > > Is it worth writing out which item/list each parameter applies to? > > Parsers MUST support at least 256 parameters on an Item or Inner > List, and support parameter keys with at least 64 characters. Field > specifications can constrain the types and cardinality of individual > parameter names and values as they require. > > In what way might I further constrain the *type* of a parameter *name*? > > Section 3.2 > > Implementations MUST provide access to Dictionaries both by index and > by name. Specifications MAY use either means of accessing the > members. > > Example-DictHeader: en="Applepie", da=:w4ZibGV0w6ZydGU=: > > I, for one, would appreciate a bit of commentary on the interpretation > of the final '=', even if just a forward reference to §3.3.5. > > Typically, a field specification will define the semantics of > Dictionaries by specifying the allowed type(s) for individual member > names, as well as whether their presence is required or optional. > > Similarly to my previous comment, I'm not sure I understand what is > meant by the "type" of a member name. > > Section 3.3.3 > > Are the parentheses around "chr" in the sh-string construction doing > anything? > > Section 3.3.5 > > A Byte Sequence is delimited with colons and encoded using base64 > ([RFC4648], Section 4). For example: > > Thank you for the section reference to disambiguate base64 and > base64url! > > Section 4 > > This section defines how to serialize and parse Structured Fields in > field values, and protocols compatible with them (e.g., in HTTP/2 > [RFC7540] before HPACK [RFC7541] is applied). > > nit: I'm not sure I'm parsing this sentence correctly: what does > "protocols compatible with them" bind to -- is it part of what "this > section defines"? > (I don't actually see any text anywhere in this document that I would > consider to be described by "protocols compatible with them".) > > Section 4.1 > > It's interesting to note that some of the subsections treat "input item > of wrong type" as an explicit failure case (e.g., 4.1.3.1) whereas > others implicitly assume that the input is one of the appropriate types > (e.g., 4.1.1). > > Given a structure defined in this specification, return an ASCII > string suitable for use in a HTTP field value. > > nit: I would have expected wording like "the following procedure > returns" or "return [...] value as follows" or similar; the current > sentence is an imperative that does not indicate how to fulfil it. > (The same comment applies to basically all of the subsections, and a > similar one to § 4.2 and its subsections.) > > 6. Return output_string converted into an array of bytes, using > ASCII encoding [RFC0020]. > > This implicitly assumes that output_string contains only characters > representable in ASCII. I believe this is currently true based on the > relevant ABNF constructions and serialization procedures, but it still > feels like a needless dangling edge case. > > Section 4.1.2 > > Is the nesting level correct for these (sub)procedures? It seems like > top-level steps 3, 4, and 5 are logically children of step 2. > > Section 4.1.6 > > 3. Let output be an empty string. > > 4. Append DQUOTE to output. > > Can't we consolidate steps 3 and 4 (as is done in § 4.1.1.1 where output > is initialized to "("? > > Section 4.1.8 > > 1. If input_bytes is not a sequence of bytes, fail serialization. > > side note: perhaps this is just my roots as a C programmer showing > through, but how could you end up with something that's not a sequence > of bytes? :) > > Section 4.2 > > Strings split across multiple field lines will have unpredictable > results, because comma(s) and whitespace inserted upon combination > will become part of the string output by the parser. Since > concatenation might be done by an upstream intermediary, the results > are not under the control of the serializer or the parser. > > This seems to have the implication that the results are unpredictable > even if a serializer and parser collaborate to use a procedure that, in > disregard of this specification, splits individual list and/or > dictionary entries across multiple field lines, due to the potential for > an intermediary that is not complicit in the deviation from the spec. > In some sense, that might be the key implication of this statement, in > which case it's surprising to not see it spelled out more clearly. > > Section 4.2.1.2 > > 3. While input_string is not empty: > > 1. Discard any leading SP characters from input_string. > > 2. If the first character of input_string is ")": > > Don't we need to check if the input_string is empty again after > consuming leading SP but before attempting to consume anything else? > > Section 4.2.2 > > Given an ASCII string as input_string, return an ordered map whose > values are (item_or_inner_list, parameters) tuples. input_string is > > Should we say anything about the map keys? > > Section 4.2.3.2 > > Why do we use a different procedure to implement the "implicit true" > property of booleans than we did in Section 4.2.2? > > Section 4.2.4 > > 4. Otherwise, prepend char to input_string, and exit the loop. > > This is the only place in the document where we use "prepend" in an > algorithm. That might make it surprising to the reader; is it easy to > reformulate this without greedily consuming input that does not match? > > 2. If output_number is outside the range -999,999,999,999,999 > to 999,999,999,999,999 inclusive, fail parsing. > > [I'm not sure how this check could trigger given the length check in 7.5] > > Section 4.2.7 > > Because some implementations of base64 do not allow reject of encoded > data that is not properly "=" padded (see [RFC4648], Section 3.2), > parsers SHOULD NOT fail when it is not present, unless they cannot be > configured to do so. > > This is a new protocol mechanism; why do we need to be so accommodating > to inflexible implementations? > (Also, nit: "rejection") > > Because some implementations of base64 do not allow rejection of > encoded data that has non-zero pad bits (see [RFC4648], Section 3.5), > parsers SHOULD NOT fail when it is present, unless they cannot be > configured to do so. > > (ibid) > Also, nit: singular/plural mismatch "bits"/"it is" > > Section 6 > > It seems worth mentioning the handling for duplicated key names (e.g., > in parameters and dictionaries) w.r.t. overwrite or must-be-unique, and > how there have been previous vulnerabilities relating to different > implementations choosing "first one wins" vs. "last one wins". > > Appendix B > > Implementers should note that Dictionaries and Parameters are order- > preserving maps. Some fields may not convey meaning in the ordering > of these data types, but it should still be exposed so that > applications which need to use it will have it available. > > This far through the document I still have no idea when the order might > actually be useful; even this text is only noting that (at least > sometimes) the order does not convey meaning. > > > >
Received on Saturday, 6 June 2020 02:54:25 UTC