Re: Benjamin Kaduk's Discuss on draft-ietf-httpbis-header-structure-18: (with DISCUSS and COMMENT) from Barry Leiba on 2020-06-06 (ietf-http-wg@w3.org from April to June 2020)

From: Barry Leiba <barryleiba@computer.org>
Date: Fri, 5 Jun 2020 22:53:58 -0400
To: Benjamin Kaduk <kaduk@mit.edu>
Cc: The IESG <iesg@ietf.org>, draft-ietf-httpbis-header-structure@ietf.org, httpbis-chairs@ietf.org, ietf-http-wg@w3.org, tpauly@apple.com
Message-ID: <CALaySJ+2xHThd6PxDH_jr2P0xu+JRggPtrh2gRbgsZwwB2dFQw@mail.gmail.com>
Hi, Ben.  Can you check version -19 and see if it resolves your DISCUSS?

Thanks,
Barry

On Mon, May 18, 2020 at 10:53 PM Benjamin Kaduk via Datatracker <
noreply@ietf.org> wrote:

> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-httpbis-header-structure-18: Discuss
>
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
>
>
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
>
>
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-httpbis-header-structure/
>
>
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
>
> Thanks for this document; it will be very nice to have this
> more-structured mechanism available for future HTTP header (and trailer)
> fields.
>
> (0) There seem to still be a few lingering internal inconsistencies that
> merit further discussion.
>
> Most notably, there is the inherent risk of skew when both prose
> algorithms and ABNF constructions are provided for the same structures.
> While Section 1.2 is careful to disclaim that the prose algorithm takes
> precedence over the ABNF for parsing, to my reading the coverage in the
> following paragraph of serialization procedures imply that it is the
> ABNF that is authoritative.  In particular, "[i]mplementations MAY vary
> from the specified behavior so long as the output still matches the
> ABNF" seems to admit deviations from the prose algorithms but require
> compliance with the ABNF, in effect making the ABNF take precedence over
> the prose algorithm.  Having a different description of the procedure
> normative for generation vs. consumption invites
> interoperability-affecting feature skew, such as the handling of empty
> lists as Julian noted on the list.
>
> Similarly, Section 3.1.1's prose says that inner lists are delimited "by
> a single space", but the ABNF uses (1*SP), allowing for more than
> one space.
>
> Additionally, when Section 4.2.3.2 discusses parsing parameter map keys,
> the handling for duplicate map key names is specified as overwriting the
> previous value, in contrast to the prose description (Section 3.1.2)
> that describes these keys as "unique within the scope [of] the
> Parameters they occur within".  (While dictionary key names might be
> expected to have a similar behavior, I did not find conflicting text for
> that behavior.)
>
> Finally, at a prose level we employ needlessly different descriptions in
> several places for what is effectively the same procedure; while I do
> not think any of these affect interoperability (and thus the full
> details are in the COMMENT section), it does seem to be continuing the
> theme.  (These are things like how we describe the algorithm to skip
> implicit-true for booleans, whether we initialize the output string to
> the empty string only to immediately add a constant literal character to
> it vs. initializing the output string to that literal character, etc.)
>
> A couple other points that may need further discussion:
>
> (1) What aspect(s) of structured field processing are case (in)sensitive?
> The only mentions I see of case sensitivity are in Section 4.2 discussing
> header field names and (implicitly) Section 4.2.2 discussing a
> "character-for-character" comparison of dictionary key names, but of
> course we cite RFC 5234 for ABNF, which uses case-insensitive matching.
> On the other hand, base64 encoding requires case sensitivity for
> successful round-tripping of arbitrary binary data.
>
> (2) One of the stated goals of this document is to define intentionally
> strict processing rules, but there are several places where we could
> have removed additional optionality but elected to not do so.  What is
> the criterion for "too far" towards strictness?  For example, Section
> 4.2.7 gives optionality with respect to base64 padding (see COMMENT).
>
>
> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
> Is the document toolchain used for this document responsible for the
> rendering of em dashes as a single hyphen, as opposed to normal RFC
> style which uses two hyphens for an em dash?
>
> What's the motivation for "MUST omit values of boolean true" (in
> parameters and dictionaries)?  It seems to make the output rules more
> complicated without a significant gain in encoding size.
>
> Section 1.2
>
>    When parsing from HTTP fields, implementations MUST follow the
>    algorithms, but MAY vary in implementation so as the behaviors are
>    indistinguishable from specified behavior.  If there is disagreement
>
> nit: was this intended to be "so long as"?
>
> Section 2
>
>    o  Define the semantics of those structures.
>
> nit: it's not clear to me what the referent for "those" is intended to
> be (as while the previous bullet point does give a list of types of
> structures, it seems to be saying that any given field has to pick
> exactly one, which is inconsistent with the plural "those").
>
>    o  Specify any additional constraints upon the structures used, as
>       well as the consequences when those constraints are violated.
>
> nit: similarly, the plural "structures" may not be needed here.
>
>    Typically, this means that a field definition will specify the top-
>    level type - List, Dictionary or Item - and then define its allowable
>    types, and constraints upon them.  For example, a header defined as a
>
> It's a little unfortunate that we task the word "type" with two
> different meanings (the top-level List/Dictionary/Item and also the more
> fine-grained Integer/String/etc.).  We also haven't mentioned the latter
> type of "type" yet, though perhaps that's not an issue.
>
>    List might have all Integer members, or a mix of types; a header
>    defined as an Item might allow only Strings, and additionally only
>    strings beginning with the letter "Q".  Likewise, Inner Lists are
>    only valid when a field definition explicitly allows them.
>
> (We haven't mentioned Inner Lists yet, so the reader might not know what
> to expect from them.)
>
>    When parsing fails, the entire field is ignored (see Section 4.2); in
>    most situations, violating field-specific constraints should have the
>    same effect.  Thus, if a header is defined as an Item and required to
>    be an Integer, but a String is received, the field will by default be
>    ignored.  If the field requires different error handling, this should
>    be explicitly specified.
>
> Explicitly specified in the specification that defines the structured
> field, I trust?
>
> Also, I worry that just leaving this as "in most cases" is a bit
> antithetical to the goals of structured headers -- don't we want to nail
> things down as tightly as we can?  If someone wants an exception, they
> don't have to use structured headers...
>
>    To further assure that this extensibility is available in the future,
>    and to encourage consumers to use a complete parser implementation, a
>    field definition can specify that "grease" Parameters be added by
>
> Do you want to reference RFC 8701 here?
>
>    An extension to a structured field can then require that an entire
>    field value be ignored by a recipient that understands the extension
>    if constraints on the value it defines are not met.
>
> This seems to set us up for a situation in which an implementation that
> does not understand the extension will happily process the header field
> and ignore the extension, but implementations that do understand the
> extension will ignore the header field entirely.  Is that really what's
> intended?  (When might this be desirable?)
>
> Section 3
>
>    This section defines the abstract value types that can be composed
>    into Structured Fields.  The ABNF provided represents the on-wire
>
> nit: I don't understand the sense in which "composed into" is used --
> don't individual Structured Fields need to be exactly one of these three
> types, irregardless of whether there is additional substructure?
>
> Section 3.1
>
>    An empty List is denoted by not serializing the field at all.
>
> So any structured field definition that allows for an empty list has to
> define its semantics to be equivalent to the semantics used for when the
> peer does not support that structured field at all?
>
> Section 3.1.1
>
>    Inner Lists are denoted by surrounding parenthesis, and have their
>    values delimited by a single space.  A field whose value is defined
>
> The ABNF does not seem to support the "single" in "single space".
>
>    A header field whose value is defined as a List of Inner Lists with
>    Parameters at both levels could look like:
>
> Is this also an inner list of strings?
>
> Section 3.1.2
>
>    Parameters are an ordered map of key-values pairs that are associated
>
> nit: is the plural "values" intentional?
>
>    parameters    = *( ";" *SP parameter )
>    parameter     = param-name [ "=" param-value ]
>    param-name    = key
>    key           = ( lcalpha / "*" )
>                    *( lcalpha / DIGIT / "_" / "-" / "." / "*" )
>
> Huh, so I could have a key that was "*" or "*****"?  (I assume that no
> special "wildcard" semantics are afforded to the asterisk in either
> usage...)
>
>    Example-ParamListHeader: abc;a=1;b=2; cde_456, (ghi;jk=4 l);q="9";r=w
>
> Is it worth writing out which item/list each parameter applies to?
>
>    Parsers MUST support at least 256 parameters on an Item or Inner
>    List, and support parameter keys with at least 64 characters.  Field
>    specifications can constrain the types and cardinality of individual
>    parameter names and values as they require.
>
> In what way might I further constrain the *type* of a parameter *name*?
>
> Section 3.2
>
>    Implementations MUST provide access to Dictionaries both by index and
>    by name.  Specifications MAY use either means of accessing the
>    members.
>
>    Example-DictHeader: en="Applepie", da=:w4ZibGV0w6ZydGU=:
>
> I, for one, would appreciate a bit of commentary on the interpretation
> of the final '=', even if just a forward reference to §3.3.5.
>
>    Typically, a field specification will define the semantics of
>    Dictionaries by specifying the allowed type(s) for individual member
>    names, as well as whether their presence is required or optional.
>
> Similarly to my previous comment, I'm not sure I understand what is
> meant by the "type" of a member name.
>
> Section 3.3.3
>
> Are the parentheses around "chr" in the sh-string construction doing
> anything?
>
> Section 3.3.5
>
>    A Byte Sequence is delimited with colons and encoded using base64
>    ([RFC4648], Section 4).  For example:
>
> Thank you for the section reference to disambiguate base64 and
> base64url!
>
> Section 4
>
>    This section defines how to serialize and parse Structured Fields in
>    field values, and protocols compatible with them (e.g., in HTTP/2
>    [RFC7540] before HPACK [RFC7541] is applied).
>
> nit: I'm not sure I'm parsing this sentence correctly: what does
> "protocols compatible with them" bind to -- is it part of what "this
> section defines"?
> (I don't actually see any text anywhere in this document that I would
> consider to be described by "protocols compatible with them".)
>
> Section 4.1
>
> It's interesting to note that some of the subsections treat "input item
> of wrong type" as an explicit failure case (e.g., 4.1.3.1) whereas
> others implicitly assume that the input is one of the appropriate types
> (e.g., 4.1.1).
>
>    Given a structure defined in this specification, return an ASCII
>    string suitable for use in a HTTP field value.
>
> nit: I would have expected wording like "the following procedure
> returns" or "return [...] value as follows" or similar; the current
> sentence is an imperative that does not indicate how to fulfil it.
> (The same comment applies to basically all of the subsections, and a
> similar one to § 4.2 and its subsections.)
>
>    6.  Return output_string converted into an array of bytes, using
>        ASCII encoding [RFC0020].
>
> This implicitly assumes that output_string contains only characters
> representable in ASCII.  I believe this is currently true based on the
> relevant ABNF constructions and serialization procedures, but it still
> feels like a needless dangling edge case.
>
> Section 4.1.2
>
> Is the nesting level correct for these (sub)procedures?  It seems like
> top-level steps 3, 4, and 5 are logically children of step 2.
>
> Section 4.1.6
>
>    3.  Let output be an empty string.
>
>    4.  Append DQUOTE to output.
>
> Can't we consolidate steps 3 and 4 (as is done in § 4.1.1.1 where output
> is initialized to "("?
>
> Section 4.1.8
>
>    1.  If input_bytes is not a sequence of bytes, fail serialization.
>
> side note: perhaps this is just my roots as a C programmer showing
> through, but how could you end up with something that's not a sequence
> of bytes? :)
>
> Section 4.2
>
>    Strings split across multiple field lines will have unpredictable
>    results, because comma(s) and whitespace inserted upon combination
>    will become part of the string output by the parser.  Since
>    concatenation might be done by an upstream intermediary, the results
>    are not under the control of the serializer or the parser.
>
> This seems to have the implication that the results are unpredictable
> even if a serializer and parser collaborate to use a procedure that, in
> disregard of this specification, splits individual list and/or
> dictionary entries across multiple field lines, due to the potential for
> an intermediary that is not complicit in the deviation from the spec.
> In some sense, that might be the key implication of this statement, in
> which case it's surprising to not see it spelled out more clearly.
>
> Section 4.2.1.2
>
>    3.  While input_string is not empty:
>
>        1.  Discard any leading SP characters from input_string.
>
>        2.  If the first character of input_string is ")":
>
> Don't we need to check if the input_string is empty again after
> consuming leading SP but before attempting to consume anything else?
>
> Section 4.2.2
>
>    Given an ASCII string as input_string, return an ordered map whose
>    values are (item_or_inner_list, parameters) tuples. input_string is
>
> Should we say anything about the map keys?
>
> Section 4.2.3.2
>
> Why do we use a different procedure to implement the "implicit true"
> property of booleans than we did in Section 4.2.2?
>
> Section 4.2.4
>
>         4.  Otherwise, prepend char to input_string, and exit the loop.
>
> This is the only place in the document where we use "prepend" in an
> algorithm.  That might make it surprising to the reader; is it easy to
> reformulate this without greedily consuming input that does not match?
>
>         2.  If output_number is outside the range -999,999,999,999,999
>             to 999,999,999,999,999 inclusive, fail parsing.
>
> [I'm not sure how this check could trigger given the length check in 7.5]
>
> Section 4.2.7
>
>    Because some implementations of base64 do not allow reject of encoded
>    data that is not properly "=" padded (see [RFC4648], Section 3.2),
>    parsers SHOULD NOT fail when it is not present, unless they cannot be
>    configured to do so.
>
> This is a new protocol mechanism; why do we need to be so accommodating
> to inflexible implementations?
> (Also, nit: "rejection")
>
>    Because some implementations of base64 do not allow rejection of
>    encoded data that has non-zero pad bits (see [RFC4648], Section 3.5),
>    parsers SHOULD NOT fail when it is present, unless they cannot be
>    configured to do so.
>
> (ibid)
> Also, nit: singular/plural mismatch "bits"/"it is"
>
> Section 6
>
> It seems worth mentioning the handling for duplicated key names (e.g.,
> in parameters and dictionaries) w.r.t. overwrite or must-be-unique, and
> how there have been previous vulnerabilities relating to different
> implementations choosing "first one wins" vs. "last one wins".
>
> Appendix B
>
>    Implementers should note that Dictionaries and Parameters are order-
>    preserving maps.  Some fields may not convey meaning in the ordering
>    of these data types, but it should still be exposed so that
>    applications which need to use it will have it available.
>
> This far through the document I still have no idea when the order might
> actually be useful; even this text is only noting that (at least
> sometimes) the order does not convey meaning.
>
>
>
>
Received on Saturday, 6 June 2020 02:54:25 UTC