Re: draft-ietf-httpbis-header-structure: handling multiple field values

On 12.05.2020 20:45, Ian Clelland wrote:
>
>
> On Tue, May 12, 2020 at 1:47 PM Julian Reschke <julian.reschke@gmx.de> wrote:
>
>     On 12.05.2020 19:39, Ian Clelland wrote:
>      > This is mentioned in
>      > https://httpwg.org/http-extensions/draft-ietf-httpbis-header-structure.html#rfc.section.4.2 --
>      > "parsers MUST combine all lines in the same section (header or
>      > trailer) that case-insensitively match the field name into one
>      > comma-separated field-value", (with the warning given that strings
>      > split across multiple field values will have "unpredictable
>      > results") -- So I don't think you're allowed to parse them
>      > separately. If both exist in the same message, they must be
>      > combined before parsing.
>      > ...
>
>     Indeed. Looking at this again, I realize that it was a later
>     paragraph that confused me:
>
>     "Strings split across multiple field lines will have unpredictable
>     results, because comma(s) and whitespace inserted upon combination will
>     become part of the string output by the parser. Since concatenation
>     might be done by an upstream intermediary, the results are not under the
>     control of the serializer or the parser."
>
>     I read this to mean that errors might or might not be detected
>     early, but maybe this is just a warning that the actual string used
>     for concatenation can vary?
>
>     If that's the intent, I'd call that a spec bug. A string value split
>     across multiple field instances is very clearly a violation of what HTTP
>     says about list-shaped header fields, and not allowing a recipient to
>     detect that seems incorrect to me.
>
>
> Definitely a spec bug -- not sure which spec though.
> RFC 7230 reads:
>
>     A sender MUST NOT generate multiple header fields with the same
>     field name in a message unless either the entire field value for
>     that header field is defined as a comma-separated list [i.e.,
>     #(values)] or the header field is a well-known exception (as noted
>     below).
>
>
> Perhaps what it should also mention is that the header must be defined
> as a comma-separated list, *and* the split must be between list
> elements, in cases where the field value can contain commas with other
> semantic meanings.

AFAIU, that was the intent in RFC 2616 and 7230: every single field
value must conform to the header field's grammar.
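To make the "unpredictable results" warning concrete, here is a minimal Python sketch. It assumes the combining recipient uses ", " as the separator (RFC 7230 only requires a comma, and intermediaries may differ). `is_sf_string` and `combine_field_lines` are hypothetical helpers, and the regular expression is a reading of the draft's sf-string grammar (the draft was later published as RFC 8941), not code from any implementation. Neither field line is a valid string on its own, yet the combined value is:

```python
import re

# sf-string as read from the draft's grammar: DQUOTE *chr DQUOTE, where chr
# is printable ASCII other than DQUOTE and backslash, or a backslash escape
# of DQUOTE or backslash.
SF_STRING = re.compile(r'"(?:[\x20\x21\x23-\x5b\x5d-\x7e]|\\["\\])*"')


def is_sf_string(value: str) -> bool:
    """Return True if the whole value is one well-formed sf-string."""
    return SF_STRING.fullmatch(value) is not None


def combine_field_lines(values: list[str]) -> str:
    """Join field-line values in order; ", " is an assumed separator."""
    return ", ".join(values)


lines = ['"hello', 'world"']             # one string split across two lines
print([is_sf_string(v) for v in lines])  # [False, False]
combined = combine_field_lines(lines)
print(combined, is_sf_string(combined))  # "hello, world" True
```

A parser that only ever sees the combined value has nothing left to reject: the separator has silently become part of the string.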

> It goes on to say:
>
>     A recipient MAY combine multiple header fields with the same
>     field name into one "field-name: field-value" pair, without changing
>     the semantics of the message, by appending each subsequent field
>     value to the combined field value in order, separated by a comma.
>
>
> and maybe the phrase "without changing the semantics of the message"
> means that the server is only free to join the fields if it doesn't
> change the semantics (implying indirectly that the field shouldn't have
> been split up within a quoted string in the first place), but it doesn't
> really read that way.

No, whoever joins the header fields does not need to know the syntax of
the field (because that would defeat extensibility). IOW, if the input
is garbage, the output will be garbage too.
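A rough illustration of that point: the sketch below combines field lines by case-insensitive name without ever inspecting the values, so a string split across lines passes straight through. `combine_fields` is a hypothetical helper; it is meant to be applied to one section (header or trailer) at a time, lowercasing the names is just one way to key them case-insensitively, and the ", " separator is again an assumption.

```python
def combine_fields(field_lines: list[tuple[str, str]]) -> dict[str, str]:
    """Combine field lines sharing a name (compared case-insensitively).

    Values are treated as opaque text: nothing here knows any field's
    syntax. Call once per section (header or trailer), not across both.
    """
    grouped: dict[str, list[str]] = {}
    for name, value in field_lines:
        # dicts keep insertion order, so values stay in their original order
        grouped.setdefault(name.lower(), []).append(value)
    return {name: ", ".join(values) for name, values in grouped.items()}


fields = [("Example", '"hello'), ("Other", "1"), ("example", 'world"')]
print(combine_fields(fields))
# {'example': '"hello, world"', 'other': '1'}
```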

Going back to the SH spec: I'm afraid that the spec *disallows* failing
early on garbage. Is this *really* the intent?
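For comparison, failing early could look something like the hypothetical check below. This is *not* what the draft specifies (the draft requires combining first). It relies on each field line having to conform to the field's grammar on its own and rejects any line that leaves a string open. `ends_inside_string` and `combine_strict` are illustrative names; a real implementation would parse each line fully as a List rather than only tracking quotes.

```python
def ends_inside_string(line: str) -> bool:
    """Return True if `line` leaves a quoted string open at its end."""
    in_string = False
    escaped = False
    for ch in line:
        if escaped:
            escaped = False          # character after a backslash is literal
        elif in_string and ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
    return in_string


def combine_strict(values: list[str]) -> str:
    """Join the lines with ", " after rejecting any line that splits a string."""
    for i, value in enumerate(values):
        if ends_inside_string(value):
            raise ValueError(f"field line {i} ends inside a string: {value!r}")
    return ", ".join(values)


print(combine_strict(['"a", "b"', '"c"']))  # "a", "b", "c"
try:
    combine_strict(['"hello', 'world"'])
except ValueError as err:
    print(err)  # field line 0 ends inside a string: '"hello'
```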

Best regards, Julian

Received on Tuesday, 12 May 2020 19:26:18 UTC