Structured Fields: whitespace in binary content

Structured Fields is in AUTH48 and we've addressed everything that's come up except for one very late entrant. I know this is very last minute, but I'm becoming convinced that this is something we should consider changing before shipping.

Background: I've written a script that validates HTTP messages in RFC XML, including Structured Fields. See:
  https://pypi.org/project/rfc-http-validate/

Applying this to our current drafts, I encountered a problem: if a header field contains binary data, it's extremely likely that it will need to wrap across multiple lines to fit into the RFC. As a reminder, HTTP requires that each such line fold be replaced by one or more spaces; see <https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#field-values>.

For example, here is the PR for the Signature draft:
  https://github.com/httpwg/http-extensions/pull/1319

At first I thought this could be addressed by an editorial note explaining that whitespace folding is different in examples. However, things like this make that unworkable:

~~~ http-message
Signature-Input: sig1=(*request-target *created host date
     cache-control x-empty-header x-example); keyid="test-key-a";
     alg=hs2019; created=1402170695; expires=1402170995
Signature: sig1=:K2qGT5srn2OGbOIDzQ6kYT+ruaycnDAAUpKv+ePFfD0RAxn/1BUe
     Zx/Kdrq32DrfakQ6bPsvB9aqZqognNT6be4olHROIkeV879RrsrObury8L9SCEibe
    oHyqU/yCjphSmEdd7WD+zrchK57quskKwRefy2iEC5S2uAH0EPyOZKWlvbKmKu5q4
    CaB8X/I5/+HLZLGvDiezqi6/7p2Gngf5hwZ0lSdy39vyNMaaAT0tKo6nuVw0S1MVg
    1Q7MpWYZs0soHjttq0uLIA3DIbQfLiIvK6/l0BdWTU7+2uQj7lBkQAsFZHoA96ZZg
    FquQrXRlmYOh+Hx5D9fJkXcXe5tmAg==:
~~~

As you can see, the whitespace produced by folding is semantically significant in Signature-Input (if it's lost, the delimitation between list members is lost too), whereas it needs to be removed for Signature to contain valid binary content.
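To make the conflict concrete, here is a minimal Python sketch of unfolding. The `unfold` helper is hypothetical (it is not taken from rfc-http-validate), and the field values are shortened, made-up stand-ins for the example above.

```python
import re


def unfold(field_value: str) -> str:
    """Hypothetical helper: replace each line fold (a newline followed by
    spaces or tabs) with a single space, as HTTP requires."""
    return re.sub(r"\r?\n[ \t]+", " ", field_value)


# Shortened, made-up values standing in for the wrapped example above.
signature_input = 'sig1=(*request-target *created host\n     date); keyid="test-key-a"'
signature = "sig1=:K2qGT5srn2OG\n    bOIDzQ6kYT+r:"

unfolded_input = unfold(signature_input)
unfolded_signature = unfold(signature)

# The list members "host" and "date" stay separated by the space
# that unfolding leaves behind.
print(unfolded_input)

# The binary content now contains a space, which is not a base64 character.
print(unfolded_signature)
```

The same replacement rule is right for the first field and wrong for the second, which is why a blanket editorial note about folding in examples doesn't work.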

So, the obvious fix is to allow whitespace inside binary content. Delimitation won't be lost, because binary content has ":" on both ends. The base64 parsers I checked already swallow whitespace in input (not surprising since the motivating use case for base64 was line-wrapped MIME).
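As a rough illustration of how small the parser change could be, here is a sketch in Python. The `parse_binary` function and its `allow_whitespace` flag are assumptions made for this example; they are not the spec's algorithm or any existing library's API.

```python
import base64
import re

# Characters permitted in base64 content, including padding.
BASE64_CHARS = re.compile(r"[A-Za-z0-9+/=]*")


def parse_binary(item: str, allow_whitespace: bool) -> bytes:
    """Hypothetical parser for ":"-delimited binary content."""
    if len(item) < 2 or not (item.startswith(":") and item.endswith(":")):
        raise ValueError("binary content must be delimited by ':'")
    body = item[1:-1]
    if allow_whitespace:
        # Proposed behaviour: discard spaces left over from unfolding.
        body = body.replace(" ", "")
    if not BASE64_CHARS.fullmatch(body):
        raise ValueError("invalid character in binary content")
    return base64.b64decode(body)


# Build a value that looks like it was wrapped and then unfolded.
encoded = base64.b64encode(b"hello world").decode("ascii")
wrapped = ":" + encoded[:8] + " " + encoded[8:] + ":"

# With whitespace allowed, the original bytes come back.
lenient_result = parse_binary(wrapped, allow_whitespace=True)

# Without it, the embedded space is rejected.
try:
    parse_binary(wrapped, allow_whitespace=False)
    strict_rejected = False
except ValueError:
    strict_rejected = True
```

Because the closing ":" still marks the end of the item, stripping spaces inside the delimiters doesn't affect how the rest of the field is parsed.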

The question is whether it's too late to do this. Personally I think it's worth it; otherwise we're going to have some pretty confusing specs, and that's likely to lead to problems. The delta to the spec and implementations is also very small. And if there's some implementation lag, I think that's workable, because this is less likely to be seen on the wire, and there aren't too many adopters of binary content yet.

What do folks think? I'll start a PR to show what it'd be like, but I wanted to get early impressions ASAP.

Thanks (and sorry for not seeing this earlier),

--
Mark Nottingham   https://www.mnot.net/

Received on Wednesday, 28 October 2020 08:23:30 UTC