Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis from Julian Reschke on 2023-05-26 (ietf-http-wg@w3.org from April to June 2023)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 26 May 2023 07:21:53 +0200
To: ietf-http-wg@w3.org
Message-ID: <5a704134-ce9c-2201-62ff-3a70ba6ac775@gmx.de>
On 26.05.2023 00:23, Poul-Henning Kamp wrote:
> --------
> Roy T. Fielding writes:
>
>> I think this would have been better in parts, namely
>
> Agreed.

I agree partly; I think Mark went ahead with a concrete proposal so that
this can be done quickly. It's clear that there are many ways to do
this, and I'm pretty sure that it'll be very hard to agree on the best one.

At the end of the day, what matters is that we have that capability, as
opposed to not having it at all.

>> My suggestion would be to limit the string to non-CNTRL
>> ASCII and non-control valid UTF-8. We don't want to allow
>> anything that would twist the feature to some other ends.
>> [...]
>> Note that I am not saying that we should consider normalization
>> or any other weirdness specific to Unicode.
>
> Each new version of Unicode adds new code points, and they decided
> up front that Unicode sequences would not be versioned.
>
> Instead they issued guidance, and I'm paraphrasing here: "If you
> receive a code point you don't recognize, assume the sender has a
> newer version of Unicode than you do and display something safe and
> distinct."

How exactly does that matter for the discussion we are having here?

> I have also never seen a document where Unicode clearly and
> definitively promises to never add further control characters.
>
> So checking that you have "non-control valid UTF-8" is always going
> to require a (moderately) up-to-date representation of which Unicode
> code points are valid and which of those are controls.

Yes, that would need to be clarified; I believe Roy refers to a
definition of controls that is fixed.
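
To make "a definition of controls that is fixed" concrete, here is a minimal
sketch of what such a check could look like. It assumes "controls" means only
the C0 range (U+0000-U+001F), DEL (U+007F), and the C1 range (U+0080-U+009F);
that reading is an assumption for illustration, not something Roy's proposal
states. The function name is hypothetical. The point is that the ranges are
hard-coded, so no Unicode data tables are consulted.

```python
def is_non_control_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8 containing no C0, DEL, or C1 controls.

    Assumption: "control" is the fixed set U+0000-U+001F and U+007F-U+009F.
    """
    try:
        # Strict decoding rejects malformed, overlong, and surrogate sequences.
        text = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    for ch in text:
        cp = ord(ch)
        if cp <= 0x1F or 0x7F <= cp <= 0x9F:
            return False
    return True
```

Unassigned code points pass this check unchanged, so a receiver with older
Unicode tables behaves the same as one with newer tables.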

> Why would we inflict that burden at the HTTP level ?
>
>> We just need to stay within the confines of what has already
>> been defined as valid and safe UTF-8.
>
> Do you have a specific document in mind here ?
>
>> In general, it is safer to send raw UTF-8 over the wire in HTTP
>> than it is to send arbitrary pct-encoded octets, simply because
>> pct-encoding is going to bypass most security checks long enough
>> for the data to reach an application where people do stupid
>> things with strings that they assume contain something that is
>> safe to display.
>
> This is precisely why I think we should /never/ employ pct-encoding
> in HTTP headers.

But we do already. Also, the argument that security checks can be
bypassed applies to sf-binary as well.

> Given that HTTP is increasingly being treated as a transport protocol,
> (not that I agree with that either,) I think it is a much safer
> approach to handle UTF8 as opaque binary data at the HTTP level,
> and transfer it as such, in sf-binary fields.
>
>> Everything else is being
>> actively targeted by pentesters and script kiddies, on every
>> public server on the Internet, to the point where we have to
>> block it within CDN configurations just to avoid overloading
>> the origin servers.
>
> 100% agreement: The only thing DisplayString offers over sf-binary,
> is increased risk.

No, it offers a way to label Unicode data as such (without requiring
out-of-band knowledge).

That's *exactly* the same reason why we are adding sf-date. If that's a
concern for you, why didn't you argue against the introduction of
sf-date as well? After all, it does not add any value over sf-integer
except for inlining the type information.
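
The labelling argument can be illustrated with a rough sketch of how the same
payload serializes under each type. It assumes the syntax as later published
in RFC 9651 (Display String as `%"..."` with lowercase percent-encoding of the
UTF-8 octets, Byte Sequence as `:base64:`, Date as `@` plus an integer); the
function names are made up for illustration and are not taken from any
library.

```python
import base64


def serialize_display_string(text: str) -> str:
    # Percent-encode '%', '"', controls, and every non-ASCII octet.
    out = ['%"']
    for byte in text.encode("utf-8"):
        if byte in (0x25, 0x22) or byte <= 0x1F or byte >= 0x7F:
            out.append("%{:02x}".format(byte))
        else:
            out.append(chr(byte))
    out.append('"')
    return "".join(out)


def serialize_byte_sequence(data: bytes) -> str:
    # Opaque octets: nothing on the wire says they are text.
    return ":" + base64.b64encode(data).decode("ascii") + ":"


def serialize_date(epoch_seconds: int) -> str:
    # Same number as an Integer, with the type marked by '@'.
    return "@" + str(epoch_seconds)


def serialize_integer(value: int) -> str:
    return str(value)
```

With these, `serialize_date(1659578233)` and `serialize_integer(1659578233)`
differ only in the leading `@`, just as the Display String and Byte Sequence
forms of one text carry the same octets but only the former says "this is
Unicode".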

Best regards, Julian
Received on Friday, 26 May 2023 05:22:00 UTC