Hi Julian,

On Sat, May 27, 2023 at 11:55:59AM +0200, Julian Reschke wrote:
> On 27.05.2023 10:37, Willy Tarreau wrote:
> Without having read all details:
> +1 to consider (!) just using raw octets
> +1 not to use sf-binary
> +1 to exclude ASCII controls (but not entirely sure about CR LF HTAB)
> -1 to use anything but UTF-8 (I fail to see any reason for that) - and
> no, use of UTF-8 does not require revising things when Unicode code points
> are added

Unless I'm totally mistaken, the maximum sequence length has increased
over time to support new code points. I remember implementing decoding
functions a long time ago in a security component where we were
required to fail past 4 or maybe 5 bytes, and later learning that the
limit had to be extended by one or two bytes to support new code points. I
don't remember the exact details, but my point is that we must not impose
this kind of decoding on infrastructure components, or they
will regularly be accused of blocking valid contents :-/  As long as
they can pass it as-is and it's the recipient's job to figure out whether
it decodes successfully, that's fine by me.
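
To make "pass it as-is" a bit more concrete, here is a rough sketch of the
kind of check I have in mind for an intermediary. It is written in Python
only for brevity, and the function name, the allow_htab parameter and the
choice to let HTAB through are assumptions made for illustration, not
something taken from any draft. The point is that octets >= 0x80 are never
decoded, only ASCII controls are rejected:

```python
def is_forwardable(value: bytes, allow_htab: bool = True) -> bool:
    """Hypothetical intermediary check on a raw field value.

    Rejects ASCII control octets (0x00-0x1F and 0x7F), optionally
    letting HTAB through. Octets >= 0x80 are passed untouched: no
    UTF-8 decoding is attempted, that is left to the recipient.
    """
    for octet in value:
        if octet == 0x09 and allow_htab:
            continue  # HTAB tolerated (assumption, see above)
        if octet < 0x20 or octet == 0x7F:
            return False  # ASCII control, including CR and LF
    return True
```

With this, a value that is not even valid UTF-8 is still forwarded, and
whether it decodes is purely the recipient's problem.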


