Re: support for non-ASCII in strings, was: signatures vs sf-date from Mark Nottingham on 2022-12-03 (ietf-http-wg@w3.org from October to December 2022)

From: Mark Nottingham <mnot@mnot.net>
Date: Sat, 3 Dec 2022 15:48:09 +1100
To: Poul-Henning Kamp <phk@phk.freebsd.dk>, "Julian F. Reschke" <julian.reschke@gmx.de>
Cc: Roy Fielding <fielding@gbiv.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <53D8E497-284A-4B2C-91D8-367542AA0A7C@mnot.net>

My, you've all been busy while I slept. Catching up on this thread with my (for now, personal) thoughts.

1) We're adding Dates because it's a fairly common data type and generic software can do potentially interesting things when presenting / manipulating them. It's not a _strong_ motivation, but it seems to have got us over the line.

2) I added %-encoded strings to Problem because the other encoding didn't fit cleanly into SF-land. However, we should _not_ add non-ASCII strings to SF because they're a footgun that, in _most_ cases, will cause more trouble than they're worth.

In the protocol (not content), most strings are intended for machines, not people, and ASCII strings can be processed fairly unambiguously; that's not true when you open things up to full Unicode.
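
A minimal Python sketch of that ambiguity, using made-up sample values rather than anything from a spec: two strings that render identically can differ code point by code point, so a plain comparison fails unless both sides agree on a normalization form.

```python
import unicodedata

# Two spellings of the same visible text "café" (illustrative values only).
composed = "caf\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

# A naive comparison treats them as different strings.
naive_equal = composed == decomposed

# They only match once both sides are normalized to the same form (NFC here).
normalized_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(naive_equal, normalized_equal)
```

An ASCII-only value has a single representation, so a receiver never has to decide whether, or how, to normalize it.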

There are some cases where non-ASCII strings are needed in header fields; mostly, when you're presenting something to a human from the fields. Those cases are not as common. However, there's a catch to adding them: if full Unicode strings were available in the protocol, many designers would understandably use them because it's been drilled into all our heads that Unicode is what you use for strings.

Hence, footgun.

By leaving full Unicode support out of the spec and forcing designers to take positive steps to support it, the (relatively small) barrier to adoption makes them stop and think about whether they need it. I think that's a good thing. I also know that will make some i18n folks unhappy, and I'm sorry for that; unfortunately we're working in an area where protocol artefacts intended for humans and machines are mixed, and so it gets difficult.

All of that said, once the algorithms are stable (as Julian has pointed out, they contain some errors), I wouldn't object to including the %-encoding text as an appendix in sf-bis with appropriate warnings, if other folks are amenable.
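
For readers unfamiliar with the general idea, here is a rough Python sketch of carrying non-ASCII text in an ASCII-only string by %-encoding its UTF-8 bytes. It is not the algorithm from the draft text under discussion; the helper names are hypothetical, and details such as the case of the hex digits and which characters are left unescaped are simply whatever Python's `urllib.parse` does.

```python
from urllib.parse import quote, unquote


def encode_display_text(text: str) -> str:
    """Hypothetical helper: %-encode text as UTF-8 so the result is ASCII-only."""
    # safe="" escapes everything outside the unreserved set, including '"' and '%'.
    return quote(text, safe="", encoding="utf-8")


def decode_display_text(value: str) -> str:
    """Hypothetical helper: reverse the encoding, failing on invalid UTF-8."""
    # errors="strict" raises UnicodeDecodeError instead of silently substituting U+FFFD.
    return unquote(value, encoding="utf-8", errors="strict")


encoded = encode_display_text('füü "bar"')
print(encoded)
print(decode_display_text(encoded))
```

The encoded form can sit inside an ordinary ASCII string, and a designer has to opt in explicitly by calling something like these helpers, which is the small barrier described above.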

3) Please keep the discussion courteous.

Cheers,

--
Mark Nottingham https://www.mnot.net/

Received on Saturday, 3 December 2022 04:48:47 UTC