Re: support for non-ASCII in strings, was: signatures vs sf-date

On 03.12.2022 05:48, Mark Nottingham wrote:
> My, you've all been busy while I slept. Catching up on this thread with my (for now, personal) thoughts.
>
> 1) We're adding Dates because it's a fairly common data type and generic software can do potentially interesting things when presenting / manipulating them. It's not a _strong_ motivation, but it seems to have got us over the wire.

That's my understanding as well.

And I argue that for the same reason, a common and
on-the-wire-detectable encoding of non-ASCII characters would be good.

> 2) I added %-encoded strings to Problem because the other encoding didn't fit cleanly into SF-land. However, we should _not_ add non-ASCII strings to SF because they're a footgun that for _most_ cases, will cause more trouble than they're worth.
>
> In the protocol (not content), most strings are intended for machines, not people, and ASCII strings can be processed fairly unambiguously; that's not true when you open things up to full Unicode.

We're discussing this for header fields that *do* carry
human-presentable content: Content-Disposition, Link, and now Problem.

If you're serious about human-presentable text not belonging here, why
are we adding it to "Problem" right now?

> There are some cases where non-ASCII strings are needed in header fields; mostly, when you're presenting something to a human from the fields. Those cases are not as common. However, there's a catch to adding them: if full unicode strings were available in the protocol, many designers will understandably use them because it's been drilled into all our heads that unicode is what you use for strings.
>
> Hence, footgun.

I would appreciate it if you would explain why there is a problem we
need to prevent, and what exactly that problem is. Do you have an example?

> By leaving full unicode support out of the spec and forcing designers to take positive steps to support it, the (relatively small) barrier to adoption makes them stop and think whether they need it. I think that's a good thing. I also know that will make some i18n folks unhappy, and I'm sorry for that; unfortunately we're working in an area where protocol artefacts intended for humans and machines are mixed, and so it gets difficult.

I continue to disagree. By not supporting non-ASCII in the base
definition, we force people to come up with ad hoc definitions which in
general will be worse than a common extension we can define here.

> All of that said, once the algorithms are stable (as Julian has pointed out, they contain some errors), I wouldn't object to including the %-encoding text as an appendix in sf-bis with appropriate warnings, if other folks are amenable.

That would be a good step in the right direction. I still think we
need an on-the-wire signal that the encoding is in place, for the same
reason we're doing this revision in the first place (tooling
support for special-casing integers that happen to represent dates).
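
To illustrate what such a signal could buy generic tooling, here is a
minimal sketch in Python. Everything in it is an assumption made for
illustration, not text from any draft: the function names are made up,
and the marker (a "%" directly before the opening quote) is just one
possible choice. The point is only that a receiver can tell from the
wire format alone whether %-decoding applies.

```python
HEX = "0123456789abcdef"


def encode_marked(text: str) -> str:
    """Percent-encode UTF-8 bytes and prefix a (hypothetical) marker."""
    out = []
    for byte in text.encode("utf-8"):
        # Printable ASCII passes through, except '%' and '"', which
        # would otherwise be ambiguous inside the quoted value.
        if 0x20 <= byte <= 0x7E and byte not in (0x25, 0x22):
            out.append(chr(byte))
        else:
            out.append(f"%{byte:02x}")
    return '%"' + "".join(out) + '"'


def decode_marked(field: str) -> str:
    """Decode only values that carry the marker; reject everything else."""
    if len(field) < 3 or not field.startswith('%"') or not field.endswith('"'):
        raise ValueError("value does not carry the encoding marker")
    body = field[2:-1]
    raw = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "%":
            pair = body[i + 1:i + 3]
            if len(pair) != 2 or any(c not in HEX for c in pair):
                raise ValueError("malformed percent escape")
            raw.append(int(pair, 16))
            i += 3
        elif ch == '"' or not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError("unexpected character in marked value")
        else:
            raw.append(ord(ch))
            i += 1
    # Invalid UTF-8 raises UnicodeDecodeError (a ValueError subclass).
    return raw.decode("utf-8")
```

Without the marker, a recipient that sees "f%c3%bc" in an ordinary
string cannot know whether the sender meant those seven characters
literally or meant "fü"; with it, the decision needs no per-field
knowledge.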

> ...

Best regards, Julian

Received on Saturday, 3 December 2022 07:47:24 UTC