Re: support for non-ASCII in strings, was: signatures vs sf-date from Mark Nottingham on 2022-12-03 (ietf-http-wg@w3.org from October to December 2022)

From: Mark Nottingham <mnot@mnot.net>
Date: Sat, 3 Dec 2022 19:08:08 +1100
To: "Julian F. Reschke" <julian.reschke@gmx.de>
Cc: ietf-http-wg@w3.org
Message-Id: <FCC95D8F-0F64-4245-98E5-5760AD63E8FA@mnot.net>

> On 3 Dec 2022, at 6:47 pm, Julian Reschke <julian.reschke@gmx.de> wrote:

>> 2) I added %-encoded strings to Problem because the other encoding didn't fit cleanly into SF-land. However, we should _not_ add non-ASCII strings to SF because they're a footgun that, in _most_ cases, will cause more trouble than they're worth.
>> 
>> In the protocol (not content), most strings are intended for machines, not people, and ASCII strings can be processed fairly unambiguously; that's not true when you open things up to full Unicode.
> 
> We're discussing this for header fields that *do* carry
> human-presentable content. Content-Disposition, Link, and now Problem.

Yes.

> If you're serious about human presentable text not belonging here, why
> do we add that to "Problem" right now?

Please re-read what I wrote. Discussing them does not obviate what I said.

>> There are some cases where non-ASCII strings are needed in header fields; mostly, when you're presenting something to a human from the fields. Those cases are not as common. However, there's a catch to adding them: if full Unicode strings were available in the protocol, many designers would understandably use them because it's been drilled into all our heads that Unicode is what you use for strings.
>> 
>> Hence, footgun.
> 
> I would appreciate it if you would explain why there is a problem we need
> to prevent, and what exactly that problem is. Do you have an example?

As you've pointed out, the scope for this bis document was tightly defined. The onus isn't on me to prove what shouldn't go into it...

>> By leaving full Unicode support out of the spec and forcing designers to take positive steps to support it, the (relatively small) barrier to adoption makes them stop and think whether they need it. I think that's a good thing. I also know that will make some i18n folks unhappy, and I'm sorry for that; unfortunately we're working in an area where protocol artefacts intended for humans and machines are mixed, and so it gets difficult.
> 
> I continue to disagree. By not supporting non-ASCII in the base
> definition, we force people to come up with ad hoc definitions which in
> general will be worse than a common extension we can define here.
> 
>> All of that said, once the algorithms are stable (as Julian has pointed out, they contain some errors), I wouldn't object to including the %-encoding text as an appendix in sf-bis with appropriate warnings, if other folks are amenable.
> 
> That would be a good step in the right direction. I still think we
> need an on-the-wire signal that the encoding is in place, for the same
> reasons why we're doing this revision in the first place (tooling
> support for special-casing integers that happen to represent dates).

I disagree, and you should have brought that up in the scoping discussion.

Cheers,


--
Mark Nottingham   https://www.mnot.net/

Received on Saturday, 3 December 2022 08:08:47 UTC