- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Sun, 28 May 2023 16:57:12 +0900
- To: Ilari Liusvaara <ilariliusvaara@welho.com>, HTTP Working Group <ietf-http-wg@w3.org>
Hello Ilari, others,

On 2023-05-26 18:52, Ilari Liusvaara wrote:
> On Thu, May 25, 2023 at 10:21:34AM -0700, Roy T. Fielding wrote:
>>
>> If this is truly for a display string, the feature must be
>> specific about the encoding and allowed characters.
>> My suggestion would be to limit the string to non-CNTRL
>> ASCII and non-control valid UTF-8. We don't want to allow
>> anything that would twist the feature to some other ends.
>
> I think the set of allowed characters should be the 1,111,999 non-Cc
> unicode codepoints.
>
> However, unicode also has formatting control codepoints (including
> fun ones like direction overrides), and the set of those is not
> necessarily stable. Obviously, the effect of any formatting control
> should end with the string.

Bidirectional formatting characters should best be left in, because they
may be needed in display strings in Arabic, Hebrew, or other
right-to-left scripts.

> I think it would be safer to add exactly one backslash escape sequence
> for the 1,111,904 codepoints that are neither Cc nor ASCII. The
> escape sequences should only consist of printable ASCII and should not
> contain further backslash nor double quote.
>
> It is possible to assign the escape sequences such that worst case
> overhead over UTF-8 is 1 byte per codepoint.

It sounds to me as if you are trying to invent a new form of escaping
(or encoding). If you really think that's the direction we should move
in, can you be a bit more specific (maybe with a few examples)?

Regards,   Martin.
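
For reference, the two code point counts quoted in this thread (1,111,999 and 1,111,904) can be reproduced with a short Python sketch. It assumes that the counts exclude the surrogate range U+D800..U+DFFF, which cannot be encoded in valid UTF-8; the thread does not state this, but the arithmetic matches under that assumption. The variable names are illustrative only.

```python
import unicodedata

# Size of the Unicode code space: U+0000..U+10FFFF.
TOTAL_CODE_POINTS = 0x110000

# Assumption: surrogates are excluded, since they are not valid in UTF-8.
SURROGATES = 0xDFFF - 0xD800 + 1

# Count code points in general category Cc (the C0 and C1 controls plus DEL).
cc_count = sum(
    1 for cp in range(TOTAL_CODE_POINTS)
    if unicodedata.category(chr(cp)) == "Cc"
)

# Count ASCII code points that are not controls (U+0020..U+007E).
non_control_ascii = sum(
    1 for cp in range(0x80)
    if unicodedata.category(chr(cp)) != "Cc"
)

# Non-Cc code points, excluding surrogates.
non_cc = TOTAL_CODE_POINTS - SURROGATES - cc_count

# Code points that are neither Cc nor ASCII.
non_cc_non_ascii = non_cc - non_control_ascii

print(f"non-Cc code points:     {non_cc:,}")
print(f"neither Cc nor ASCII:   {non_cc_non_ascii:,}")
```

Note that formatting controls such as the bidirectional overrides are in category Cf, not Cc, so they remain inside both counts.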
Received on Sunday, 28 May 2023 07:57:20 UTC