Re: Consensus call to include Display Strings in draft-ietf-httpbis-sfbis from Ilari Liusvaara on 2023-05-26 (ietf-http-wg@w3.org from April to June 2023)

From: Ilari Liusvaara <ilariliusvaara@welho.com>
Date: Fri, 26 May 2023 12:52:31 +0300
To: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <ZHCBX2392L+0pSYK@LK-Perkele-VII2.locald>

On Thu, May 25, 2023 at 10:21:34AM -0700, Roy T. Fielding wrote:
> 
> If this is truly for a display string, the feature must be
> specific about the encoding and allowed characters.
> My suggestion would be to limit the string to non-CNTRL
> ASCII and non-control valid UTF-8. We don't want to allow
> anything that would twist the feature to some other ends.

I think the set of allowed characters should be the 1,111,999 non-Cc
Unicode codepoints.
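
As a rough check of that figure, the sketch below counts the codepoints
directly. It assumes the count is taken over Unicode scalar values (that is,
surrogates U+D800..U+DFFF are excluded), which the figure implies but the
text does not state; the variable names are illustrative only.

```python
import unicodedata

# Assumption: count Unicode scalar values, i.e. every code point
# except the surrogates U+D800..U+DFFF.
scalars = [cp for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF]

# General category Cc covers the C0 controls, DEL, and the C1 controls.
non_cc = [cp for cp in scalars if unicodedata.category(chr(cp)) != "Cc"]

print(len(scalars) - len(non_cc))  # number of Cc codepoints: 65
print(len(non_cc))                 # 1111999
```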

However, Unicode also has formatting control codepoints (including
fun ones like direction overrides), and the set of those is not
necessarily stable. Obviously, the effect of any formatting control
should end with the string.

> Assuming we do this with pct-encoding, we should not allow
> arbitrary octets to be encoded. We should disallow encodings
> that are unnecessary (normal printable ASCII aside from % and "),
> control characters, or octets not valid for UTF-8. That can
> be specified by prose and reference to the IETF specs, or
> we could specify the allowed ranges with a regular expression.
> Either one is better than allowing arbitrary octets to be encoded.

I think it would be safer to add exactly one backslash escape sequence
for each of the 1,111,904 codepoints that are neither Cc nor ASCII. The
escape sequences should consist only of printable ASCII and should not
contain a further backslash or a double quote.

It is possible to assign the escape sequences such that the worst-case
overhead over UTF-8 is 1 byte per codepoint.
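
One way to see that such an assignment can exist is a simple counting
argument, sketched below. The scheme is hypothetical and only one of many
possibilities: each escape is a backslash followed by n symbols, where n is
the UTF-8 length of the codepoint, and the first symbol determines n so the
sequences are self-delimiting. It assumes a 93-symbol alphabet (0x20..0x7E,
including space, minus backslash and double quote) and that surrogates are
not counted.

```python
# Hypothetical alphabet: printable ASCII 0x20..0x7E (95 symbols, space
# included by assumption) minus backslash and double quote.
ALPHABET = 95 - 2

# Non-Cc, non-ASCII scalar values grouped by their UTF-8 length in bytes.
by_utf8_len = {
    2: 0x800 - 0x80 - 32,        # U+0080..U+07FF minus the 32 C1 controls
    3: 0x10000 - 0x800 - 0x800,  # U+0800..U+FFFF minus the 2048 surrogates
    4: 0x110000 - 0x10000,       # U+10000..U+10FFFF
}

# An escape of "\" + n symbols costs n + 1 bytes, one more than UTF-8.
# Reserve disjoint sets of leading symbols per length so that the first
# symbol after the backslash fixes the length of the escape.
lead_symbols_needed = {}
for n, count in by_utf8_len.items():
    per_lead = ALPHABET ** (n - 1)                  # escapes per leading symbol
    lead_symbols_needed[n] = -(-count // per_lead)  # ceiling division

total_leads = sum(lead_symbols_needed.values())

print(sum(by_utf8_len.values()))  # 1111904
print(lead_symbols_needed)        # {2: 21, 3: 8, 4: 2}
print(total_leads <= ALPHABET)    # True
```

Under these assumptions 31 of the 93 possible leading symbols are enough, so
the budget has room to spare; the actual mapping from codepoints to symbols
is left unspecified here.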

-Ilari

Received on Friday, 26 May 2023 09:52:40 UTC