Re: [whatwg/url] Need an "unreserved" character set (and better define how to percent-encode arbitrary strings) (#369)

> 3986 seems safest.

The only problem (as I said in my initial "essay") is that many "arbitrary string encoders" (including encodeURIComponent and both Chrome's and Firefox's implementations of registerProtocolHandler) use the RFC 2396 set. I think we can work with either set, since the delta between them (`!*'()`) consists of non-syntactic characters, so it shouldn't matter whether we treat them as reserved or unreserved.

(Note that the default-encode set *should* include all reserved characters, but it's OK for it to be a superset of the reserved characters and thus unnecessarily encode some unreserved characters. So I think it's actually safer to have the larger unreserved set from RFC 2396.)
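
To make the delta concrete, here is a minimal sketch. `encodeRfc3986` is a hypothetical helper written for illustration, not something defined in this thread or in any spec. It shows that encodeURIComponent leaves `!*'()` alone, that a stricter encoder escapes them, and that both outputs decode to the same string, which is why over-encoding is harmless.

```typescript
// The characters that are unreserved in RFC 2396 but reserved in RFC 3986.
const DELTA = "!*'()";

// encodeURIComponent follows the RFC 2396 unreserved set, so the delta passes through.
const loose: string = encodeURIComponent(DELTA);

// Hypothetical helper (an assumption for this sketch): also escape the delta,
// so that only RFC 3986 unreserved characters remain unencoded.
function encodeRfc3986(input: string): string {
  return encodeURIComponent(input).replace(
    /[!*'()]/g,
    (ch: string) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

const strict: string = encodeRfc3986(DELTA);

// Over-encoding is lossless: both forms decode back to the original string.
const sameAfterDecode: boolean =
  decodeURIComponent(loose) === decodeURIComponent(strict);
```

Here `loose` is the input unchanged and `strict` is `%21%2A%27%28%29`.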

> I just realized that the problems you allude to though will continue to exist for non-ASCII data, which is why I think I gave up on pursuing something grander here since the producer and consumer will need to have some agreement at some level anyway.

Actually, non-ASCII data is a non-issue. Both the current URL Standard and RFC 3987 treat any non-ASCII character as equivalent to its encoded form (by virtue of normalizing it to percent-encoded form). The same is true of all characters in the C0 control set, which are normalized to encoded form.

Any character that is normalized *either* to encoded or non-encoded form does not trigger any of the above issues. It doesn't matter if such a character is rendered encoded or non-encoded, because it has the same meaning. It doesn't matter if such a character is encoded by an "arbitrary string encoder" or not, because it has the same meaning.

So as far as I can tell, this whole issue revolves around ASCII characters outside of the C0 control set, which are not normalized one way or the other.
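
As a rough check of the normalization argument, the sketch below runs a few paths through the WHATWG `URL` class as exposed in browsers and Node.js. The `example.com` URLs and the `pathOf` helper are assumptions made for illustration. Non-ASCII and C0 control characters come out the same whether or not the input was encoded, while `!` keeps whichever form it was given.

```typescript
// Hypothetical helper: the serialized path produced by the WHATWG URL parser.
function pathOf(input: string): string {
  return new URL(input).pathname;
}

// Non-ASCII: raw and pre-encoded inputs serialize identically.
const nonAsciiRaw = pathOf("https://example.com/caf\u00E9");
const nonAsciiEncoded = pathOf("https://example.com/caf%C3%A9");

// C0 control (U+0001): also normalized to the encoded form.
const controlRaw = pathOf("https://example.com/a\u0001b");
const controlEncoded = pathOf("https://example.com/a%01b");

// "!" is not normalized in either direction, so the two inputs stay distinct.
const bangRaw = pathOf("https://example.com/a!b");
const bangEncoded = pathOf("https://example.com/a%21b");
```

The first two pairs each collapse to a single serialization (`/caf%C3%A9` and `/a%01b`), whereas `bangRaw` is `/a!b` and `bangEncoded` is `/a%21b`.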

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/369#issuecomment-359707743

Received on Tuesday, 23 January 2018 08:04:57 UTC