Re: [whatwg/url] Editorial: make everything use percent-encode sets (#518) from Matt Giuca on 2020-05-15 (public-webapps-github@w3.org from May 2020)

From: Matt Giuca <notifications@github.com>
Date: Thu, 14 May 2020 21:54:57 -0700
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/pull/518/review/412346390@github.com>

@mgiuca commented on this pull request.



> + <var>encoding</var>.
+
+ <li>
+  <p>If <var>bytes</var> starts with 0x26 (&amp;) 0x23 (#) and ends with 0x3B (;), then:
+
+  <ol>
+   <li><p>Let <var>output</var> be <var>bytes</var>, <a>isomorphic decoded</a>.
+
+   <li><p>Replace the first two code points of <var>output</var> with "<code>%26%23</code>".
+
+   <li><p>Replace the last code point of <var>output</var> with "<code>%3B</code>".
+
+   <li><p>Return <var>output</var>.
+  </ol>
+
+  <p class="note no-backref">This can happen when <var>encoding</var> is not <a>UTF-8</a>.
 
  <li><p>Let <var>output</var> be the empty string.</p></li>
 

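The quoted steps can be sketched in Python roughly as follows. This is a minimal illustration, not the spec's algorithm. `encode_ncr_bytes` is a hypothetical name, and `data` stands for the spec's <var>bytes</var>. Python's `latin-1` codec stands in for isomorphic decode, since it maps each byte to the code point with the same value. Python's `xmlcharrefreplace` error handler is assumed as a stand-in for a legacy encoder that emits a decimal numeric character reference for a code point it cannot represent.

```python
from typing import Optional


def encode_ncr_bytes(data: bytes) -> Optional[str]:
    """Hypothetical helper mirroring the quoted spec steps.

    If data looks like an HTML numeric character reference (starts with
    b"&#" and ends with b";"), return it with "&#" and ";" percent-encoded.
    Otherwise return None so a caller could fall through to the per-byte loop.
    """
    if data.startswith(b"&#") and data.endswith(b";"):
        # Isomorphic decode: each byte becomes the code point with the same value.
        output = data.decode("latin-1")
        # Replace the first two code points ("&#") with "%26%23"
        # and the last code point (";") with "%3B".
        return "%26%23" + output[2:-1] + "%3B"
    return None


# windows-1252 cannot represent U+1F600, so an HTML-style encoder emits a
# decimal numeric character reference instead of failing.
legacy_bytes = "\U0001F600".encode("cp1252", errors="xmlcharrefreplace")
print(legacy_bytes)                    # b'&#128512;'
print(encode_ncr_bytes(legacy_bytes))  # %26%23128512%3B
```

With UTF-8 this branch never fires, because UTF-8 can encode every code point and never needs to fall back to a character reference.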
True: any byte that isn't an ASCII byte, read as a code point, is always going to be in every percent-encode set, because every percent-encode set extends the C0 control percent-encode set, which includes "all code points greater than U+007E (~)".
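A quick check of that claim, as a Python sketch. The predicate below is a hypothetical transcription of the C0 control percent-encode set: C0 controls (U+0000 to U+001F) plus every code point greater than U+007E. Reading each non-ASCII byte as the code point with the same value always lands in that set, and so in every set that extends it.

```python
def in_c0_control_percent_encode_set(code_point: int) -> bool:
    # C0 controls (U+0000 to U+001F) and all code points greater than U+007E (~).
    return code_point <= 0x1F or code_point > 0x7E


# Every non-ASCII byte (0x80 to 0xFF), read as the code point with the same
# value, is in the set, so a lookup happens to give the "right" answer.
non_ascii_all_in_set = all(
    in_c0_control_percent_encode_set(byte) for byte in range(0x80, 0x100)
)
print(non_ascii_all_in_set)  # True
```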

It's a little dubious to rely on this, however, especially since if we removed that check, we'd be doing these weird comparisons between non-ASCII bytes and non-ASCII code points. For example, if the UTF-8 encoder gives us the byte 0xD2, we would be consulting the percent-encode set for the code point U+00D2, which will "work" (U+00D2 is greater than U+007E, so it's in every set), but that code point has nothing to do with the byte. So I put in the ASCII byte check, so that we outright guarantee that any non-ASCII byte will be percent-encoded without being converted to an unrelated code point.
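The ASCII byte check can be sketched as a per-byte loop. This is a minimal Python illustration under stated assumptions: `percent_encode_bytes` is a hypothetical name, and the percent-encode set is passed in as a predicate over code points. Non-ASCII bytes are percent-encoded before the set is ever consulted, so the set is only asked about ASCII bytes, where byte value and code point coincide.

```python
from typing import Callable


def percent_encode_bytes(data: bytes, in_set: Callable[[int], bool]) -> str:
    """Hypothetical sketch: percent-encode data byte by byte."""
    output = []
    for byte in data:
        # ASCII byte check first: a non-ASCII byte is always percent-encoded
        # and is never looked up in the set as an (unrelated) code point.
        if byte > 0x7F or in_set(byte):
            # Percent-encode: "%" followed by two uppercase hex digits.
            output.append(f"%{byte:02X}")
        else:
            # ASCII byte not in the set: emit the code point with the same value.
            output.append(chr(byte))
    return "".join(output)


def toy_set(code_point: int) -> bool:
    # Hypothetical set with C0 controls and U+0020 SPACE but no non-ASCII
    # code points at all, to show the check does not depend on the set.
    return code_point <= 0x20


print(percent_encode_bytes(b"\xd2", toy_set))                # %D2
print(percent_encode_bytes("a é".encode("utf-8"), toy_set))  # a%20%C3%A9
```

With the spec's real sets, which all contain code points above U+007E, the output is the same with or without the check; the check makes the guarantee explicit rather than incidental.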

It might also be worth adding a note next to the percent-encode sets saying that the encoding algorithm assumes (whichever way we decide to do this) that all non-ASCII bytes are in every percent-encode set.

-- 
View it on GitHub:
https://github.com/whatwg/url/pull/518#discussion_r425565378

Received on Friday, 15 May 2020 04:55:10 UTC