- From: Matt Giuca <notifications@github.com>
- Date: Thu, 14 May 2020 21:54:57 -0700
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/pull/518/review/412346390@github.com>
@mgiuca commented on this pull request.

> +    <var>encoding</var>.
> +
> +   <li>
> +    <p>If <var>bytes</var> starts with 0x26 (&) 0x23 (#) and ends with 0x3B (;), then:
> +
> +    <ol>
> +     <li><p>Let <var>output</var> be <var>bytes</var>, <a>isomorphic decoded</a>.
> +
> +     <li><p>Replace the first two code points of <var>output</var> with "<code>%26%23</code>".
> +
> +     <li><p>Replace the last code point of <var>output</var> with "<code>%3B</code>".
> +
> +     <li><p>Return <var>output</var>.
> +    </ol>
> +
> +    <p class="note no-backref">This can happen when <var>encoding</var> is not <a>UTF-8</a>.
>   <li><p>Let <var>output</var> be the empty string.</p></li>

True, anything that's "not an ASCII byte" is always going to be in every percent-encode set (because the C0 control percent-encode set includes "all code points greater than U+007E (~)"). It's a little dubious to rely on this, however, especially since if we removed that check, we'd be doing these weird comparisons between non-ASCII bytes and non-ASCII code points. For example, if the UTF-8 encoder gives us the byte 0xD2, we would be consulting the percent-encode set for the code point U+00D2. That will "work", but U+00D2 has nothing to do with the code point that was actually encoded.

So I put in the ASCII byte check, which guarantees outright that any non-ASCII byte is percent-encoded without ever being converted to an unrelated code point.

It might also be worth adding a note next to the percent-encode sets saying that the encoding algorithm assumes (whichever way we decide to do this) that every percent-encode set includes all non-ASCII code points.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/pull/518#discussion_r425565378
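For illustration, here is a minimal Python sketch of the byte loop under discussion. It assumes the loop percent-encodes a byte when that byte is not an ASCII byte or when the code point with the same value is in the percent-encode set, and otherwise appends that code point. It also mirrors the quoted early return for byte sequences that start with `&#` and end with `;`, using Latin-1 decoding as the isomorphic decode. The function names (`encode_bytes`, `percent_encode_byte`, `in_c0_control_percent_encode_set`) are hypothetical; this is a sketch, not the spec text.

```python
from typing import Callable

# Hypothetical type: a predicate telling whether a code point is in a percent-encode set.
PercentEncodeSet = Callable[[int], bool]


def in_c0_control_percent_encode_set(cp: int) -> bool:
    """C0 controls (U+0000..U+001F) and all code points greater than U+007E (~)."""
    return cp <= 0x1F or cp > 0x7E


def percent_encode_byte(b: int) -> str:
    """Percent-encode one byte as '%' followed by two uppercase hex digits."""
    return f"%{b:02X}"


def encode_bytes(data: bytes, in_set: PercentEncodeSet) -> str:
    """Sketch (hypothetical name) of the loop discussed in the comment above."""
    # Mirrors the quoted early return. Per the quoted note, a sequence such as
    # b"&#1234;" can come back from the encoder when the encoding is not UTF-8.
    if data.startswith(b"&#") and data.endswith(b";"):
        output = data.decode("latin-1")  # isomorphic decode: byte 0xNN -> U+00NN
        # Replace the leading "&#" with "%26%23" and the trailing ";" with "%3B".
        return "%26%23" + output[2:-1] + "%3B"

    parts = []
    for b in data:
        # The ASCII byte check: any byte above 0x7F is percent-encoded outright,
        # so it is never looked up in the set as the unrelated code point U+00NN.
        if b > 0x7F or in_set(b):
            parts.append(percent_encode_byte(b))
        else:
            parts.append(chr(b))
    return "".join(parts)
```

Passing a deliberately incomplete set shows what the check buys: `encode_bytes(b"\xd2 a", lambda cp: cp == 0x20)` still yields `"%D2%20a"`, whereas without the `b > 0x7F` test the byte 0xD2 would be compared against U+00D2 and appended as "Ò".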
Received on Friday, 15 May 2020 04:55:10 UTC