- From: Tab Atkins Jr. <notifications@github.com>
- Date: Mon, 24 Mar 2025 14:18:55 -0700
- To: whatwg/url <url@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <whatwg/url/pull/804/review/2711703381@github.com>
@tabatkins commented on this pull request.

> @@ -2038,8 +2038,9 @@ and <a>code points</a> in the range U+00A0 to U+10FFFD, inclusive, excluding <a>
>  <!-- IRI also excludes the ranges U+E000 to U+F8FF, U+FFF0 to U+FFFD, and U+E0000 to U+E09FF, all inclusive. We don't to align with HTML. -->
>
> -<p class=note>Code points greater than U+007F DELETE will be converted to
> -<a lt="percent-encoded byte">percent-encoded bytes</a> by the <a>URL parser</a>.
> +<p class=note>For historical reasons, rather than storing codepoints and [=byte/percent-encoding=]
> +to ASCII for serialization, URLs instead store their value as ASCII internally, eagerly converting
> +code points greater than U+007F DELETE to [=percent-encoded bytes=] during [=URL parser|parsing=].

Sure, the spec *could* be written another way (potentially), but it's currently *not* written that way, and the specifics of how the data is encoded/represented at this point in the spec are important, so I know that the URL structure only includes ASCII code points. If we changed to a "convert at serialization" model, that would *also* be important to note, so it was clear that the URL structure includes non-ASCII code points.

As I said, the nature of this note actively confused me - the spec talks about "URL code points" including non-ASCII codepoints, but URLs themselves do not contain these code points, and that wasn't clear to me from how the note was written.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/pull/804#discussion_r2010940667
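
To illustrate the distinction drawn in the comment above, here is a minimal Python sketch contrasting the two storage models. It is not the URL Standard's algorithm: `percent_encode_non_ascii`, `EagerPath`, and `LazyPath` are hypothetical names invented for this example, and the helper only handles code points above U+007F. It ignores the per-component percent-encode sets (which also cover some ASCII code points) and host processing.

```python
def percent_encode_non_ascii(text: str) -> str:
    """Replace each code point above U+007F with its percent-encoded UTF-8 bytes."""
    out = []
    for ch in text:
        if ord(ch) > 0x7F:
            # One "%XX" triplet per UTF-8 byte of the code point.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


class EagerPath:
    """Hypothetical "convert at parse time" model: the stored value is ASCII-only."""

    def __init__(self, raw: str) -> None:
        self.stored = percent_encode_non_ascii(raw)

    def serialize(self) -> str:
        return self.stored


class LazyPath:
    """Hypothetical "convert at serialization" model: the stored value keeps non-ASCII."""

    def __init__(self, raw: str) -> None:
        self.stored = raw

    def serialize(self) -> str:
        return percent_encode_non_ascii(self.stored)


eager = EagerPath("/caf\u00e9")
lazy = LazyPath("/caf\u00e9")

print(eager.stored)       # ASCII-only internal value
print(lazy.stored)        # internal value still contains U+00E9
print(eager.serialize() == lazy.serialize())
```

Both models serialize to the same string; they differ only in what the internal structure holds, which is the detail the note is meant to make explicit.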
Received on Monday, 24 March 2025 21:18:59 UTC