Re: [whatwg/url] [editorial] Rephrase encoding note to make the implications clearer. (PR #804)

@tabatkins commented on this pull request.



> @@ -2038,8 +2038,9 @@ and <a>code points</a> in the range U+00A0 to U+10FFFD, inclusive, excluding <a>
 <!-- IRI also excludes the ranges U+E000 to U+F8FF, U+FFF0 to U+FFFD, and U+E0000 to U+E09FF, all
      inclusive. We don't to align with HTML. -->
 
-<p class=note>Code points greater than U+007F DELETE will be converted to
-<a lt="percent-encoded byte">percent-encoded bytes</a> by the <a>URL parser</a>.
+<p class=note>For historical reasons, rather than storing codepoints and [=byte/percent-encoding=]
+to ASCII for serialization, URLs instead store their value as ASCII internally, eagerly converting
+code points greater than U+007F DELETE to [=percent-encoded bytes=] during [=URL parser|parsing=].

Sure, the spec *could* potentially be written another way, but it's currently *not* written that way, and the specifics of how the data is encoded/represented at this point in the spec are important: they tell me that the URL structure only includes ASCII code points. If we changed to a "convert at serialization" model, that would *also* be important to note, so that it's clear the URL structure includes non-ASCII code points.

As I said, the nature of this note actively confused me: the spec talks about "URL code points" including non-ASCII code points, but URLs themselves do not contain these code points, and that wasn't clear to me from how the note was written.
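
To make the point concrete, here is a minimal sketch in TypeScript. It assumes a runtime whose `URL` class follows the URL Standard, such as a current browser or Node.js. The input URL and the `isAscii` helper are made up for illustration and are not part of the spec or this PR. The host is deliberately kept ASCII so that host processing does not come into play.

```typescript
// Illustrative input only: non-ASCII code points in the path, query, and fragment.
const input = "https://example.com/caf\u00E9?q=\u00FC#\u00F1";

// Parsing eagerly percent-encodes the UTF-8 bytes of each non-ASCII code point,
// so the parsed URL never holds the original code points.
const url = new URL(input);

// Hypothetical helper: true if every code unit of s is in the ASCII range.
const isAscii = (s: string): boolean => /^[\x00-\x7F]*$/.test(s);

console.log(url.pathname);      // "/caf%C3%A9"
console.log(url.search);        // "?q=%C3%BC"
console.log(url.hash);          // "#%C3%B1"
console.log(isAscii(url.href)); // true
```

Reading any component back gives the percent-encoded form, not the code points that were passed in, which is the behavior the note should make obvious.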

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/pull/804#discussion_r2010940667

Message ID: <whatwg/url/pull/804/review/2711703381@github.com>

Received on Monday, 24 March 2025 21:18:59 UTC