Re: [whatwg/url] How should parser handle percent-encoded characters like `%66` U+0066 (f) in path segments? (#565) from Timothy Gu on 2021-01-01 (public-webapps-github@w3.org from January 2021)

From: Timothy Gu <notifications@github.com>
Date: Thu, 31 Dec 2020 16:31:33 -0800
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/565/753231942@github.com>

So the case of interest is `https://example.com/%66%6f%6f` parsed against the base `about:blank`: https://jsdom.github.io/whatwg-url/#url=aHR0cHM6Ly9leGFtcGxlLmNvbS8lNjYlNmYlNmY=&base=YWJvdXQ6Ymxhbms=
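
For anyone who wants to see the exact input without opening the viewer, here is a small sketch that unpacks the permalink. It assumes the viewer keeps its inputs as Base64 in the fragment's `url` and `base` fields, which is what the link appears to do; the variable names are mine.

```python
import base64
from urllib.parse import urlsplit

# Permalink from this comment (no trailing punctuation).
permalink = (
    "https://jsdom.github.io/whatwg-url/"
    "#url=aHR0cHM6Ly9leGFtcGxlLmNvbS8lNjYlNmYlNmY=&base=YWJvdXQ6Ymxhbms="
)

# Assumption: the fragment is a list of name=value pairs joined by "&",
# and each value is plain Base64 (which may itself end in "=" padding).
fragment = urlsplit(permalink).fragment
fields = dict(part.split("=", 1) for part in fragment.split("&"))

input_url = base64.b64decode(fields["url"]).decode("ascii")
base_url = base64.b64decode(fields["base"]).decode("ascii")

print(input_url)
print(base_url)
```

The two decoded strings are the input URL and the base URL that the viewer feeds to the parser.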

* Chromium percent-decodes the path element when parsing, yielding `https://example.com/foo` as the serialized result.
* Others do not, yielding `https://example.com/%66%6f%6f` when re-serialized. A list of implementations with this behavior:
  * This spec
  * Firefox
  * WebKit
  * Go's [net/url](https://golang.org/pkg/net/url/) percent-decodes the `Path` field, but `RawPath`, `EscapedPath()`, and the serialized form through `String()` all retain the original percent-encoded form.
  * Java's [java.net.URI](https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/net/URI.html) behaves the same as Go's `net/url`
  * [Legacy Node.js URL parser](https://nodejs.org/dist/latest-v15.x/docs/api/url.html#url_legacy_url_api)
  * Python's [urllib.parse](https://docs.python.org/3/library/urllib.parse.html#module-urllib.parse)
  * Ruby's [URI](https://ruby-doc.org/stdlib-2.7.2/libdoc/uri/rdoc/URI.html)
  * Rust's [url::Url](https://docs.rs/url/2.2.0/url/struct.Url.html)
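
To make the two outcomes concrete, here is a rough sketch using Python's `urllib.parse`, one of the implementations listed above. The first result is what the parser gives on its own; the second applies `unquote` to the path by hand to approximate the Chromium-style output. That second step is only an illustration: `unquote` decodes every escape, including reserved characters such as `%2F`, and I am not claiming that is Chromium's exact rule.

```python
from urllib.parse import unquote, urlsplit

raw = "https://example.com/%66%6f%6f"
parts = urlsplit(raw)

# urllib.parse leaves the path untouched, so re-serializing round-trips.
preserved = parts.geturl()

# Manually decode the path to mimic the percent-decoding outcome.
# Illustration only: unquote() decodes all escapes, reserved ones included.
decoded = parts._replace(path=unquote(parts.path)).geturl()

print(preserved)  # https://example.com/%66%6f%6f
print(decoded)    # https://example.com/foo
```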

With the exception of Chromium, it seems like the current spec describes the consensus behavior.

-- 
View it on GitHub:
https://github.com/whatwg/url/issues/565#issuecomment-753231942

Received on Friday, 1 January 2021 00:31:45 UTC