[whatwg/url] An opaque-host parser and percent encoding (Issue #806)

### What is the issue with the URL Standard?

The WPT URL tests include the following entries in the [url test data](https://github.com/web-platform-tests/wpt/blob/master/url/resources/urltestdata.json):

```
  {
    "input": "sc://%/",
    "base": null,
    "href": "sc://%/",
    "protocol": "sc:",
    "username": "",
    "password": "",
    "host": "%",
    "hostname": "%",
    "port": "",
    "pathname": "/",
    "search": "",
    "hash": ""
  },
```

```
  {
    "input": "foo://!\"$%&'()*+,-.;=_`{}~/",
    "base": null,
    "hash": "",
    "host": "!\"$%&'()*+,-.;=_`{}~",
    "hostname": "!\"$%&'()*+,-.;=_`{}~",
    "href":"foo://!\"$%&'()*+,-.;=_`{}~/",
    "origin": "null",
    "password": "",
    "pathname": "/",
    "port":"",
    "protocol": "foo:",
    "search": "",
    "username": ""
  },
```

It appears the WPT URL tests treat both ```"foo://!"$%&'()*+,-.;=_`{}~/"``` and `"sc://%/"` as valid URLs.

However, step 3 of the [opaque-host parser](https://url.spec.whatwg.org/#concept-opaque-host-parser) says:

> If input contains a U+0025 (%) and the two code points following it are not ASCII hex digits, invalid-URL-unit validation error.

According to this definition, ```"foo://!"$%&'()*+,-.;=_`{}~/"``` seems invalid because the two code points that follow U+0025 (%), `"&'"`, are not ASCII hex digits.
`"sc://%/"` is probably invalid too, since nothing in the host follows the U+0025 (%), though I'm unsure.
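
To make the reading of step 3 above concrete, here is a minimal Python sketch of that check applied to the two hosts from the test data. The function name `has_invalid_percent_sequence` is hypothetical and not taken from the spec or any implementation. The sketch only models whether the quoted condition is met; it does not model what the parser returns afterwards.

```python
import string

# ASCII hex digits: 0-9, a-f, A-F
ASCII_HEX_DIGITS = set(string.hexdigits)


def has_invalid_percent_sequence(host: str) -> bool:
    """Return True if any U+0025 (%) in `host` is not followed by two ASCII hex digits.

    Hypothetical helper: it mirrors only the condition quoted from step 3,
    under the assumption that a "%" with fewer than two following code points
    also counts as not being followed by two ASCII hex digits.
    """
    for i, ch in enumerate(host):
        if ch != "%":
            continue
        following = host[i + 1 : i + 3]
        if len(following) < 2 or any(c not in ASCII_HEX_DIGITS for c in following):
            return True
    return False


# Hosts from the two WPT entries, plus a well-formed percent-encoded byte for contrast.
for host in ["%", "!\"$%&'()*+,-.;=_`{}~", "%41"]:
    print(repr(host), has_invalid_percent_sequence(host))
```

Under that assumption, both WPT hosts meet the condition in step 3, while `%41` does not.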

Is the behavior described in step 3 intended?

Context: I found this while adding support for non-special URLs in Chromium ([crbug.com/1416006](https://crbug.com/1416006)).



Message ID: <whatwg/url/issues/806@github.com>

Received on Wednesday, 6 December 2023 05:26:59 UTC