Re: [whatwg/url] Malformed URL Normalization in Standard Introduces SSRF Risks (Issue #893)

the-moisrex left a comment (whatwg/url#893)

I agree with the idea of a strict mode. Some parts of the spec already have such a mode, but it isn't restrictive enough.

In my opinion, the double-solidus problem isn't as big a deal as the other problems. These come to mind, but I'm sure there are more places where the spec could offer a more restrictive option:


- **New lines and tabs** (what a dumb idea to remove them, seriously)
- **Octal and hex IPv4**
- **Leading zeros** and the like in IPs (we can seriously write any IP address in infinitely many ways; this is not okay, because it lets us smuggle arbitrarily long strings that are still valid IP addresses)
- **Invalid Unicode code points**
- Not **verifying DNS length** by default
- **Trailing empty IPv4 octets**
- **Empty labels** in domains
- **Percent-encoded dots in paths**
- **Long punycode-encoded domains**, which can enable DoS or resource-exhaustion attacks, since punycode is very slow

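As one possible reading of the stricter rules listed above, here is a minimal Python sketch of two host checks a strict mode could apply. `is_strict_ipv4` accepts only canonical dotted-decimal IPv4 (no hex, octal, leading zeros, shorthand like `127.1`, or trailing empty octet), and `verify_dns_length` applies the limits UTS #46 uses when `VerifyDnsLength` is set (1 to 253 characters overall, 1 to 63 per label, which also rules out empty labels). Both names are hypothetical; the spec defines no such mode.

```python
import re

# Hypothetical strict-mode helpers; the WHATWG URL Standard defines no such mode.

# One decimal octet, 0-255, with no leading zeros ("0" itself is allowed).
_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
# Exactly four octets: rejects hex, octal, shorthand like "127.1", and "1.2.3.4.".
_STRICT_IPV4 = re.compile(_OCTET + r"(?:\." + _OCTET + r"){3}")


def is_strict_ipv4(host: str) -> bool:
    return _STRICT_IPV4.fullmatch(host) is not None


def verify_dns_length(domain: str) -> bool:
    """UTS #46 VerifyDnsLength limits: 1-253 characters overall (ignoring one
    trailing root dot) and 1-63 characters per label."""
    name = domain[:-1] if domain.endswith(".") else domain
    if not 1 <= len(name) <= 253:
        return False
    return all(1 <= len(label) <= 63 for label in name.split("."))
```

A regular expression keeps the IPv4 rule easy to audit; an implementation that already has an IPv4 parser could instead accept a host only if it equals its own serialized form.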

I think we need a restrictive strict mode.

There are just so many random things in the URL spec.

For example:

Valid URLs:
```
file:
fILe://C|\\//\////\\\\/\/\/\\\/\
any-random-scheme:
```

Invalid URLs (when parsed without a base URL):
```
http:
https:
ws:
wss:
ftp:
```
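The asymmetry comes from how the parser treats a scheme followed by nothing at all, with no base URL: non-special schemes end up with an empty opaque path, `file` gets an empty host, and the remaining special schemes (`http`, `https`, `ws`, `wss`, `ftp`) fail because they require a non-empty host. A toy Python predicate covering only that case; `bare_scheme_parses` is a hypothetical name, and it assumes the scheme is already syntactically valid.

```python
# Special schemes as listed in the WHATWG URL Standard.
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def bare_scheme_parses(scheme: str) -> bool:
    """Hypothetical helper: does '<scheme>:' with nothing after it parse
    without a base URL? Assumes `scheme` is already a valid scheme."""
    scheme = scheme.lower()
    # file: empty host is allowed; non-special: empty opaque path is allowed;
    # other special schemes: a missing host is a hard failure.
    return scheme == "file" or scheme not in SPECIAL_SCHEMES


for s in ["file", "any-random-scheme", "http", "wss"]:
    print(f"{s}:", "valid" if bare_scheme_parses(s) else "invalid")
```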
---

This is a valid WHATWG URL with a hidden "Hello World" binary message encoded within:

- `/` is 0
- `\` is 1

`https:/\//\////\\//\/\/\\/\\///\\/\\///\\/\\\\//\//////\/\/\\\/\\/\\\\/\\\//\//\\/\\///\\//\//example.com`
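A small Python sketch of the trick, assuming one bit per character with `/` as 0 and `\` as 1 (the mapping under which the slash run actually spells "Hello World"). It works because, when a special scheme like `https` is parsed without a base URL, the parser skips any run of `/` and `\` before the host (with validation errors, but no failure). `hide` and `reveal` are hypothetical helper names.

```python
# Hypothetical helpers: hide()/reveal() are illustrative, not part of any URL library.
SCHEME = "https:"
HOST = "example.com"


def hide(message: str) -> str:
    """Encode each byte as 8 characters: '/' for a 0 bit, '\\' for a 1 bit."""
    bits = "".join(f"{byte:08b}" for byte in message.encode("ascii"))
    slashes = bits.replace("0", "/").replace("1", "\\")
    # A WHATWG parser skips this whole run and still finds HOST.
    return SCHEME + slashes + HOST


def reveal(url: str) -> str:
    """Read the '/' and '\\' run right after the scheme back into ASCII text."""
    rest = url[len(SCHEME):]
    run = rest[: len(rest) - len(rest.lstrip("/\\"))]
    bits = run.replace("/", "0").replace("\\", "1")
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("ascii")


url = hide("Hello World")
print(url)
print(reveal(url))  # Hello World
```

Calling `hide("Hello World")` rebuilds the slash run above (with single backslashes) followed by `example.com`.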

---

- `file:::::::` is a totally valid URL
- `http::` hold on, that's not valid at all!

---

These two URLs serialize to different strings, even though both refer to the local machine!

```
file://localhost/path/to/somewhere
file://127.0.0.1/path/to/somewhere
```

https://url.spec.whatwg.org/#file-host-state
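The difference comes from one step in the file host state: after the host is parsed, a host equal to `localhost` is replaced by the empty host, while any other host (including `127.0.0.1`) is kept. A minimal Python sketch of just that step, assuming the host has already been through the host parser (so it is lowercased) and the path is already normalized; `file_url` is a hypothetical helper.

```python
def file_url(host: str, path: str) -> str:
    """Serialize a file URL after the file host state's 'localhost' rule.

    Assumes `host` is already parsed and lowercased, and `path` is normalized.
    """
    if host == "localhost":
        host = ""  # the file host state maps "localhost" to the empty host
    return "file://" + host + path


print(file_url("localhost", "/path/to/somewhere"))  # file:///path/to/somewhere
print(file_url("127.0.0.1", "/path/to/somewhere"))  # file://127.0.0.1/path/to/somewhere
```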

---

Parsing an IPv4 address in a URL host requires a different algorithm from the good old `inet_pton4`. As a result, most of these are invalid in `inet_pton4` but valid in URLs:

```
"0000000.0.0.0"
"0x000000.0.0.0"
"0X0000.0x.0.0"
"0"
"0x"
"0X"
"00000"
"0x0000"
""
"."
"127.0.0.1"
"0x7f.1"
"0x7f000001"
"0x0000000007f.0x1"
"127.0.0x0.1"
"127.0x0.0x0.1"
"127.0x0.0x0.0x1"
"127.0.0x0.0x0001"
"0x7f.0x0000001"
"0x0007f.0x0001"
"0x007f.0.0x00001"
"0x7f.0.0.0x1"
"0x7f.0.0x000.0x1"
"2130706433"
"127.1"
"127.0x00.1"
"127.0x000000000000000.0.1"
"000123"
"0xff"
"1.256"
```
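A condensed Python sketch of the three pieces of the spec involved: the "ends in a number" checker, the IPv4 number parser, and the IPv4 parser. Validation errors are dropped so only success or failure remains, and the function names are my own. The host parser only calls the IPv4 parser when the host ends in a number, which is why `.` never reaches it.

```python
def parse_ipv4_number(part: str):
    """IPv4 number parser: returns an int, or None on failure."""
    if part == "":
        return None
    radix = 10
    if len(part) >= 2 and part[:2] in ("0x", "0X"):
        part, radix = part[2:], 16
    elif len(part) >= 2 and part[0] == "0":
        part, radix = part[1:], 8  # any leading zero switches to octal
    if part == "":
        return 0  # "0x" or "0X" on its own means zero
    digits = {8: "01234567", 10: "0123456789", 16: "0123456789abcdefABCDEF"}[radix]
    if any(ch not in digits for ch in part):
        return None
    return int(part, radix)


def ends_in_a_number(host: str) -> bool:
    """Decides whether the host parser treats the host as IPv4 at all."""
    parts = host.split(".")
    if parts[-1] == "":
        if len(parts) == 1:
            return False
        parts.pop()
    last = parts[-1]
    if last and all(ch in "0123456789" for ch in last):
        return True
    return parse_ipv4_number(last) is not None


def parse_ipv4(host: str):
    """IPv4 parser: returns the address as a 32-bit int, or None on failure."""
    parts = host.split(".")
    if parts[-1] == "" and len(parts) > 1:
        parts.pop()  # one trailing dot is tolerated
    if len(parts) > 4:
        return None
    numbers = [parse_ipv4_number(p) for p in parts]
    if any(n is None for n in numbers):
        return None
    if any(n > 255 for n in numbers[:-1]):
        return None
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        return None  # the last part may fill all remaining bytes
    address = numbers[-1]
    for i, n in enumerate(numbers[:-1]):
        address += n * 256 ** (3 - i)
    return address


def serialize_ipv4(address: int) -> str:
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(serialize_ipv4(parse_ipv4("0x7f.1")))  # 127.0.0.1
print(serialize_ipv4(parse_ipv4("1.256")))  # 1.0.1.0
```

The last part filling all remaining bytes is what makes `127.1`, `2130706433`, and `1.256` acceptable.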

----

All of these URLs refer to the same place:

```
http://127.0.0.1/
http://0x7F.1/
http://0x7F000001
http://0x0000000007F.0X1
http://127.0.0x0.1
http://127.0X0.0x0.1
http://127.0X0.0x0.0x1
http://127.0.0x0.0x0000000000000000000000000000000000000000000000000000000000000001
http://0x7F.0x00000000000000000000000001
http://0x000000000000000007F.0x00000000000000000000000001
http://0x000000000000000007F.0.0x00000000000000000000000001
http://0x7F.0.0.0x1
http://0x7F.0.0x000.0x1
http://2130706433
http://127.1
http://127.0x00.1
http://127.0x000000000000000.0.1
```
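To make the SSRF angle concrete, here is a minimal Python sketch of a deliberately naive deny-list that compares raw host strings; `naive_is_blocked` and `BLOCKED_HOSTS` are hypothetical. Python's `urllib.parse` is not a WHATWG parser, so it hands these hosts back as written (lowercased), and only the first form is caught, even though a WHATWG-conforming parser turns every entry above into `127.0.0.1`. A filter should compare hosts only after parsing them the same way the client will.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately naive SSRF filter that compares the raw host string.
BLOCKED_HOSTS = {"127.0.0.1", "localhost"}


def naive_is_blocked(url: str) -> bool:
    # urlsplit lowercases the host but does not interpret hex, octal,
    # or shorthand IPv4 forms the way a WHATWG URL parser does.
    host = urlsplit(url).hostname or ""
    return host in BLOCKED_HOSTS


for url in ["http://127.0.0.1/", "http://0x7F.1/", "http://2130706433", "http://127.1"]:
    print(url, "blocked" if naive_is_blocked(url) else "allowed")
```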

😐😐😐😐😐

---
I mean, WTF? These two parse to the same URL:

```
file://C|\windows
file:///C:/windows
```
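Both lines end up as `file:///C:/windows` because of the Windows drive letter quirk. In the file host state, a buffer such as `C|` that is a Windows drive letter (an ASCII letter followed by `:` or `|`) is handed to the path state instead of being parsed as a host; `\` acts like `/` because `file` is a special scheme; and the path state rewrites `C|` to `C:`. A rough Python model of only that path, assuming the text after `file://` starts with a drive letter and has no query, fragment, or dot segments; `drive_letter_quirk` is a hypothetical helper.

```python
import string


def is_windows_drive_letter(s: str) -> bool:
    """Spec definition: an ASCII letter followed by ':' or '|'."""
    return len(s) == 2 and s[0] in string.ascii_letters and s[1] in ":|"


def drive_letter_quirk(after_slashes: str) -> str:
    """Rough model of parsing 'file://' + after_slashes when it starts with a
    drive letter. Assumes no query, fragment, or dot segments."""
    text = after_slashes.replace("\\", "/")  # file is special, so '\' acts like '/'
    first, sep, rest = text.partition("/")
    if not is_windows_drive_letter(first):
        raise ValueError("this sketch only models input starting with a drive letter")
    # The drive letter becomes the first path segment, with '|' rewritten to ':'.
    return "file:///" + first[0] + ":" + sep + rest


print(drive_letter_quirk("C|\\windows"))  # file:///C:/windows
```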

---

Browsers make mistakes too

There are a few bugs in Firefox's current URL parser implementation (Chromium has some too, but that's a separate bug).


This URL:
```
http://127.0.0.1/..//./one/%2E./%2e/two/././././%2e/%2e/.././three/four/%2e%2e/five/.%2E/%2e
```

Should be equal to this URL, as produced by ada-url (https://www.ada-url.com/playground?url=http%3A%2F%2F127.0.0.1%2F..%2F%2F.%2Fone%2F%252E.%2F%252e%2Ftwo%2F.%2F.%2F.%2F.%2F%252e%2F%252e%2F..%2F.%2Fthree%2Ffour%2F%252e%252e%2Ffive%2F.%252E%2F%252e):

```
http://127.0.0.1//three/
     | |        |       
     | |        `------- pathname_start 16
     | |        `------- host_end 16
     | `---------------- host_start 7
     | `---------------- username_end 7
     `------------------ protocol_end 5
```

It seems Firefox's URL parser has a bug where, for some reason, it doesn't treat a percent-encoded last path segment (such as `%2e` or `%2e%2e`) as a dot segment.

For example, `http://localhost/page/%2e%2e` should normalize to just `http://localhost/`.
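For reference, a simplified Python sketch of how the spec's path state treats dot segments in a special, non-file URL: `.` and `%2e` are single-dot segments, `..`, `.%2e`, `%2e.` and `%2e%2e` are double-dot segments (ASCII case-insensitive), and a dot segment ended by the end of input rather than a `/` leaves an empty final segment. It assumes the input is only the path, starting with `/`, with no query, fragment, or `\`; `normalize_special_path` is a hypothetical name.

```python
# ASCII case-insensitive spellings the URL Standard treats as dot segments.
SINGLE_DOT = {".", "%2e"}
DOUBLE_DOT = {"..", ".%2e", "%2e.", "%2e%2e"}


def normalize_special_path(path: str) -> str:
    """Simplified path state for a special, non-file URL (hypothetical helper).

    Assumes `path` starts with '/' and has no query, fragment, or backslashes.
    """
    segments = path[1:].split("/")  # the path start state consumes the first '/'
    out = []
    for i, segment in enumerate(segments):
        at_end = i == len(segments) - 1  # this segment is ended by EOF, not '/'
        lowered = segment.lower()
        if lowered in DOUBLE_DOT:
            if out:
                out.pop()  # "shorten" the path
            if at_end:
                out.append("")
        elif lowered in SINGLE_DOT:
            if at_end:
                out.append("")
        else:
            out.append(segment)
    return "".join("/" + s for s in out)


print(normalize_special_path("/page/%2e%2e"))  # /
```

Applied to the path of the long example above, this returns `//three/`, matching the ada-url output.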


---


These are some of the things I've noticed before and written about on my project's Telegram channels; I've copied them here.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/893#issuecomment-3707855878

Received on Sunday, 4 January 2026 08:17:46 UTC