[whatwg/url] IPv4 parser "invents" a valid ip from doi prefix; is this / should this be correct behaviour? (Issue #761)

I'm not sure whether this is a spec issue or a Node implementation issue, but the result is the same in Node and the browser. Basically, the new standard converts the non-IP `10.1000` into the valid IP `10.0.3.232`.

In the old standard it is left as is. 

As part of our tests we have this bad input we want to reject: 

`https://10.1000/f<script>alert(1);</script>`
(This is supposed to represent a valid yet potentially harmful DOI prepended by the wrong protocol)

In browser:
https://jsdom.github.io/whatwg-url/#url=aHR0cHM6Ly8xMC4xMDAwL2Y8c2NyaXB0PmFsZXJ0KDEpOzwvc2NyaXB0Pg==&base=YWJvdXQ6Ymxhbms=

In Node: 

Both `url.parse` and `new URL` accept this as an origin, which is fine; we deal with that downstream. What's odd is how it's parsed:

With the updated API (`new URL`), it "invents" an IP address for it, so the output is:
```
URL {
  href: 'https://10.0.3.232/f%3Cscript%3Ealert(1);%3C/script%3E',
  origin: 'https://10.0.3.232',
  protocol: 'https:',
  username: '',
  password: '',
  host: '10.0.3.232',
  hostname: '10.0.3.232',
  port: '',
  pathname: '/f%3Cscript%3Ealert(1);%3C/script%3E',
  search: '',
  searchParams: URLSearchParams {},
  hash: ''
}
```
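
The `10.0.3.232` in this output is at least consistent with plain base-256 arithmetic: if the last dot-separated part is allowed to fill all the remaining bytes, then 1000 = 3 × 256 + 232. Below is a minimal sketch of that folding, assuming decimal parts only (it ignores hex and octal forms, and `foldIPv4` is a hypothetical name, not the spec's or Node's algorithm):

```javascript
// Simplified illustration only: folds a shorthand dotted-decimal host
// into four octets, letting the last part fill the remaining bytes.
function foldIPv4(host) {
  const raw = host.split('.');
  // Accept at most four parts, each made of decimal digits only.
  if (raw.length > 4 || !raw.every((p) => /^\d+$/.test(p))) return null;
  const parts = raw.map(Number);
  const last = parts.pop();
  // Every part except the last must fit in a single octet.
  if (parts.some((n) => n > 255)) return null;
  // The last part must fit in the bytes that are left over.
  if (last >= 256 ** (4 - parts.length)) return null;
  let value = last;
  parts.forEach((n, i) => {
    value += n * 256 ** (3 - i);
  });
  // Split the 32-bit value back into four octets.
  const octets = [];
  for (let i = 0; i < 4; i++) {
    octets.unshift(value % 256);
    value = Math.floor(value / 256);
  }
  return octets.join('.');
}

console.log(foldIPv4('10.1000')); // 10.0.3.232
```
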
The old `url.parse` output keeps the pseudo-IP address as it is:
```
Url {
  protocol: 'https:',
  slashes: true,
  auth: null,
  host: '10.1000',
  port: null,
  hostname: '10.1000',
  hash: null,
  search: null,
  query: null,
  pathname: '/f%3Cscript%3Ealert(1);%3C/script%3E',
  path: '/f%3Cscript%3Ealert(1);%3C/script%3E',
  href: 'https://10.1000/f%3Cscript%3Ealert(1);%3C/script%3E'
}
```

Should it actually be doing this? Obviously DOIs shouldn't be prepended with the wrong protocol, but I wonder whether the parser should throw an error instead of accepting the input and converting it into something unrecognisable.
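
For our own tests, one possible workaround is to compare the host as written with the host the parser reports, and reject the input when the parser has turned it into a different dotted quad. This is only a sketch under assumptions: `hostWasRewrittenToIPv4` is a hypothetical helper, and the regex is a rough approximation of where the host sits in the input, not a full URL grammar.

```javascript
// Hypothetical helper: returns true when `new URL` reports a dotted-quad
// hostname that differs from the host text in the original input.
function hostWasRewrittenToIPv4(input) {
  const parsed = new URL(input); // throws if the input cannot be parsed at all
  // Rough extraction of the host as typed: scheme://[userinfo@]host
  const m = /^[a-z][a-z0-9+.-]*:\/\/(?:[^\/?#@]*@)?([^\/?#:]*)/i.exec(input);
  const rawHost = m ? m[1].toLowerCase() : '';
  const dottedQuad = /^\d{1,3}(\.\d{1,3}){3}$/;
  return dottedQuad.test(parsed.hostname) && rawHost !== parsed.hostname;
}

console.log(hostWasRewrittenToIPv4('https://10.1000/f<script>alert(1);</script>')); // true
console.log(hostWasRewrittenToIPv4('https://10.0.3.232/f')); // false
```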

I know consecutive 0s are allowed to be omitted, but `10.0.0.1000` isn't valid either, because there are 4 digits in the last place (4 digits being the minimum length for the second number in DOI prefixes, precisely to prevent confusion with IP addresses). Does anyone know what's happening with the parser? :)

https://github.com/whatwg/url/issues/761

Received on Monday, 13 March 2023 14:31:50 UTC