[whatwg/url] Base URL Windows drive sometimes favoured over input string drive (#574)

Typically, the URL parser detects Windows drive letters in the parsed path's "effective first component" (a term I just made up).
The path parser processes the components in turn, pushing and popping them as they are encountered. Drive-letter normalisation (`C|` to `C:`) only happens when the path built so far is empty, and the shortening function checks each time whether `path[0]` is a Windows drive letter, so that it is never popped. This means Windows drive letters get detected even if there is leading rubbish in front of them, for example ([Live viewer](https://jsdom.github.io/whatwg-url/#url=ZmlsZTovLy9hYmMvZGVmLy4uLy4uL0N8Ly4uL2hlbGxvLw==&base=YWJvdXQ6Ymxhbms=)):

```
// (input, base) => result
("file:///abc/def/../../C|/../hello/", "about:blank") => "file:///C:/hello/"
```
The `C|` component gets normalised into `C:` and is not popped from the resulting path, because it is detected as a Windows drive letter.
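To make the push/pop behaviour concrete, here is a simplified TypeScript model of it. This is a sketch under my own assumptions, not the spec's state machine: the helper names (`isWindowsDriveLetter`, `isNormalizedWindowsDriveLetter`, `shorten`, `parseFilePath`, `serializePath`) are hypothetical, and percent-encoded dot segments, backslashes, queries and fragments are ignored. It only shows why `C|` survives: it is normalised because the path is empty when it is reached, and `shorten` then refuses to pop it.

```typescript
// Simplified model of file-scheme path parsing. Helper names are hypothetical;
// "%2e" dot segments, backslashes, query and fragment are deliberately ignored.

// Two characters: an ASCII letter followed by ":" or "|".
function isWindowsDriveLetter(s: string): boolean {
  return s.length === 2 && /^[A-Za-z]$/.test(s[0]) && (s[1] === ":" || s[1] === "|");
}

// A drive letter that has already been normalised to use ":".
function isNormalizedWindowsDriveLetter(s: string): boolean {
  return isWindowsDriveLetter(s) && s[1] === ":";
}

// Pop the last component, unless the only component left is a drive letter.
function shorten(path: string[]): void {
  if (path.length === 1 && isNormalizedWindowsDriveLetter(path[0])) return;
  path.pop();
}

// Push and pop components one at a time, starting from `path`.
function parseFilePath(input: string, path: string[] = []): string[] {
  const segments = input.replace(/^\//, "").split("/");
  segments.forEach((raw, i) => {
    const isLast = i === segments.length - 1;
    if (raw === "..") {
      shorten(path);
      if (isLast) path.push(""); // keep the trailing slash
    } else if (raw === ".") {
      if (isLast) path.push("");
    } else {
      // A drive is only recognised (and normalised) while the path is empty,
      // i.e. when it is the "effective first component".
      const seg = path.length === 0 && isWindowsDriveLetter(raw) ? raw[0] + ":" : raw;
      path.push(seg);
    }
  });
  return path;
}

function serializePath(path: string[]): string {
  return "/" + path.join("/");
}

console.log(serializePath(parseFilePath("/abc/def/../../C|/../hello/"))); // "/C:/hello/"
```

Run against the path from the live-viewer example, the model gives `/C:/hello/`, matching the `file:///C:/hello/` result above.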

However, there is a very subtle quirk: if the input string does not have an authority (`/abc` or `file:/abc`), and the base URL also contains a drive letter, the process of detecting drive letters in the input string gets stricter, so the drive must be the _literal_ first component, without any leading rubbish. Otherwise the drive from the base URL gets preferred.

Consider the following `(input, base)` pairs:

```
// D drive is literally the first component.
("file:/D|/../foo", "file:///C:/base1/base2/") => "file:///D:/foo"

// Add a single-dot component before the input drive.
// Even though this doesn't affect the final path, it makes us favour the base's drive.
("file:/./D|/../foo", "file:///C:/base1/base2/") => "file:///C:/foo"

// This doesn't happen if input string has an authority.
("file:///./D|/../foo", "file:///C:/base1/base2/") => "file:///D:/foo"

// Going back to no-authority, the drive in the input string
// is still recognised if base _doesn't_ have its own drive.
("file:/./D|/../foo", "file:///Cx/base1/base2/") => "file:///D:/foo"
```

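As you can see, in most cases we take the drive "D" from the input string, even when it isn't literally the first component. The one exception is the second case: no authority, a base with its own drive, and an input drive that is not the literal first component.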
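As far as I can tell from the spec's "file slash state", the quirk comes from *when* the base's drive is copied. With no authority and a file base whose first path component is a normalised drive letter, that drive is pushed into the path before any input component is parsed, unless the remaining input starts with a Windows drive letter. Once the path is non-empty, `D|` is no longer the effective first component, so it stays an ordinary segment and the following `..` pops it. The sketch below models only that branch; it assumes the base host is empty (as in all four examples), and all function names are hypothetical.

```typescript
// Sketch of the no-authority branch. Assumes the base is a file URL with an
// empty host, as in the examples. Helper names are hypothetical.

function isWindowsDriveLetter(s: string): boolean {
  return s.length === 2 && /^[A-Za-z]$/.test(s[0]) && (s[1] === ":" || s[1] === "|");
}

function isNormalizedWindowsDriveLetter(s: string): boolean {
  return isWindowsDriveLetter(s) && s[1] === ":";
}

// The stricter check: the remaining input must *begin* with the drive,
// followed by end of input or one of "/", "\", "?", "#".
function startsWithWindowsDriveLetter(s: string): boolean {
  return s.length >= 2 && isWindowsDriveLetter(s.slice(0, 2)) &&
    (s.length === 2 || "/\\?#".indexOf(s[2]) !== -1);
}

function shorten(path: string[]): void {
  if (path.length === 1 && isNormalizedWindowsDriveLetter(path[0])) return;
  path.pop();
}

// Same push/pop model as the earlier path sketch; `rest` has no leading "/".
function parsePath(rest: string, path: string[]): string[] {
  const segments = rest.split("/");
  segments.forEach((raw, i) => {
    const isLast = i === segments.length - 1;
    if (raw === "..") {
      shorten(path);
      if (isLast) path.push("");
    } else if (raw === ".") {
      if (isLast) path.push("");
    } else {
      path.push(path.length === 0 && isWindowsDriveLetter(raw) ? raw[0] + ":" : raw);
    }
  });
  return path;
}

// Input with an authority ("file:///..."): the base path is never consulted.
function resolveWithAuthority(rest: string): string {
  return "file:///" + parsePath(rest, []).join("/");
}

// Input without an authority ("file:/..."): `rest` is whatever follows "file:/".
function resolveNoAuthority(rest: string, basePath: string[]): string {
  const path: string[] = [];
  if (!startsWithWindowsDriveLetter(rest) && basePath.length > 0 &&
      isNormalizedWindowsDriveLetter(basePath[0])) {
    // The base's drive is copied in before any input component is seen,
    // so a later "D|" is no longer the effective first component.
    path.push(basePath[0]);
  }
  return "file:///" + parsePath(rest, path).join("/");
}

const baseC = ["C:", "base1", "base2", ""];   // file:///C:/base1/base2/
const baseCx = ["Cx", "base1", "base2", ""];  // file:///Cx/base1/base2/
console.log(resolveNoAuthority("D|/../foo", baseC));    // file:///D:/foo
console.log(resolveNoAuthority("./D|/../foo", baseC));  // file:///C:/foo
console.log(resolveWithAuthority("./D|/../foo"));       // file:///D:/foo
console.log(resolveNoAuthority("./D|/../foo", baseCx)); // file:///D:/foo
```

The four calls mirror the four `(input, base)` pairs above, and only the second one picks up the base's `C:`.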

This doesn't seem to be exercised by any existing tests, and given that it is such an edge case I'd like to check whether it is intentional or not. I'm using a parsing algorithm derived from the one in the standard, and I was able to pass all of the constructor and setter tests while handling this incorrectly. So if this is intentional behaviour, I'd be happy to add some tests for it.

https://github.com/whatwg/url/issues/574

Received on Saturday, 23 January 2021 00:58:22 UTC