Re: [whatwg/url] Support relative URLs (#531)

Alright, I am going to do my best to answer all the questions, so that I don't leave you hanging like this.

> I'm starting to think that would work best as a separate document. It would need a reference implementation and its own comprehensive test-suite; essentially being another URL standard. 

Apart from the separate document, I agree with this. Therefore, I have written a new specification very carefully, so that it agrees with the current standard, and I have created a reference implementation. The reference implementation currently passes all of the wpt tests except for six IDNA-related ones, because I have not implemented domainToASCII properly.

There is room for improvement. The wpt test suite is indeed lacking. Others and I have found differences that the tests did not catch, though they have been easy to fix so far.

There are no tests yet for relative references, nor an API, because the API design is not done yet. 

Fuzz testing would be great, but I am just one man, and I have not prioritised this.

> What's more is that I'm not sure how many implementations would actually want this. I'm not convinced the use-cases are entirely clear.

I think this has already been partly addressed. Note though that this is not just about relative URLs, but about recovering the proper structural framework that the WHATWG has abandoned. This is one and the same problem. It helps you solve the political one too.

> In any case, relative URLs don't come in to HTTP routing whatsoever.

Semantically, these are a subclass of relative URLs that have neither a scheme nor a host. It will be possible to represent them with the API. They have to be prefixed with a `.` when serialising to a relative URL. Similar things are currently done in the standard.
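
To illustrate the kind of ambiguity such a prefix avoids, here is a minimal sketch. It assumes a rule modelled on the WHATWG URL serialiser, which emits `/.` before a host-less path whose first segment is empty. `serialise_path_only` is a hypothetical helper, not part of any library, and Python's `urlsplit` (an RFC-style splitter, not the WHATWG parser) stands in for whoever reads the string back.

```python
from urllib.parse import urlsplit


def serialise_path_only(path: str) -> str:
    """Serialise a path that has no scheme and no host (hypothetical helper).

    A path starting with "//" would be re-read as "//authority/...",
    so it is prefixed with "/." to keep it a plain path.
    """
    if path.startswith("//"):
        return "/." + path
    return path


ambiguous = "//foo/bar"

# Read back as-is, "foo" is taken to be the authority.
assert urlsplit(ambiguous).netloc == "foo"

# With the prefix, the whole string stays a path and there is no authority.
safe = serialise_path_only(ambiguous)
assert urlsplit(safe).netloc == ""
```

The prefix changes the string but not the path it denotes once dot segments are removed.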

This brings up more arguments. HTTP uses a specific subclass of URI. The WHATWG standard can generate URL strings that are invalid URIs, so browsers are presumably able to make invalid HTTP requests. In addition, the HTTP spec requires percent decoding of unreserved characters, which the WHATWG standard does not cover. For this it is important to accurately describe the differences between the WHATWG and the RFCs - which was one of my main motivations - and to provide more advanced tools for normalising URLs in different ways.
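
As a rough sketch of that normalisation step, assuming the unreserved set from RFC 3986 (letters, digits, `-`, `.`, `_`, `~`): the hypothetical `decode_unreserved` below decodes only percent-encoded unreserved characters and leaves every other escape untouched. It is an illustration, not code from my implementation.

```python
import re

# Unreserved characters as defined in RFC 3986, section 2.3.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

_PERCENT_ESCAPE = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(text: str) -> str:
    """Decode %XX escapes only when they stand for an unreserved character."""

    def replace(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        # Reserved and non-ASCII bytes keep their escape exactly as written.
        return char if char in UNRESERVED else match.group(0)

    return _PERCENT_ESCAPE.sub(replace, text)


# "%7E" is "~" and "%41" is "A" (both unreserved); "%2F" is "/" (reserved).
assert decode_unreserved("/%7Euser/a%2Fb%41") == "/~user/a%2FbA"
```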

> There may be use-cases which require manipulating a relative reference in a scheme, host and base path-independent context, but I haven't seen the long list of convincing use-cases such a large change to the standard would require.

I can agree with this in the sense that relative URLs require a massive change that is very difficult to pull off. However, relative URLs alone are only an expression of much bigger problems with the standard.

> We are nowhere near the proposal stage, IMO.

Again, I agree with this. I used the word proposal in a confusing way. What I meant was that I wanted to make a first pass at an API around my more low level implementation, and show it here so that we could together investigate ideas. I did not mean to suggest that we would already start changing the standard text, far from it.

> But does that require the full complexity of the URL parser and all the work that you've done to establish a formal grammar and theory of URLs

Not the WHATWG parser! Otherwise, yes, it does. URLs are too complex to manipulate without a theory to back it up. It is important to do it well, so that relative references that are created by software are aligned, to prevent fragmentation. It helps avoid bugs with percent-encoding. As a bonus, using an API for this is more ergonomic than cutting and pasting strings. We have (hopefully) stopped doing that with SQL and we should stop doing that with URLs too. And again, relative URLs are related to solving much larger problems with the standard.
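
To make the comparison with SQL concrete, here is a small sketch of pasting strings versus building a relative reference from structured parts. It uses only Python's standard `urllib.parse`, not my library and not the WHATWG API, and `build_relative` is a hypothetical helper made up for this illustration.

```python
from urllib.parse import quote, urlencode


def build_relative(segments, query):
    """Build a relative reference from path segments and query parameters.

    Each segment is percent-encoded on its own, so a "/" or "?" inside a
    segment cannot change the structure of the result.
    """
    path = "/".join(quote(segment, safe="") for segment in segments)
    query_string = urlencode(query)
    return path + ("?" + query_string if query_string else "")


name = "a/b c?"   # untrusted file name
term = "x&y"      # untrusted search term

# Pasting strings: the data leaks into the URL structure.
naive = "files/" + name + "?q=" + term
assert naive == "files/a/b c??q=x&y"

# Building from parts: the data stays data.
assert build_relative(["files", name], {"q": term}) == "files/a%2Fb%20c%3F?q=x%26y"
```

This is the same idea as a parameterised SQL query: the structure is fixed by the code and the values are encoded at the boundary.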

> I think it's important to remember that the URL parser in this standard does not represent the cleanest definition of URLs. 

I replied to this already. My specification shows that whatwg URLs and their hacks can be specified very cleanly. 

> For new APIs which don't have any of those compatibility concerns, I think we should be striving for the simplest design that solves the problems we actually have. Propagating those hacks beyond what is needed for compatibility should not be goal IMO. 

I agree somewhat, except it turns out that the hacks are not so bad -- apart from possibly the strange behaviour of setters, but those are not hard to characterise.

> But instead, you're just coming and saying you've solved all the problems on your own, in a purely academic exercise 

I have worked on this for five years and yes, I solved almost all of the major problems on my own. I very carefully and foolishly considered every possible use case and edge case of your standard. Not an academic exercise. The 'theory' is a by-product of the library.

A few people have read my specification and their comments have helped me **a lot**. @zamfofex has pretty much solved the last problems for me.

> that has not been used in production and is apparently not even driven by specific problems encountered in a real application (or was it?). 

Not used in production, I guess. Used in everyday programming and private projects. The whatwg API never addresses my use case.

There is at least one API wrapper around my implementation already. It is here: [astro-community]. This may be something to be proud of, because the author appears to be an influential person. It shows that I anticipated the use cases correctly.

[astro-community]: https://github.com/astro-community/relative-url


> Produce a focussed document and API for a specific problem

This is a new step that I was hoping to do together with you here.

> produce an implementation

Done, except for the API wrapper. Unless you count my reurl library (but I don't think that API is appropriate for a standard).

> let users work through the issues, and build up a test suite as edge-cases are discovered, etc.

Yes, this was the idea, looking for feedback.

* * * 

> There are internal invariants which must be upheld to ensure that, for example, serialising that URL record and parsing it again results in an equivalent URL record.

All taken into consideration. Be careful with the word 'equivalent' here; 'equal' is more fitting.

See this [thread] and this [comment] especially. 

[thread]: https://github.com/alwinb/url-specification/issues/3

[comment]: https://github.com/alwinb/url-specification/issues/3#issuecomment-885593385


> That means, for example, you wouldn't be able to just insert some "." or ".." components in the record's list of path components.

A URI and a URI reference **can** contain such components. There are subclasses of URI, specifically path-normalised URIs, that cannot contain such components. I am doing the same in my work (which really is not that different from the RFCs). The WHATWG is in trouble because it threw these distinctions out.

The (non-whatwg) resolution operator agrees with normalisation as follows.

normalise (strict-resolve (url1,  url2)) == strict-resolve (normalise (url1),  normalise (url2)).

**(!)** This is a property that should be maintained as much as possible, it is very powerful.

> In general, I feel it is good practice in software engineering to design and define as little as you can get away with, to solve the problems you actually have, and only add complexity as it becomes necessary, and only if it is worth the cost. 

Mostly agree, but my view on that is a bit more nuanced. Some things cannot be done without the theory, and searching for a general theory often exposes symmetries that you can use to simplify your code. Also, I don't think something like, say, TypeScript could be created in this way.

* * * 

The remaining conversation was about me leaving. I don't think you are saying anything unreasonable about that.

> Throwing a temper tantrum and storming off in a huff … is not okay. It's an attempt to shut down debate.

But I will say this.

I know what I am talking about and have been offering my work for free. I don't have to do that.


-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/531#issuecomment-1037165128

Message ID: <whatwg/url/issues/531/1037165128@github.com>

Received on Saturday, 12 February 2022 11:37:11 UTC