Re: [whatwg/url] Addressing HTTP servers over Unix domain sockets (#577) from Cyan Ogilvie on 2021-11-13 (public-webapps-github@w3.org from November 2021)

From: Cyan Ogilvie <notifications@github.com>
Date: Fri, 12 Nov 2021 23:54:28 -0800
To: whatwg/url <url@noreply.github.com>
Cc: Subscribed <subscribed@noreply.github.com>
Message-ID: <whatwg/url/issues/577/967800685@github.com>
Yes, I think the place for the UDS path is the authority portion - that's the bit that has the responsibility for describing the endpoint of the stream socket to talk to for this resource.  Putting it elsewhere feels like an abuse and is likely to cause unforeseen problems (HTTP client software will certainly have the host portion of the URL available in the code that establishes the stream socket, but may not have the fragment).

I think the namespace collision with IPv6 literals and syntax validation for UDS paths can be solved by:
- Reusing the syntax for the path portion of the URI:  "/" is a separator, path elements must be percent encoded.
- Socket paths must be absolute (start with "/" or "~").  This distinguishes them from IPv6 literals, and should be the case anyway (what would a relative path be relative *to*?  No similar relative resolution for hostnames exists in the standard).
- Possibly using a version prefix as envisioned by RFC 3986, putting it within the syntax anticipated in that standard, something like: `http://[v1.uds:/tmp/mysock]/foo/bar`.

It's up to the host to decode and translate the path into whatever native scheme that OS uses (just as it is for the path portion of the URI).
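
The three rules above could be sketched as follows. This is only an illustration under stated assumptions: `classify_bracketed_host` is a hypothetical helper, and the `v1.uds:` prefix and the leading "/" or "~" test are the syntax suggested in this message, not anything standardised. (RFC 3986's IPvFuture grammar as written does not permit a literal "/", so the sketch follows the proposal rather than that grammar.)

```python
import ipaddress
from urllib.parse import unquote

# Hypothetical prefix, taken from the example URL in this message.
UDS_PREFIX = "v1.uds:"


def classify_bracketed_host(host: str):
    """Classify the text found between '[' and ']' in a URL authority.

    Returns ("uds", decoded_path) or ("ipv6", normalised_address).
    Raises ValueError for anything that is neither.
    """
    if host.lower().startswith(UDS_PREFIX):
        host = host[len(UDS_PREFIX):]

    # Absolute socket paths start with "/" or "~"; IPv6 literals never do.
    if host.startswith(("/", "~")):
        # Decode element by element so an encoded "/" (%2F) inside an
        # element is never mistaken for a separator.
        elements = [unquote(element) for element in host.split("/")]
        if any("/" in element or "\x00" in element for element in elements):
            raise ValueError("path element decodes to an illegal character")
        return ("uds", "/".join(elements))

    # Anything else must parse as an IPv6 address (raises ValueError if not).
    return ("ipv6", str(ipaddress.IPv6Address(host)))
```

Translating the decoded path into the platform's native socket address (including any "~" expansion) is left to the host, as described above.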

For me the motivation for supporting HTTP over UDS goes way beyond web browsers (and I would see that as a minor use case for this) - for better or worse HTTP has become a lingua franca protocol for anything that wants to communicate on the Internet (consider WebSockets for some of the forces that drive this), and that communication is increasingly machine to machine.  For example: we run an online marketplace that serves about 10 million requests a day over HTTP (excluding static resources offloaded to a CDN), but each of those involves several HTTP interactions with other services to construct the response: Elasticsearch queries, S3 fetches for image sources that are resized, and a whole host of REST services for shipping estimates, geocoding, ratings and reviews, federated authentication providers, etc.  So, by volume, the overwhelming majority of HTTP requests our webservers are party to are between them and other servers, and aren't transporting web pages.

As the trend toward microservices and containerization continues this will only increase, and it's particularly there that I see HTTP-over-UDS being useful:
- Communication over UDS is materially faster and has lower latency than over the loopback interface because a lot of the complexities in the network stack can be skipped - packet filtering and transformation, TCP, etc.  The loopback interface doesn't have network latency but it still has all these other things.  Local sockets (UDS) are more or less just buffers managed by the kernel.  This starts to really matter to page response times when generating the page involves many interactions with microservices.
- The namespace for sockets is hierarchical for UDS rather than flat as it is for ports on localhost, so there is a natural, self-describing way to scope the namespace for each microservice.  Compare `http://localhost:1234/` with `http://[/sockets/session/addrs]/` for the address of a microservice providing the address management service for the current session user.
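
To show how little a client has to change once the socket path is known, here is a minimal sketch of an HTTP client over a Unix domain socket using only the Python standard library. `UnixHTTPConnection` and `get` are illustrative names, not an existing API, and the sketch assumes the caller has already extracted the socket path from the URL.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 10.0):
        # "localhost" is only used for the Host header; no TCP lookup or
        # connection is made because connect() is overridden below.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        # Only the stream socket establishment step differs from plain HTTP.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def get(socket_path: str, url_path: str):
    """Issue a GET for url_path over the socket; return (status, body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", url_path)
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Under these assumptions, a call such as `get("/sockets/session/addrs", "/")` would play the role of the second URL in the comparison above.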

The other trend is for UIs to be implemented in HTML rather than some OS-native widget set (Android, iOS, GTK, Qt, macOS native controls, Windows native controls, etc.), even when the application is entirely local on the user's device.  There are very good reasons for this:
- HTML+JavaScript is portable, greatly reducing the cost to develop the application if it has to run across platforms.
- HTML+JavaScript is much richer and more capable than those native widget sets in the types of UIs it can implement.
- Essentially every developer these days already knows HTML and JavaScript.
- Gone are the days when users expect native OS controls.  These days they expect web application style interfaces, since that's the majority of what they're exposed to (Gmail, various cloud based office applications, Twitter, etc.).

In this use case the hierarchical namespace is important because it addresses a major downside of this pattern - choosing a port from the flat, system-wide shared namespace (ok, so the listening socket can specify 0 and have the OS pick a random unused port on some systems, but that's a bit ugly).  Much nicer to use `~/.sockets/<app>/<pid>`, and more discoverable.  Another reason to use UDS in this case is that the identity of the user on the client side of the socket can be obtained from the OS in a way that only trusts the OS, solving the other issue with this pattern - knowing which user we're interacting with.  If these issues were solved by HTTP-over-UDS, do you think something like PrusaSlicer would use that (HTML, JavaScript, WebGL) rather than wxWidgets for its UI portability requirements?  That would make porting to mobile devices like tablets much easier too.
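
As an illustration of obtaining the client's identity from the OS, the following sketch reads the peer credentials of a connected Unix domain socket. It is Linux-specific: it assumes the `SO_PEERCRED` socket option, which other platforms expose differently or not at all, and `peer_credentials` is a hypothetical helper name.

```python
import socket
import struct

# Layout of Linux's `struct ucred`: pid_t (int), uid_t and gid_t (unsigned int).
_UCRED_FORMAT = "iII"


def peer_credentials(conn: socket.socket):
    """Return (pid, uid, gid) for the peer of a connected AF_UNIX socket.

    The values are filled in by the kernel, so the server does not have
    to trust anything the client says about itself.
    """
    size = struct.calcsize(_UCRED_FORMAT)
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    pid, uid, gid = struct.unpack(_UCRED_FORMAT, raw)
    return pid, uid, gid
```

A server would call this on the socket returned by `accept()` and map the uid to a local user before serving the request.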

Finally, consider things like headless Chrome in an automated CI/CD pipeline - the software managing the tests being run on the deployment candidate version could start a number of headless Chrome instances and run tests in parallel, easily addressing the WebSocket each provides with a UDS path like `/tmp/chrome/<pid>` rather than somehow managing port assignments.

The tech already exists to make these obvious next steps in application provisioning and inter-service communication happen (even Windows supports local sockets, aka UDS), and the change required in existing HTTP client software should be small and limited in scope (the URL parsing, name resolution and stream socket establishment steps), but it can't happen unless there is a standardised way to address these sockets.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/whatwg/url/issues/577#issuecomment-967800685
Received on Saturday, 13 November 2021 07:54:41 UTC