- From: Michael Toomim <toomim@gmail.com>
- Date: Tue, 4 Feb 2025 16:53:47 -0800
- To: Julian Reschke <julian.reschke@gmx.de>, ietf-http-wg@w3.org
- Message-ID: <d5a63a30-a63a-4b02-81a1-b6b01f4a4f1b@gmail.com>
Julian,

I think I see the root of our disagreement. I think we have been assuming different mental models. Let me know if I'm getting this right:

1. In your model, each "Resource" has a set of "Version Resources", in a *one-to-many* mapping, like this:

   - https://foo.com/hello.txt     <-- Resource
   - https://foo.com/hello.txt.1   <-- Version 1 of Resource
   - https://foo.com/hello.txt.2   <-- Version 2 of Resource
   - https://foo.com/hello.txt.3   <-- Version 3 of Resource
   - https://foo.com/hello.txt.4   <-- Version 4 of Resource

2. In my model, there is a *many-to-many* mapping between Resources and Versions. A Version marks a point in distributed time, and multiple Resources can be at the same Version:

   Resources:
   - https://foo.com/hello.txt   <-- Resource
   - https://foo.com/world.txt   <-- Resource

   Versions:

              alice-0
               /   \
              /     \
          alice-1   bob-0
             |        |
          alice-2     |
             \       /
              \     /
               bob-1

   Resource x Versions:
   - https://foo.com/hello.txt at {alice-0}         <-- Two resources ...
   - https://foo.com/world.txt at {alice-0}         <-- ... can be at the same version
   - https://foo.com/hello.txt at {bob-0}
   - https://foo.com/hello.txt at {bob-1}
   - https://foo.com/world.txt at {alice-1,bob-0}

This is a fundamental design choice in Versioning: do Resources map to Versions in a one-to-many, or many-to-many relation? This choice is foundational, and has big impacts, so let's understand it clearly.

Option 1:
* Simpler, but less expressive for distributed systems.
* WebDAV chooses this model.
* Lets us assume each Version ID is a URI that we can GET.

Option 2:
* More expressive, enables important use-cases in distributed systems.
* Our proposal (draft-toomim-httpbis-versions) chooses this model.
* Requires a separate spec to define URIs that map into Versioned Resources.

I am advocating for Option 2. Now I will illustrate the differences between these options, and show you some use-cases that require Option 2.
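As a rough illustration of the two mental models, here is a minimal Python sketch that stores each one as a data structure. The variable and function names are hypothetical, and modeling Option 2 as a set of (Resource, Version-set) pairs is an assumption made for illustration, not something draft-toomim-httpbis-versions specifies.

```python
# Option 1 (one-to-many): each Resource owns its own list of Version URIs,
# following the hello.txt.N naming from the example above.
option1 = {
    "https://foo.com/hello.txt": [
        f"https://foo.com/hello.txt.{n}" for n in range(1, 5)
    ],
}

# Reverse index: in Option 1 every Version URI belongs to exactly one
# Resource, so this is a plain function (a dict), not a relation.
owner = {v: r for r, versions in option1.items() for v in versions}

# Option 2 (many-to-many): a relation of (Resource, Version-set) pairs.
# A Version-set such as {alice-1, bob-0} names a point in distributed
# time that includes two concurrent versions.
option2 = {
    ("https://foo.com/hello.txt", frozenset({"alice-0"})),
    ("https://foo.com/world.txt", frozenset({"alice-0"})),
    ("https://foo.com/hello.txt", frozenset({"bob-0"})),
    ("https://foo.com/hello.txt", frozenset({"bob-1"})),
    ("https://foo.com/world.txt", frozenset({"alice-1", "bob-0"})),
}


def resources_at(version):
    """Return every Resource that has a state at the given Version-set."""
    return sorted(r for r, v in option2 if v == version)


def versions_of(resource):
    """Return every Version-set at which the given Resource has a state."""
    return [v for r, v in option2 if r == resource]
```

Here `resources_at(frozenset({"alice-0"}))` returns both hello.txt and world.txt, which has no counterpart in Option 1, where the `owner` index sends each version URI back to exactly one Resource.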
I hope that this clears up the place we got stuck, and I would love to hear whether this makes sense to you, and what you think about Option 2.

== *Option 1: a One-to-Many Mapping* ==

WebDAV does Option 1. It defines a "Version ID" as a "Resource at a point in time". This means that given any Version ID, we can unambiguously point to the contents of the Resource at that point in time. So if we add two additional requirements:

  1. Each Version ID must be formatted as a URI
  2. The response to a GET on that Version ID URI is the contents of the Resource at that Version

then we seem to get a very attractive new feature: /we can GET a Version/! If you plug the Version ID URL into any existing browser or web client, you get the contents of the resource at that version! You can even bookmark versions, using existing bookmark systems! Cool!

However, this only works because we've conflated a "Version" with a "Resource at a Version". There are actually three concepts at play here:

  1. Version
  2. Resource
  3. Resource at Version

...and we have conflated (1) and (3). What we are calling a "Version" in Option 1 is actually a "Resource at a Version", not a Version (aka a "point in time") itself. By conflating (1) and (3), we effectively assume a 1-1 mapping between them. That rules out having multiple Resources at a single Version, and so eliminates a number of important distributed use-cases.

Let's look at Option 2 now, and the new use-cases it enables.

== *Option 2: a Many-to-Many Mapping* ==

Rather than defining Versions /as/ Resources, in Option 2 we see them as a /dimension of/ Resources.

  Definition: An HTTP Version specifies a point on the dimension of Time of a Resource.

This is analogous to the existing concept of an HTTP Range:

  Definition: An HTTP Range specifies a region in the dimension of Space of a Resource.
Putting these together, we see that HTTP Resources now have two naturally addressable dimensions: Time and Space.

== Time (version) and Space (range) are dimensions of Resources ==

  Resources
      ^
      |
      |
      |
      |      Time  <-- Version: header
      |     /
      |    /
      |   /
      |  /
      | /
      +-----------------------> Space  <-- Range: header

We can address the Version (time) and Range (space) of a Resource independently, with headers:

    GET https://example.com/hello.txt
    Range: lines=44-100
    Version: 1.4.5

By specifying these as headers, they become independent dimensions, each with a many-to-many mapping to Resources:

  * Two Resources can be at the same Version
  * ...just as two Versions can exist for the same Resource
  * ...just as two Ranges can exist on the same Resource
  * ...just as two Resources can be accessed at the same Range

HTTP does not require Ranges to be URIs, nor should we (in this model) require Versions to be URIs. Applications can instead *choose* to define their own URI convention for versioned ranges, if desired, such as:

    https://example.com/hello.txt?lines=44-100&version=1.4.5

If such URI conventions become popular enough, they can be standardized, but this can be done in a separate specification.

When a Version is an independent *dimension of* a Resource, there is no useful way to give a Version itself a URI.

== *Use-Cases: simple Distributed Systems that need Option 2* ==

Here are two use-cases that cannot be supported in Option 1, because they describe multiple resources (potentially on multiple computers) as read and/or written at the same "time":

Example 1: Suppose we want to expose a Git repository over HTTP. In Git, multiple files are committed together at the same version:

    https://company.com/git_repo/README.md
    https://company.com/git_repo/package.json
    https://company.com/git_repo/main.js

All three of these files are at the same version: "83b6f9c". There is no way to represent this multi-resource Version in Option 1, because each Resource is presumed to have its own namespace of time.
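To make the Git example concrete, here is a minimal Python sketch of a server that treats Version and Range as independent dimensions of a Resource. Everything in it is hypothetical: the in-memory `COMMITS` table, the `handle_get` and `files_at` helpers, and the placeholder file contents. The `lines=` unit from the earlier request is not a registered HTTP range unit, so its 1-based, inclusive semantics here are an assumption, and the 206 status is reused only by analogy with byte ranges.

```python
# Hypothetical snapshot table: one Version ("83b6f9c") covers three files.
# The file contents are placeholders invented for this sketch.
COMMITS = {
    "83b6f9c": {
        "/git_repo/README.md": "# Demo\nline 2\nline 3\n",
        "/git_repo/package.json": '{"name": "demo"}\n',
        "/git_repo/main.js": "console.log('hi');\n",
    },
}
HEAD = "83b6f9c"


def parse_line_range(value):
    """Parse an assumed 'lines=A-B' Range value (1-based, inclusive)."""
    unit, _, spec = value.partition("=")
    if unit != "lines":
        raise ValueError(f"unsupported range unit: {unit}")
    start, _, end = spec.partition("-")
    return int(start), int(end)


def handle_get(path, headers):
    """Resolve a GET along two axes: Version (time) and Range (space).

    `headers` is a plain dict with canonical capitalization; a real
    server would match header names case-insensitively.
    """
    # Time: select the snapshot named by the Version header, else HEAD.
    # Surrounding quotes are stripped so both 1.4.5 and "83b6f9c" work.
    version = headers.get("Version", HEAD).strip('"')
    snapshot = COMMITS.get(version)
    if snapshot is None or path not in snapshot:
        return 404, ""
    body = snapshot[path]

    # Space: independently slice that snapshot's body by the Range header.
    if "Range" in headers:
        start, end = parse_line_range(headers["Range"])
        if start < 1 or end < start:
            return 416, ""
        lines = body.splitlines(keepends=True)
        return 206, "".join(lines[start - 1:end])
    return 200, body


def files_at(version):
    """List every Resource that exists at a Version: here, more than one."""
    return sorted(COMMITS.get(version, {}))
```

Calling `handle_get("/git_repo/README.md", {"Version": '"83b6f9c"', "Range": "lines=2-3"})` selects a point in time and then a region of one file at that time, while `files_at("83b6f9c")` lists all three paths: the Version on its own names a point shared by several Resources, not a single body to return.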
Consider: if you tried to construct a unique URI to represent this Version 83b6f9c, what would the URI be? You could call it "https://company.com/version/83b6f9c", but what would be the result of doing a GET on that? There's no sensible response body for a Version alone, because a Version is not itself a Resource.

Example 2: Consider a distributed bank account transaction between Alice's account on https://alice.com and Bob's account on https://bob.com. Alice and Bob both start with $20, and then we transfer $10 from Alice to Bob by debiting https://alice.com/alice and crediting https://bob.com/bob at the *same time*:

    PATCH https://alice.com/alice
    Version: "transaction-1"
    Content-Type: application/debitcredit

    -10

    PATCH https://bob.com/bob
    Version: "transaction-1"
    Content-Type: application/debitcredit

    +10

To mark these mutations as occurring at the same time, we gave them the same Version "transaction-1". If this Version were itself a URI, what should the URL be? Would we call it https://alice.com/transaction-1, or https://bob.com/transaction-1? What would the result of GET https://alice.com/transaction-1 be? Would it be Alice's account balance? Isn't that unfair to Bob?

There is no longer a sensible URL definition for Versions, because they need to be distinct from Resources to support transactions across resources.

Thank you for considering this! I would love to hear if I have reflected your mental model accurately, and if so, what you think about Option 2.

Thank you!
Michael
Received on Wednesday, 5 February 2025 00:53:54 UTC