- From: Michael Toomim <toomim@gmail.com>
- Date: Tue, 4 Feb 2025 16:53:47 -0800
- To: Julian Reschke <julian.reschke@gmx.de>, ietf-http-wg@w3.org
- Message-ID: <d5a63a30-a63a-4b02-81a1-b6b01f4a4f1b@gmail.com>
Julian, I think I see the root of our disagreement. I think we have been
assuming different mental models. Let me know if I'm getting this right:
1. In your model, each "Resource" has a set of "Version Resources", in a
*one-to-many* mapping, like this:
- https://foo.com/hello.txt <-- Resource
- https://foo.com/hello.txt.1 <-- Version 1 of Resource
- https://foo.com/hello.txt.2 <-- Version 2 of Resource
- https://foo.com/hello.txt.3 <-- Version 3 of Resource
- https://foo.com/hello.txt.4 <-- Version 4 of Resource
2. In my model, there is a *many-to-many* mapping between Resources and
Versions. A Version marks a point in distributed time, and multiple
Resources can be at the same Version:
Resources:
- https://foo.com/hello.txt <-- Resource
- https://foo.com/world.txt <-- Resource
Versions:
        alice-0
        /     \
       /       \
   alice-1    bob-0
      |         |
   alice-2      |
       \       /
        \     /
         bob-1
Resource x Versions:
- https://foo.com/hello.txt at {alice-0} <-- Two resources ...
- https://foo.com/world.txt at {alice-0} <-- ... can be at the same version
- https://foo.com/hello.txt at {bob-0}
- https://foo.com/hello.txt at {bob-1}
- https://foo.com/world.txt at {alice-1,bob-0}
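To make the many-to-many model concrete, here is a minimal Python sketch of the diagram and list above. The dictionary layout, the function names, and the placeholder body strings are assumptions for illustration only; nothing here is taken from the draft.

```python
# Hypothetical in-memory model of the many-to-many mapping sketched above.

# Version DAG: each version ID maps to the set of its parent version IDs.
parents = {
    "alice-0": set(),
    "alice-1": {"alice-0"},
    "bob-0": {"alice-0"},
    "alice-2": {"alice-1"},
    "bob-1": {"alice-2", "bob-0"},
}

# A resource state is keyed by (URL, version set). Bodies are placeholders.
states = {
    ("https://foo.com/hello.txt", frozenset({"alice-0"})): "hello v0",
    ("https://foo.com/world.txt", frozenset({"alice-0"})): "world v0",
    ("https://foo.com/hello.txt", frozenset({"bob-0"})): "hello from bob",
    ("https://foo.com/hello.txt", frozenset({"bob-1"})): "hello merged",
    ("https://foo.com/world.txt", frozenset({"alice-1", "bob-0"})): "world merged",
}


def ancestors(version_id):
    """All version IDs that happened before the given one in the DAG."""
    seen = set()
    stack = list(parents[version_id])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return seen


def resources_at(version_ids):
    """URLs with a recorded state at exactly this set of version IDs."""
    wanted = frozenset(version_ids)
    return sorted(url for (url, v) in states if v == wanted)


def versions_of(url):
    """Version sets (as sorted lists) at which this URL has a recorded state."""
    return sorted(sorted(v) for (u, v) in states if u == url)
```

In this sketch, `resources_at({"alice-0"})` returns both URLs, and `versions_of("https://foo.com/hello.txt")` returns three version sets: the mapping runs in both directions.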
This is a fundamental design choice in Versioning: do Resources map to
Versions in a one-to-many, or many-to-many relation? This choice is
foundational, and has big impacts, so let's understand it clearly.
Option 1:
* Simpler, but less expressive for distributed systems.
* WebDAV chooses this model.
* Lets us assume each Version ID is a URI that we can GET.
Option 2:
* More expressive, enables important use-cases in distributed systems.
* Our proposal (draft-toomim-httpbis-versions) chooses this model.
* Requires a separate spec to define URIs that map into Versioned
Resources.
I am advocating for Option 2. Now I will illustrate the differences
between these options, and show you some use-cases that require Option
2. I hope that this clears up the place we got stuck, and I would love
to hear if this makes sense to you, and what you think about Option 2.
== *Option 1: a One-to-Many Mapping:* ==
WebDAV does Option 1. It defines a "Version ID" as a "Resource at a
point in time". This means that given any Version ID, we can
unambiguously point to the contents of the Resource at that point in
time. So if we add in two additional requirements:
1. Each Version ID must be formatted as a URI
2. The response to a GET on that Version ID URI is the contents of
the Resource at that Version
Then we seem to get a very attractively convenient new feature— /we can
GET a Version/! If you plug the Version ID URL into any existing browser
or web client, you get the contents of the resource at that version! You
can even bookmark versions, using existing bookmark systems!
Cool!
However, this only works because we've conflated a "Version" with a
"Resource at a Version". There are actually three concepts at play here:
1. Version
2. Resource
3. Resource at Version
...and we have conflated (1) and (3) together. What we are calling a
"Version" in Option 1 is actually a "Resource at a Version", not a
Version (aka a "point in time") itself.
By conflating (1) and (3), we effectively assume a 1-1 mapping between
them, which rules out having multiple Resources at a Version, which
eliminates a number of important distributed use-cases. Let's look at
Option 2 now, and the new use-cases it enables.
== *Option 2: a Many-to-Many Mapping:* ==
Rather than defining Versions /as/ Resources, in Option 2 we see them as
a /dimension of/ Resources.
Definition: An HTTP Version specifies a point on the dimension of
Time of a Resource.
This is analogous to the existing concept of a HTTP Range:
Definition: An HTTP Range specifies a region in the dimension of
Space of a Resource.
Putting these together, we see that HTTP Resources now have two
naturally addressable dimensions—Time and Space:
== Time (version) and Space (range) are dimensions of Resources ==
   Resources
       ^
       |
       |
       |
       |         Time  <-- Version: header
       |        /
       |       /
       |      /
       |     /
       |    /
       +-----------------------> Space  <-- Range: header
We can address the Version (time) and Range (space) of a Resource
independently, with headers:
GET https://example.com/hello.txt
Range: lines=44-100
Version: 1.4.5
By specifying these as headers, they become an independent dimension
with a many-to-many mapping:
* Two Resources can be at the same Version
* ...just as two Versions can exist for the same Resource
* ...just as two Ranges can exist on the same Resource
* ...just as two Resources can be accessed at the same Range
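As a rough illustration of addressing both dimensions with headers, the sketch below builds (but does not send) such a request using Python's standard library. The helper name `versioned_range_request` is hypothetical; the `Version` header is the one proposed in the draft, and `lines` is used as a range unit only because the example above does so.

```python
from urllib.request import Request


def versioned_range_request(url, version, first_line, last_line):
    """Build a GET request that selects a version and a line range via headers.

    Assumptions: "Version" is the header proposed in the draft, and "lines"
    is the range unit from the example in this message, not a registered one.
    """
    return Request(
        url,
        method="GET",
        headers={
            "Version": version,
            "Range": f"lines={first_line}-{last_line}",
        },
    )


# The URL stays the same; only the headers pick the point in time and space.
req = versioned_range_request("https://example.com/hello.txt", "1.4.5", 44, 100)
```

The same `Version` value could be attached to a request for a different URL, and the same URL could be requested with a different `Version`, which is the independence described above.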
HTTP does not require Ranges to be URIs, nor should we (in this
model) require Versions to be URIs. Applications can instead *choose* to
define a URI scheme for versioned ranges themselves, if desired, such as:
https://example.com/hello.txt?lines=44-100&version=1.4.5
If such URI schemes are popular enough, they can be standardized. But
this can be done in a separate specification. When a Version is an
independent *dimension of* a Resource, there is no useful way to give a
Version itself a URI.
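One way an application might layer such a URI convention on top of the header model is to translate the query parameters back into headers. This is only a sketch: the function name `split_versioned_uri` is hypothetical, and the `lines` and `version` parameter names are simply those of the example URI above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def split_versioned_uri(uri):
    """Split an application-defined versioned URI into (bare URI, headers).

    Assumes the hypothetical convention shown above: "version" selects the
    Version header and "lines" selects a Range in "lines" units. Any other
    query parameters are left on the URI untouched.
    """
    parts = urlsplit(uri)
    headers = {}
    remaining = []
    for key, value in parse_qsl(parts.query):
        if key == "version":
            headers["Version"] = value
        elif key == "lines":
            headers["Range"] = f"lines={value}"
        else:
            remaining.append((key, value))
    bare = urlunsplit(parts._replace(query=urlencode(remaining)))
    return bare, headers


bare_uri, headers = split_versioned_uri(
    "https://example.com/hello.txt?lines=44-100&version=1.4.5"
)
```

Note that the resulting URI names a Resource at a Version and Range, not the Version by itself.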
== *Use-Cases: simple Distributed Systems that need Option 2* ==
Here are two use-cases that cannot be supported in Option 1, because
they describe multiple resources (potentially on multiple computers) as
read and/or written at the same "time":
Example 1: Suppose we want to expose a Git repository over HTTP. In Git,
multiple files are committed together at the same version:
https://company.com/git_repo/README.md
https://company.com/git_repo/package.json
https://company.com/git_repo/main.js
All three of these files are at the same version: "83b6f9c".
There is no way to represent this multi-resource Version in Option 1,
because each Resource is presumed to have its own namespace of time.
Consider: if you tried to construct a unique URI to represent this
Version 83b6f9c, what would the URI be? You could call it
"https://example.com/version/83b6f9c", but what would be the result of
doing a GET on that? There's no sensible response body for a Version
alone, because a Version is not itself a Resource.
Example 2: Consider a distributed bank account transaction, between
Alice's account on https://alice.com and Bob's account on
https://bob.com. Alice and Bob both start with $20, and then we transfer
$10 from Alice to Bob, by debiting /Alice and crediting /Bob at the
*same time:*
PATCH https://alice.com/alice
Version: "transaction-1"
Content-Type: application/debitcredit
-10
PATCH https://bob.com/bob
Version: "transaction-1"
Content-Type: application/debitcredit
+10
To mark these mutations as occurring at the same time, we gave them the
same Version "transaction-1". If this Version was itself to be a URI,
what should the URL be? Would we call it
https://alice.com/transaction-1, or https://bob.com/transaction-1? What
would the result of GET https://alice.com/transaction-1 be? Would it be
Alice's account balance? Isn't that unfair to Bob? There is no longer a
sensible URL definition for Versions, because they need to be distinct from
Resources to support transactions across resources.
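The transfer above can be mimicked with a small Python sketch in which two independent stores each log a patch under the same Version. The `Account` class and its fields are hypothetical, and the sketch makes no attempt at atomicity or any commit protocol; it only shows one Version labelling changes to two Resources.

```python
class Account:
    """Hypothetical single-resource store that logs each patch with its version."""

    def __init__(self, url, balance):
        self.url = url
        self.balance = balance
        self.log = []  # list of (version, delta) pairs

    def patch(self, version, body):
        # body is a signed integer string such as "-10" or "+10", mirroring
        # the application/debitcredit bodies in the requests above.
        delta = int(body)
        self.balance += delta
        self.log.append((version, delta))


alice = Account("https://alice.com/alice", 20)
bob = Account("https://bob.com/bob", 20)

# Both mutations carry the same version, marking them as one point in time.
for account, body in ((alice, "-10"), (bob, "+10")):
    account.patch("transaction-1", body)

# The version identifies a set of resource states, not a single resource.
touched = [
    a.url for a in (alice, bob)
    if any(version == "transaction-1" for version, _ in a.log)
]
```

Here `touched` lists both account URLs, so "transaction-1" has no single Resource whose body it could return.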
Thank you for considering this! I would love to hear if I have reflected
your mental model accurately, and if so, what you think about Option 2.
Thank you!
Michael
Received on Wednesday, 5 February 2025 00:53:54 UTC