Re: draft-toomim-httpbis-versions HTTP mapping (and WebDAV Versioning)

Julian, I think I see the root of our disagreement: we have been
assuming different mental models. Let me know if I'm getting this right:

1. In your model, each "Resource" has a set of "Version Resources", in a 
*one-to-many* mapping, like this:

    - https://foo.com/hello.txt <-- Resource
       - https://foo.com/hello.txt.1 <-- Version 1 of Resource
       - https://foo.com/hello.txt.2 <-- Version 2 of Resource
       - https://foo.com/hello.txt.3 <-- Version 3 of Resource
       - https://foo.com/hello.txt.4 <-- Version 4 of Resource

2. In my model, there is a *many-to-many* mapping between Resources and 
Versions. A Version marks a point in distributed time, and multiple 
Resources can be at the same Version:

    Resources:
    - https://foo.com/hello.txt <-- Resource
    - https://foo.com/world.txt <-- Resource

    Versions:
             alice-0
              /  \
             /    \
       alice-1    bob-0
            |      |
       alice-2     |
             \    /
              \  /
              bob-1

    Resource x Versions:
    - https://foo.com/hello.txt at {alice-0}    <-- Two resources ...
    - https://foo.com/world.txt at {alice-0}    <-- ... can be at the same version
    - https://foo.com/hello.txt at {bob-0}
    - https://foo.com/hello.txt at {bob-1}
    - https://foo.com/world.txt at {alice-1,bob-0}
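To make the version graph above concrete, here is a minimal Python
sketch that encodes it as a DAG in which each Version names its parent
Versions. The `PARENTS` table and the helper names `ancestors` and
`concurrent` are hypothetical illustrations, not part of the draft:

```python
# Hypothetical encoding of the version DAG above: each Version maps to its parents.
PARENTS: dict[str, set[str]] = {
    "alice-0": set(),
    "alice-1": {"alice-0"},
    "bob-0": {"alice-0"},
    "alice-2": {"alice-1"},
    "bob-1": {"alice-2", "bob-0"},
}


def ancestors(version: str) -> set[str]:
    """All Versions that happened before `version` (transitive parents)."""
    seen: set[str] = set()
    stack = list(PARENTS[version])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(PARENTS[v])
    return seen


def concurrent(a: str, b: str) -> bool:
    """True when neither Version happened before the other."""
    return a != b and a not in ancestors(b) and b not in ancestors(a)


print(concurrent("alice-1", "bob-0"))  # True: made independently by Alice and Bob
print(concurrent("alice-0", "bob-1"))  # False: alice-0 is an ancestor of bob-1
```

Because alice-1 and bob-0 are concurrent, a state such as
{alice-1,bob-0} is a point in distributed time that no single version
ID names on its own, which is why the listing above tags states with
sets of versions.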

This is a fundamental design choice in Versioning: do Resources map to
Versions in a one-to-many relation, or a many-to-many relation? The
choice has far-reaching consequences, so let's understand it clearly.

Option 1:

  * Simpler, but less expressive for distributed systems.
  * WebDAV chooses this model.
  * Lets us assume each Version ID is a URI that we can GET.

Option 2:

  * More expressive, enables important use-cases in distributed systems.
  * Our proposal (draft-toomim-httpbis-versions) chooses this model.
  * Requires a separate spec to define URIs that map into Versioned
    Resources.
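The difference between the two options shows up directly in the data a
server has to keep. A minimal Python sketch, using the example URIs and
version names above as hypothetical data: in Option 1 each Resource owns
a list of Version URIs, so a Version URI has exactly one owner; in
Option 2 the store is a relation of (Resource, Version) pairs that can
be queried in either direction.

```python
from typing import Optional

# Option 1 (one-to-many): each Resource owns its own list of Version URIs.
OPTION1: dict[str, list[str]] = {
    "https://foo.com/hello.txt": [
        "https://foo.com/hello.txt.1",
        "https://foo.com/hello.txt.2",
    ],
}

# Option 2 (many-to-many): a relation of (Resource, Version) pairs.
OPTION2: set[tuple[str, str]] = {
    ("https://foo.com/hello.txt", "alice-0"),
    ("https://foo.com/world.txt", "alice-0"),
    ("https://foo.com/hello.txt", "bob-0"),
}


def owner(version_uri: str) -> Optional[str]:
    """Option 1: a Version URI belongs to at most one Resource."""
    for resource, versions in OPTION1.items():
        if version_uri in versions:
            return resource
    return None


def resources_at(version: str) -> set[str]:
    """Option 2: every Resource that is at the given Version."""
    return {r for r, v in OPTION2 if v == version}


def versions_of(resource: str) -> set[str]:
    """Option 2, other direction: every Version recorded for a Resource."""
    return {v for r, v in OPTION2 if r == resource}
```

The question "which Resources are at this Version?" has at most one
answer in Option 1, but can have many in Option 2.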

I am advocating for Option 2. Now I will illustrate the differences
between these options, and show you some use-cases that require Option
2. I hope this clears up where we got stuck, and I would love to hear
whether this makes sense to you, and what you think about Option 2.

== *Option 1: a One-to-Many Mapping:* ==

WebDAV does Option 1. It defines a "Version ID" as a "Resource at a 
point in time". This means that given any Version ID, we can 
unambiguously point to the contents of the Resource at that point in 
time. So if we add in two additional requirements:

    1. Each Version ID must be formatted as a URI
    2. The response to a GET on that Version ID URI is the contents of
       the Resource at that Version

Then we seem to get a very attractive new feature: /we can GET a
Version/! If you plug the Version ID URI into any existing browser or
web client, you get the contents of the resource at that version! You
can even bookmark versions, using existing bookmark systems!

Cool!
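Under Option 1, "GET a Version" needs no new protocol machinery,
because the Version ID is just another URI. A minimal Python sketch of
that lookup; the stored file contents are hypothetical, and it assumes
the plain Resource URI serves the latest version:

```python
# Hypothetical Option 1 store: every Version ID is itself a URI, and a
# GET on that URI returns the Resource's contents at that Version.
STORE: dict[str, str] = {
    "https://foo.com/hello.txt.1": "hello",
    "https://foo.com/hello.txt.2": "hello, world",
}

# Assumption: the plain Resource URI serves its latest version.
LATEST: dict[str, str] = {"https://foo.com/hello.txt": "https://foo.com/hello.txt.2"}


def get(uri: str) -> tuple[int, str]:
    """Return (status, body) for a GET, as an ordinary web client would see it."""
    uri = LATEST.get(uri, uri)
    if uri in STORE:
        return 200, STORE[uri]
    return 404, ""
```

Note that the store is keyed by a single URI that already names one
Resource; there is nowhere to say that hello.txt.2 is also a version of
some other Resource.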

However, this only works because we've conflated a "Version" with a 
"Resource at a Version". There are actually three concepts at play here:

 1. Version
 2. Resource
 3. Resource at Version

...and we have conflated (1) and (3) together. What we are calling a 
"Version" in Option 1 is actually a "Resource at a Version", not a 
Version (aka a "point in time") itself.

By conflating (1) and (3), we effectively assume a one-to-one mapping
between them: every Version belongs to exactly one Resource. That rules
out having multiple Resources at a Version, which eliminates a number of
important distributed use-cases. Let's look at Option 2 now, and the new
use-cases it enables.

== *Option 2: a Many-to-Many Mapping:* ==

Rather than defining Versions /as/ Resources, in Option 2 we see them as 
a /dimension of/ Resources.

    Definition: An HTTP Version specifies a point on the dimension of
    Time of a Resource.

This is analogous to the existing concept of a HTTP Range:

    Definition: An HTTP Range specifies a region in the dimension of
    Space of a Resource.

Putting these together, we see that HTTP Resources now have two
naturally addressable dimensions, Time and Space:

       == Time (version) and Space (range) are dimensions of Resources ==

       Resources
       ^
       |
       |
       |
       |           Time    <-- Version: header
       |         /
       |       /
       |     /
       |   /
       | /
       +-----------------------> Space    <-- Range: header

We can address the Version (time) and Range (space) of a Resource 
independently, with headers:

    GET https://example.com/hello.txt
    Range: lines=44-100
    Version: 1.4.5

By specifying these as headers, we make each of them an independent
dimension with a many-to-many mapping to Resources:

  * Two Resources can be at the same Version
  * ...just as two Versions can exist for the same Resource
  * ...just as two Ranges can exist on the same Resource
  * ...just as two Resources can be accessed at the same Range
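To show the two dimensions working independently, here is a minimal
Python sketch of how a server might answer the GET above: it first
selects the Resource's state at the requested Version (time), then
slices the requested Range out of that state (space). The in-memory
store, its contents, and the 1-based inclusive reading of the `lines`
range unit are assumptions for illustration, not part of the draft.

```python
# Hypothetical store: contents keyed by (resource, version).
STORE: dict[tuple[str, str], str] = {
    ("https://example.com/hello.txt", "1.4.5"): "one\ntwo\nthree\nfour\nfive\n",
    ("https://example.com/hello.txt", "1.4.6"): "one\nTWO\nthree\nfour\nfive\n",
}


def parse_lines_range(value: str) -> tuple[int, int]:
    """Parse 'lines=START-END' (assumed 1-based and inclusive)."""
    unit, _, spec = value.partition("=")
    if unit.strip() != "lines":
        raise ValueError(f"unsupported range unit: {unit!r}")
    start, _, end = spec.partition("-")
    return int(start), int(end)


def get(uri: str, headers: dict[str, str]) -> str:
    # Real HTTP header names are case-insensitive; a plain dict is used
    # here for brevity, and the Version header is assumed to be present.
    # Time dimension: pick the Resource's state at the requested Version.
    body = STORE[(uri, headers["Version"])]
    # Space dimension: then slice the requested Range out of that state.
    if "Range" in headers:
        start, end = parse_lines_range(headers["Range"])
        lines = body.splitlines(keepends=True)
        body = "".join(lines[start - 1:end])
    return body


print(get("https://example.com/hello.txt", {"Version": "1.4.6", "Range": "lines=2-3"}))
```

Either header can be changed without touching the other, and neither
one has to be encoded into the Resource's URI.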

HTTP does not require Ranges to be URIs, and neither should we (in this
model) require Versions to be URIs. Applications can instead *choose* to
define their own URI convention for versioned ranges, if desired, such as:

    https://example.com/hello.txt?lines=44-100&version=1.4.5

If such URI conventions become popular enough, they can be standardized,
but this can be done in a separate specification. When a Version is an
independent *dimension of* a Resource, there is no useful way to give a
Version itself a URI.
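An application that adopts a query-string convention like the one
above could translate it back into the Range and Version headers before
handling the request. A minimal Python sketch using the standard
`urllib.parse` module; the `version` and `lines` parameter names come
from the example URI and are an application's choice, not something
this proposal standardizes:

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def uri_to_request(uri: str) -> tuple[str, dict[str, str]]:
    """Map the hypothetical ?lines=...&version=... convention onto headers."""
    parts = urlsplit(uri)
    query = parse_qs(parts.query)
    headers: dict[str, str] = {}
    if "version" in query:
        headers["Version"] = query.pop("version")[0]
    if "lines" in query:
        headers["Range"] = "lines=" + query.pop("lines")[0]
    # Any other query parameters remain part of the Resource's own URI.
    remaining = urlencode(query, doseq=True)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, remaining, parts.fragment))
    return base, headers
```

The convention lives entirely in the application layer, so a different
application could choose different parameter names without any change
to the headers themselves.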

== *Use-Cases: simple Distributed Systems that need Option 2* ==

Here are two use-cases that cannot be supported in Option 1, because 
they describe multiple resources (potentially on multiple computers) as 
read and/or written at the same "time":

Example 1: Suppose we want to expose a Git repository over HTTP. In Git, 
multiple files are committed together at the same version:

    https://company.com/git_repo/README.md
    https://company.com/git_repo/package.json
    https://company.com/git_repo/main.js

All three of these files are at the same version: "83b6f9c".

There is no way to represent this multi-resource Version in Option 1, 
because each Resource is presumed to have its own namespace of time. 
Consider: if you tried to construct a unique URI to represent this 
Version 83b6f9c, what would the URI be? You could call it
"https://company.com/version/83b6f9c", but what would be the result of
doing a GET on that? There's no sensible response body for a Version 
alone, because a Version is not itself a Resource.
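Git itself keeps the two concepts apart: `git show <commit>:<path>`
prints one file at one commit, while `git ls-tree -r --name-only
<commit>` lists every file at that commit. A minimal Python sketch of
how a server might map a `GET /git_repo/<path>` request carrying a
`Version: 83b6f9c` header onto those commands; the `/git_repo/` prefix
mapping and all function names are hypothetical:

```python
import subprocess


def git_show_args(repo_dir: str, path: str, version: str) -> list[str]:
    """Command that prints one file (Resource) at one commit (Version)."""
    return ["git", "-C", repo_dir, "show", f"{version}:{path}"]


def git_ls_tree_args(repo_dir: str, version: str) -> list[str]:
    """Command that lists every file (Resource) recorded at one commit (Version)."""
    return ["git", "-C", repo_dir, "ls-tree", "-r", "--name-only", version]


def get(repo_dir: str, url_path: str, version: str) -> bytes:
    """Handle GET /git_repo/<path> with a 'Version: <commit>' header.

    Assumes the hypothetical URL prefix /git_repo/ maps onto the repo root.
    """
    path = url_path.removeprefix("/git_repo/")
    result = subprocess.run(git_show_args(repo_dir, path, version),
                            capture_output=True, check=True)
    return result.stdout


def resources_at(repo_dir: str, version: str) -> list[str]:
    """All Resources at one Version: a list of files, not any single file's body."""
    out = subprocess.run(git_ls_tree_args(repo_dir, version),
                         capture_output=True, check=True, text=True).stdout
    return ["/git_repo/" + name for name in out.splitlines()]
```

The only thing Git can give back for the Version alone is a list of
Resources, which is not the contents of README.md, package.json, or
main.js.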

Example 2: Consider a distributed bank account transaction between
Alice's account on https://alice.com and Bob's account on
https://bob.com. Alice and Bob both start with $20, and then we transfer
$10 from Alice to Bob by debiting Alice's account and crediting Bob's
account at the *same time*:

    PATCH https://alice.com/alice
    Version: "transaction-1"
    Content-Type: application/debitcredit

    -10

    PATCH https://bob.com/bob
    Version: "transaction-1"
    Content-Type: application/debitcredit

    +10

To mark these mutations as occurring at the same time, we gave them the
same Version "transaction-1". If this Version were itself a URI, what
would that URI be? Would we call it https://alice.com/transaction-1, or
https://bob.com/transaction-1? What would the result of
GET https://alice.com/transaction-1 be? Would it be Alice's account
balance? Isn't that unfair to Bob? There is no sensible URI for such a
Version, because Versions need to be distinct from Resources to support
transactions across resources.
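A minimal Python sketch of the two PATCH requests above, with each
origin modeled as a toy in-memory server. The body format of
application/debitcredit (a signed integer such as "-10") is an
assumption, and the sketch deliberately does not show how the two
origins agree to commit atomically; it only shows one Version label
shared by two Resources on two hosts:

```python
from dataclasses import dataclass, field


@dataclass
class AccountServer:
    """Toy stand-in for one origin (alice.com or bob.com) holding one balance."""

    balance: int
    # Version -> balance right after the mutation tagged with that Version.
    history: dict[str, int] = field(default_factory=dict)

    def patch(self, version: str, content_type: str, body: str) -> None:
        # Assumed body format for application/debitcredit: a signed integer.
        if content_type != "application/debitcredit":
            raise ValueError(f"unsupported media type: {content_type}")
        self.balance += int(body)
        self.history[version] = self.balance

    def get(self, version: str) -> int:
        """GET with a Version header: this Resource's balance at that Version."""
        return self.history[version]


alice = AccountServer(balance=20)
bob = AccountServer(balance=20)

# One Version tags both mutations, marking them as the same point in time.
alice.patch("transaction-1", "application/debitcredit", "-10")
bob.patch("transaction-1", "application/debitcredit", "+10")

print(alice.get("transaction-1"), bob.get("transaction-1"))  # 10 30
```

Each origin answers a GET at "transaction-1" with its own Resource's
state; neither answer is "the" body of the Version itself.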

Thank you for considering this! I would love to hear if I have reflected 
your mental model accurately, and if so, what you think about Option 2.

Thank you!

Michael

Received on Wednesday, 5 February 2025 00:53:54 UTC