Re: New Version Notification for draft-toomim-httpbis-versions-00.txt

Rory, thanks for these excellent thoughts! It's exciting to see other 
people digging into the versioning problem with us. :)

Responses:

*== Versioning with ETag ==*

You make a good point that ETag headers, like the proposed Version 
header, are opaque strings that can be formatted to express additional 
information if we want to. This is true for both ETag and Version:

    ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
    Version: "Sat, 6 Jul 2024 07:28:00 GMT"

    ETag: "v1.0.2"
    Version: "v1.0.2"

We propose articulating the structure of these version ids using a 
Version-Type header. You could, for instance, use "Version-Type: date" 
for the first example, and "Version-Type: semver" for the second.
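
As a rough sketch of why declaring the type helps, here is how a 
recipient might order version IDs once it knows their structure. The 
`sort_key` helper is hypothetical and not from the draft; the "date" and 
"semver" type names are only the two examples above, and the semver 
parsing ignores pre-release tags:

```python
from email.utils import parsedate_to_datetime


def sort_key(version_id: str, version_type: str):
    """Hypothetical helper: turn an opaque version ID into a comparable value."""
    if version_type == "date":
        # Parses HTTP-style dates such as "Sat, 6 Jul 2024 07:28:00 GMT".
        return parsedate_to_datetime(version_id)
    if version_type == "semver":
        # Simplified: handles "v1.0.2"-style IDs only, no pre-release tags.
        return tuple(int(part) for part in version_id.lstrip("v").split("."))
    raise ValueError(f"unknown Version-Type: {version_type}")


# Without a declared type, "v1.0.10" would sort before "v1.0.2" as a string.
print(sort_key("v1.0.2", "semver") < sort_key("v1.0.10", "semver"))  # True
```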

The main problem with ETag, though, is that it marks *unique content* 
rather than *unique time*. If you mutate the state of the resource from 
"foo" to "bar" and then back to "foo", you'll revert to the same ETag, 
even though this is at a different point in time. This breaks 
collaborative editing algorithms.
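
To make this concrete, here is a minimal Python sketch. It assumes the 
server derives its ETag from a hash of the content (a common choice, 
though not the only one) and uses a hypothetical `Resource` class whose 
version IDs come from a simple counter. Neither detail is from the draft:

```python
import hashlib


class Resource:
    """Hypothetical in-memory resource tracking both an ETag and a Version."""

    def __init__(self, body: str):
        self.counter = 0
        self.history = []  # list of (version, etag, body) tuples
        self.put(body)

    def put(self, body: str) -> None:
        # Assumption: the ETag is a content hash, so it identifies *content*.
        etag = hashlib.sha256(body.encode("utf-8")).hexdigest()[:12]
        # Assumption: the version is a counter, so it identifies a point in *time*.
        self.counter += 1
        self.history.append((f"v{self.counter}", etag, body))


r = Resource("foo")
r.put("bar")
r.put("foo")  # revert to the original content

versions = [v for v, _, _ in r.history]
etags = [e for _, e, _ in r.history]

# The first and third states share an ETag but have distinct versions.
print(etags[0] == etags[2])      # True
print(versions[0], versions[2])  # v1 v3
```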

Finally, I'll note that your claim that ETags don't have to be sensitive 
to content-encoding is only true for *weak* ETags. A strong ETag must 
change whenever the bytes of the representation change, so it has to be 
sensitive to Content-Encoding. RFC 9110 also notes that a strong ETag 
can change along with Content-Type:

    > A strong validator might change for reasons other than a change
    > to the representation data, such as when a semantically significant
    > part of the representation metadata is changed (e.g., Content-Type)
    https://datatracker.ietf.org/doc/html/rfc9110#section-8.8.1

Consider the case where a user edits a markdown resource:

    PUT /foo
    Content-Type: text/markdown
    Version: "mike-99"

    # This is a markdown file

    Hello world!

And the server then shares this as HTML:

    GET /foo
    Accept: text/html


    HTTP/1.1 200 OK
    Content-Type: text/html
    Version: "mike-99"

    <html>
       <body>
         <h1>This is a markdown file</h1>
         <p>Hello world!</p>
       </body>
    </html>

Using the Version header, we're able to express that these are two 
representations of the resource at the same point in time. You can't do 
this with a strong ETag.
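
Here is a small sketch of that exchange in Python. It assumes a strong 
ETag computed as a hash of the exact representation bytes, which is one 
common approach and not something the draft requires; the 
`strong_etag` helper and the `headers` table are illustrative only:

```python
import hashlib


def strong_etag(body: bytes) -> str:
    # Assumption: the strong ETag is derived from the exact representation bytes.
    return '"' + hashlib.sha256(body).hexdigest()[:12] + '"'


VERSION = '"mike-99"'  # one point in the resource's history

representations = {
    "text/markdown": b"# This is a markdown file\n\nHello world!\n",
    "text/html": b"<html><body><h1>This is a markdown file</h1>"
                 b"<p>Hello world!</p></body></html>\n",
}

# Response headers the server would send for each representation.
headers = {
    content_type: {
        "Content-Type": content_type,
        "Version": VERSION,
        "ETag": strong_etag(body),
    }
    for content_type, body in representations.items()
}

md, html = headers["text/markdown"], headers["text/html"]
print(md["Version"] == html["Version"])  # True: same point in time
print(md["ETag"] == html["ETag"])        # False: different bytes
```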

*== Version and Parents headers ==*

I think there's been a miscommunication here. The Parents header lists 
multiple version IDs to capture edits that happen *in parallel*, not 
edits that happen in sequence. Together, the versions form a version 
DAG:

                   a  <-- oldest version
                  / \
                 b   c
                  \ /
                   d  <-- current version

In this example, the current version "d" would have:

    Parents: "b", "c"

This is not allowed, since "d" is a descendant of "b":

    Parents: "d", "b"

The spec rules this out with the following language:

    For any two version IDs A and B that are specified in a Version or
    Parents header, A cannot be a descendent of B or vice versa. The
    ordering of version IDs within the header carries no meaning.
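
A recipient could check this rule with something like the Python sketch 
below. The `PARENTS` table encodes the example DAG above, and the 
`ancestors` and `is_valid_version_set` helpers are hypothetical names of 
mine, not anything defined in the spec:

```python
# The example DAG: each version maps to the list of its parents.
PARENTS = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}


def ancestors(version: str) -> set[str]:
    """Return every version reachable by following parent links."""
    seen = set()
    stack = list(PARENTS[version])
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(PARENTS[v])
    return seen


def is_valid_version_set(ids: list[str]) -> bool:
    """True if no ID in the list is an ancestor or descendant of another."""
    return not any(a in ancestors(b) for a in ids for b in ids if a != b)


print(is_valid_version_set(["b", "c"]))  # True: parallel edits
print(is_valid_version_set(["d", "b"]))  # False: "d" descends from "b"
```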

Good question!

*== Client-generated Version IDs on PUT ==*

Yes, there would be a problem if two clients generate the same version 
IDs for two different PUTs. Then the versions would not be unique!

However, requiring the server to generate versions is only one possible 
solution, and it is one that requires a server. We also want to 
support distributed p2p systems, which don't have servers.

In these systems, it's quite common for clients to generate version IDs. 
There are two common ways to solve this problem:

 1. Draw IDs from a large random or hash space so that collisions are
    extremely unlikely. This works well enough for git, for instance.
 2. Each client gets a unique ID, possibly by coordinating with a
    server, and then versions are constructed by concatenating
    "<client-id>:<counter>" for each client.

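Both approaches can be sketched in a few lines of Python. The function 
and class names here are hypothetical, the 128-bit ID size is an 
arbitrary choice of mine, and the sketch assumes client IDs have already 
been made unique by some other means:

```python
import secrets


def random_version_id(nbytes: int = 16) -> str:
    """Approach 1: draw the ID from a large random space (128 bits here)."""
    return secrets.token_hex(nbytes)


class ClientVersioner:
    """Approach 2: "<client-id>:<counter>", unique as long as client IDs are."""

    def __init__(self, client_id: str):
        self.client_id = client_id
        self.counter = 0

    def next_version(self) -> str:
        self.counter += 1
        return f"{self.client_id}:{self.counter}"


alice = ClientVersioner("alice")
bob = ClientVersioner("bob")
ids = [alice.next_version(), bob.next_version(), alice.next_version()]
print(ids)  # ['alice:1', 'bob:1', 'alice:2']
```
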
Does this all make sense?

Again, good questions, and I am glad to see this interest in the topic! 
I think we can do a lot with it!

Michael

On 7/17/24 2:56 PM, Rory Hewitt wrote:
> Hey Michael,
>
> A few thoughts...
>
> First, I agree that the concept of versioning hasn't been thought 
> about enough, and this is definitely a 'good idea (TM)'.
>
> However, I have a few concerns:
>
> *1.1.2 Versioning with ETag*
>
> Because ETags are, by definition, unformatted, while it's true to say 
> that you often can't rely on them to establish a version, that's 
> entirely dependent on the format chosen by the user. An ETag *could* 
> validly be specified as a date:
>
>   ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
>
> or as a version number:
>
>   ETag: "v1.0.2"
>
> or as a random string:
>
>   ETag: "Michael is cool"
>
> IOW, it's totally possible for a site that cares about versioning to 
> use a format that specifies a version number. I recognize this isn't 
> *necessarily* the case, but it helps to be clear here. It should be 
> noted that many web servers that include the creation of ETags 
> natively (e.g. Apache) include an effective version as part of the ETag.
>
> Likewise ETags don't *have* to be sensitive to encoding - there's 
> nothing to stop a server from sending the exact same ETag for two 
> differently-encoded copies of the same underlying resource. It's just 
> that they typically do.
>
> None of this is to say that ETags are better or worse than you 
> describe - just to say that they *can* be better than they are.
>
> *2.3 Version and Parents headers*
>
> You state that the Parents header can include multiple parents 
> (parents, grandparents, great-grandparents?) and provide an example:
>
>     Parents: "ajtva12kid", "cmdpvkpll2"
>
> and then say "Any version can be recreated by first merging its 
> parents, and then applying the its update onto that merger." (Nit: 
> additional "the" in this sentence). However, you also say that the 
> order of the values in a Parents header makes no difference.
>
> Maybe I'm missing something, but in this scenario, how could that 
> work? Using your example above, here are two possible scenarios:
>
> * Version "ajtva12kid" is earlier. Version "cmdpvkpll2" is later and 
> contains an additional section of HTML
> * Version "ajtva12kid" is earlier and contains a section of HTML which 
> is removed in the later "cmdpvkpll2" version
>
> If you merge the two parent versions, then does the outcome (onto 
> which you will apply the update) include that section of HTML?
>
> I guess it just makes sense to me to have the order in the Parents 
> have some meaning - whether oldest first or last. Or you could specify 
> that both Version and Parent values must be integers.
>
> *2.4.3 PUT a new version*
>
> This seems like it could lead to either race conditions or some other 
> issue with duplicate Version values. Surely it's better to have the 
> client submit a new version of a resource (passing the Parents header 
> but *not* passing the Version header) and have the server, which is 
> presumably the prime source of versioning truth, calculate a version 
> (perhaps after retrieving other PUT requests from other clients) and 
> return that value in the Version response header?
>
> I see you discuss this later with the Current-Version header, so 
> perhaps you covered this and my old eyes missed it.
>
> Rory
>
>
> On Mon, Jul 15, 2024 at 6:31 PM Michael Toomim <toomim@gmail.com> wrote:
>
>     Hi everyone in HTTP!
>
>     Last fall we solicited feedback on the Braid State Synchronization
>     proposal [draft
>     <https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-braid-http-04>,
>     slides
>     <https://datatracker.ietf.org/meeting/118/materials/slides-118-httpbis-braid-http-add-synchronization-to-http-00>],
>     which I'd summarize as:
>
>         "We're enthusiastic about the general work, but the proposal
>         is too high-level. Break the spec up into multiple independent
>         specs, and work bottom-up. Focus on concrete 'bits-on-the-wire'."
>
>     So I'm breaking the spec up, and have drafted up the first chunk
>     for you. I would very much like your review on:
>
>         *Versioning of HTTP Resources*
>         draft-toomim-httpbis-versions
>         https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions-00
>
>     Versioning is necessary for state synchronization—and occurs in a
>     range of HTTP systems:
>
>       * Caching
>       * Archiving
>       * Version Control
>       * Collaborative Editing
>
>     Today, HTTP has resource versions in the Last-Modified and ETag
>     headers, and sometimes embeds versions in URLs, like with WebDAV.
>     Each of these options serves some needs, but also has specific
>     limitations. An improved general approach is proposed, which
>     provides new features, that could enable cool new applications,
>     such as incrementally-updated RSS feeds, and could simplify
>     existing specifications, such as resumable uploads, and history
>     compression in OT/CRDT algorithms.
>
>     I would love to know if people find this work interesting. I think
>     we could improve performance, interoperability, and be one step
>     closer to having Google Docs power within HTTP URLs.
>
>     Michael
>

Received on Monday, 22 July 2024 22:30:31 UTC