Re: [braid] Re: New Version Notification for draft-toomim-httpbis-versions-00.txt from Michael Toomim on 2024-07-25 (ietf-http-wg@w3.org from July to September 2024)

From: Michael Toomim <toomim@gmail.com>
Date: Thu, 25 Jul 2024 03:47:47 -0700
To: Rory Hewitt <rory.hewitt@gmail.com>, Pierre Chapuis <catwell-gmail1@catwell.info>
Cc: HTTP Working Group <ietf-http-wg@w3.org>, Braid <braid-http@googlegroups.com>
Message-ID: <7ba7da74-bac0-49af-b8e1-84a9678cb838@gmail.com>

Well, the main reason for the Parents: header is to convey the shape of 
the historical time DAG.

The scheme you propose here does not give *quite* the full information 
on grandparents. Consider these two time DAGs:

    DAG 1 has parents and grandparents:

         a1   a2
          |   |
          |   |
         b1   b2
           \ /
            c

    DAG 2 has the grandparenting flipped:

         a1   a2
           \ /
           / \
         b1   b2
           \ /
            c

These are different DAGs, but they'd produce the same Parents: header in 
your scheme:

    Parents: "b1", "b2":1; "a1", "a2":2

So I don't think this scheme is yet an improvement.
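
To see the collision concretely, here is a minimal Python sketch. The dict
encoding of the DAG (child mapped to its list of parents) and the helper
names ancestor_levels and level_header are illustrative assumptions, not
part of the draft or of the scheme as proposed; the sketch only groups
ancestors by generation and renders them in the level-annotated form above.

```python
def ancestor_levels(parents, version):
    """Group the ancestors of `version` by generation (1 = parents, 2 = grandparents, ...)."""
    levels = []
    frontier = {version}
    seen = {version}
    while True:
        # Collect every not-yet-seen parent of the current generation.
        next_gen = {p for v in frontier for p in parents.get(v, [])} - seen
        if not next_gen:
            return levels
        levels.append(sorted(next_gen))
        seen |= next_gen
        frontier = next_gen


def level_header(parents, version):
    """Render a level-annotated Parents header (hypothetical format from this thread)."""
    groups = []
    for depth, ids in enumerate(ancestor_levels(parents, version), start=1):
        quoted = ", ".join(f'"{v}"' for v in ids)
        groups.append(f"{quoted}:{depth}")
    return "Parents: " + "; ".join(groups)


# DAG 1: b1 descends from a1, and b2 descends from a2.
dag1 = {"c": ["b1", "b2"], "b1": ["a1"], "b2": ["a2"]}
# DAG 2: the grandparenting is flipped.
dag2 = {"c": ["b1", "b2"], "b1": ["a2"], "b2": ["a1"]}

header1 = level_header(dag1, "c")
header2 = level_header(dag2, "c")
print(header1)
print(header2)
print(header1 == header2)
```

Both calls render the same header even though the two dicts differ, because
the per-level grouping drops the edges between adjacent generations.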

If your goal is to have a convenient way to request a span of history, 
might I suggest doing a HEAD request on the span:

    HEAD /foo
    Parents: "a1", "a2"
    Version: "c"

The server will respond with an outline of the time DAG:

    HTTP/1.1 104 Multiresponse

    HTTP/1.1 200 OK
    Parents: "a1"
    Version: "b1"

    HTTP/1.1 200 OK
    Parents: "a2"
    Version: "b2"

    HTTP/1.1 200 OK
    Parents: "b1", "b2"
    Version: "c"

This lets you access the DAG without downloading any patches or resource 
contents.
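
As a rough illustration of how a client might consume such an outline, the
Python sketch below rebuilds the DAG from the response text shown above. It
assumes the sub-responses arrive as blank-line-separated blocks exactly as
printed here, and parse_outline is a hypothetical helper, not an API from
the draft or from any Braid library.

```python
import re


def parse_outline(text):
    """Return {version: [parent, ...]} from blank-line-separated response blocks."""
    dag = {}
    for block in text.strip().split("\n\n"):
        headers = {}
        for line in block.splitlines():
            if line.startswith("HTTP/"):
                continue  # Skip status lines; only header lines are of interest.
            name, sep, value = line.partition(":")
            if sep:
                # Version IDs are quoted strings separated by commas.
                headers[name.strip().lower()] = re.findall(r'"([^"]*)"', value)
        # Blocks without a Version header (such as the 104 status block) add nothing.
        for version in headers.get("version", []):
            dag[version] = headers.get("parents", [])
    return dag


outline = """\
HTTP/1.1 104 Multiresponse

HTTP/1.1 200 OK
Parents: "a1"
Version: "b1"

HTTP/1.1 200 OK
Parents: "a2"
Version: "b2"

HTTP/1.1 200 OK
Parents: "b1", "b2"
Version: "c"
"""

dag = parse_outline(outline)
print(dag)
```

Unlike a level-annotated header, each entry here records which parents belong
to which version, so the two DAGs above would parse to different results.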

Cheers!

Michael

On 7/22/24 10:13 PM, Rory Hewitt wrote:
> Hey Pierre,
>
> Actually, I kinda WAS talking about my (mis)understanding that the 
> Parents header could contain both parents and grandparents...
>
> That being said, if the Parents header can ONLY contain direct 
> parents, that seems like a (possibly significant) limitation.
>
> Would it not be an improvement to allow the header to contain a list 
> of ancestors back to whatever level the server feels is appropriate or 
> retains information, complete with level information (ancestor level):
>
> Parents: "parent1","parent2":1; 
> "grandparent":2;"greatgrandparent1","greatgrandparent2","greatgrandparent3":3
>
> This indicates two parents (level 1), a single grandparent (level 2) 
> and 3 great grandparents (level 3).
>
> This could be compared with a similar Parents header for another 
> object to determine where differences may be found, and how far back.
>
> Maybe this is getting too far into the weeds - this was, as I noted, 
> based on my misunderstanding, which Pierre obviously understands is a 
> possibility.
>
> I guess my primary point is that in finding a balance between brevity 
> and flexibility, a design that is able to specify detailed information 
> is better, even if that detailed information is often elided or ignored.
>
> With these fairly 'generic' header names like Version and Parents, the 
> ability to use them to (in theory) 'build' a history of a file and 
> compare it with a later, earlier or 'sibling' file seems very useful...
>
> But I defer to the smarter minds here - I am a mere tinkerer and may 
> well have gotten too deep too early.
>
> Rory
>
>
> On Mon, Jul 22, 2024, 8:04 PM Pierre Chapuis 
> <catwell-gmail1@catwell.info> wrote:
>
>     Hello Michael,
>
>     regarding the "version and parents headers" ordering issue Rory
>     mentioned, I don't think he was talking about the case where one
>     version descends from the other one.
>
>     The fact that you say this has very strong implications:
>
>     > Any version can be recreated by first merging its parents, and
>     then applying the its update onto that merger.
>
>     It either means there cannot be conflicts between parents - or in
>     other words that conflict resolution is deterministic, commutative
>     *and* associative (like CRDTs), or that updates must always
>     contain the conflict resolution of their parents like Git.
>
>     That last solution also means updates can be rejected by the
>     server if its history is incoherent, and comes with its own
>     issues. The way Git works is that conflict resolution is always
>     performed with human intervention on pull, not on push.
>
>     I know Braid has answers to this (Merge Types) and you are trying
>     to break up the spec here, but it is not surprising that if you
>     have a spec that says "versions can have several parents and you
>     can merge them" people are going to wonder how.
>
>     -- 
>     Pierre Chapuis
>
>     On Tue, Jul 23, 2024, at 00:30, Michael Toomim wrote:
>>
>>     Rory, thanks for these excellent thoughts! It's exciting to see
>>     other people digging into the versioning problem with us. :)
>>
>>     Responses:
>>
>>     *== Versioning with ETag ==*
>>
>>     You make a good point that ETag headers, like the proposed
>>     Version header, are opaque strings that can be formatted to
>>     express additional information if we want to. This is true for
>>     both ETag and Version:
>>
>>         ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
>>         Version: "Sat, 6 Jul 2024 07:28:00 GMT"
>>
>>         ETag: "v1.0.2"
>>         Version: "v1.0.2"
>>
>>     We propose articulating the structure of these version ids using
>>     a Version-Type header. You could, for instance, use
>>     "Version-Type: date" for the first example, and "Version-Type:
>>     semver" for the second.
>>
>>     The main problem with ETag, though, is that it marks *unique
>>     content* rather than *unique time*. If you mutate the state of
>>     the resource from "foo" to "bar" and then back to "foo", you'll
>>     revert to the same ETag, even though this is at a different point
>>     in time. This breaks collaborative editing algorithms.
>>
>>     Finally, I'll note that your claim that ETags don't have to be
>>     sensitive to content-encoding is only true for *weak* ETags.
>>     Strong ETags must change whenever the byte sequence of the
>>     response body changes. This means they should be sensitive to
>>     content-encoding. RFC 9110 also notes that they can change with
>>     the Content-Type:
>>
>>         > A strong validator might change for reasons other than a
>>         change to the representation data, such as when a
>>         semantically significant part of the representation metadata
>>         is changed (e.g., Content-Type)
>>         https://datatracker.ietf.org/doc/html/rfc9110#section-8.8.1
>>
>>     Consider the case where a user edits a markdown resource:
>>
>>         PUT /foo
>>         Content-Type: text/markdown
>>         Version: "mike-99"
>>
>>         # This is a markdown file
>>
>>         Hello world!
>>
>>     And the server then shares this as HTML:
>>
>>         GET /foo
>>         Accept: text/html
>>
>>
>>         HTTP/1.1 200 OK
>>         Content-Type: text/html
>>         Version: "mike-99"
>>
>>         <html>
>>           <body>
>>             <h1>This is a markdown file</h1>
>>             <p>Hello world!</p>
>>           </body>
>>         </html>
>>
>>     Using the Version header, we're able to express that these are
>>     two representations of the resource at the same point in time.
>>     You can't do this with a strong ETag.
>>
>>     *== Version and Parents headers ==*
>>
>>     I think there's been a miscommunication here. The reason there
>>     are multiple version IDs in the Parents header is for edits that
>>     happen *in parallel*, not for edits that happen in sequence. This
>>     is to represent a version DAG:
>>
>>                       a  <-- oldest version
>>                      / \
>>                     b   c
>>                      \ /
>>                       d  <-- current version
>>
>>     In this example, the current version "d" would have:
>>
>>         Parents: "b", "c"
>>
>>     This is not allowed:
>>
>>         Parents: "d", "b"
>>
>>     Because of this language in the spec:
>>
>>         For any two version IDs A and B that are specified in a
>>         Version or
>>         Parents header, A cannot be a descendent of B or vice versa.  The
>>         ordering of version IDs within the header carries no meaning.
>>
>>     Good question!
>>
>>     *== Client-generated Version IDs on PUT ==*
>>
>>     Yes, there would be a problem if two clients generate the same
>>     version IDs for two different PUTs. Then the versions would not
>>     be unique!
>>
>>     However, requiring the server to generate versions is only one
>>     possible solution, and it is one that requires a server. We
>>     also want to support distributed p2p systems, which don't have
>>     servers.
>>
>>     In these systems, it's quite common for clients to generate
>>     version IDs. There are two common ways to solve this problem:
>>
>>      1. Use a large random hash space so that collisions are
>>         extremely unlikely. This works well enough for git, for instance.
>>      2. Each client gets a unique ID, possibly by coordinating with a
>>         server, and then versions are constructed by concatenating
>>         "<client-id>:<counter>" for each client.
>>
>>     Does this all make sense?
>>
>>     Again, good questions, and I am glad to see this interest in the
>>     topic! I think we can do a lot with it!
>>
>>     Michael
>>
>>     On 7/17/24 2:56 PM, Rory Hewitt wrote:
>>>     Hey Michael,
>>>
>>>     A few thoughts...
>>>
>>>     First, I agree that the concept of versioning hasn't been
>>>     thought about enough, and this is definitely a 'good idea (TM)'.
>>>
>>>     However, I have a few concerns:
>>>
>>>     *1.1.2 Versioning with ETag*
>>>
>>>     Because ETags are, by definition, unformatted, while it's
>>>     true to say that you often can't rely on them to establish a
>>>     version, that's entirely dependent on the format chosen by the
>>>     user. An ETag *could* validly be specified as a date:
>>>
>>>         ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
>>>
>>>     or as a version number:
>>>
>>>         ETag: "v1.0.2"
>>>
>>>     or as a random string:
>>>
>>>         ETag: "Michael is cool"
>>>
>>>     IOW, it's totally possible for a site that cares about
>>>     versioning to use a format that specifies a version number. I
>>>     recognize this isn't *necessarily* the case, but it helps to be
>>>     clear here. It should be noted that many web servers that
>>>     include the creation of ETags natively (e.g. Apache) include an
>>>     effective version as part of the ETag.
>>>
>>>     Likewise ETags don't *have* to be sensitive to encoding -
>>>     there's nothing to stop a server from sending the exact same
>>>     ETag for two differently-encoded copies of the same underlying
>>>     resource. It's just that they typically do.
>>>
>>>     None of this is to say that ETags are better or worse than you
>>>     describe - just to say that they *can* be better than they are.
>>>
>>>     *2.3 Version and Parents headers*
>>>
>>>     You state that the Parents header can include multiple parents
>>>     (parents, grandparents, great-grandparents?) and provide an example:
>>>
>>>         Parents: "ajtva12kid", "cmdpvkpll2"
>>>
>>>     and then say "Any version can be recreated by first merging its
>>>     parents, and then applying the its update onto that merger."
>>>     (Nit: additional "the" in this sentence). However, you also say
>>>     that the order of the values in a Parents header makes no
>>>     difference.
>>>
>>>     Maybe I'm missing something, but in this scenario, how could
>>>     that work? Using your example above, here are two possible
>>>     scenarios:
>>>
>>>     * Version "ajtva12kid" is earlier. Version "cmdpvkpll2" is later
>>>     and contains an additional section of HTML
>>>     * Version "ajtva12kid" is earlier and contains a section of HTML
>>>     which is removed in the later "cmdpvkpll2" version
>>>
>>>     If you merge the two parent versions, then does the outcome
>>>     (onto which you will apply the update) include that section of HTML?
>>>
>>>     I guess it just makes sense to me to have the order in the
>>>     Parents header have some meaning - whether oldest first or last. Or
>>>     you could specify that both Version and Parents values must be integers.
>>>
>>>     *2.4.3 PUT a new version*
>>>
>>>     This seems like it could lead to either race conditions or some
>>>     other issue with duplicate Version values. Surely it's better to
>>>     have the client submit a new version of a resource (passing the
>>>     Parents header but *not* passing the Version header) and have
>>>     the server, which is presumably the prime source of versioning
>>>     truth, calculate a version (perhaps after retrieving other PUT
>>>     requests from other clients) and return that value in the
>>>     Version response header?
>>>
>>>     I see you discuss this later with the Current-Version header, so
>>>     perhaps you covered this and my old eyes missed it.
>>>
>>>     Rory
>>>
>>>
>>>     On Mon, Jul 15, 2024 at 6:31 PM Michael Toomim
>>>     <toomim@gmail.com> wrote:
>>>
>>>         Hi everyone in HTTP!
>>>
>>>         Last fall we solicited feedback on the Braid State
>>>         Synchronization proposal [draft
>>>         <https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-braid-http-04>,
>>>         slides
>>>         <https://datatracker.ietf.org/meeting/118/materials/slides-118-httpbis-braid-http-add-synchronization-to-http-00>],
>>>         which I'd summarize as:
>>>
>>>             "We're enthusiastic about the general work, but the
>>>             proposal is too high-level. Break the spec up into
>>>             multiple independent specs, and work bottom-up. Focus on
>>>             concrete 'bits-on-the-wire'."
>>>
>>>         So I'm breaking the spec up, and have drafted up the first
>>>         chunk for you. I would very much like your review on:
>>>
>>>             *Versioning of HTTP Resources*
>>>             draft-toomim-httpbis-versions
>>>             https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions-00
>>>
>>>         Versioning is necessary for state synchronization—and occurs
>>>         in a range of HTTP systems:
>>>
>>>           * Caching
>>>           * Archiving
>>>           * Version Control
>>>           * Collaborative Editing
>>>
>>>         Today, HTTP has resource versions in the Last-Modified and
>>>         ETag headers, and sometimes embeds versions in URLs, like
>>>         with WebDAV. Each of these options serves some needs, but
>>>         also has specific limitations. An improved general approach
>>>         is proposed, which provides new features, that could enable
>>>         cool new applications, such as incrementally-updated RSS
>>>         feeds, and could simplify existing specifications, such as
>>>         resumable uploads, and history compression in OT/CRDT
>>>         algorithms.
>>>
>>>         I would love to know if people find this work interesting. I
>>>         think we could improve performance and interoperability, and be
>>>         one step closer to having Google Docs power within HTTP URLs.
>>>
>>>         Michael
>>>
>>
>
Received on Thursday, 25 July 2024 10:47:54 UTC