Re: [braid] Re: New Version Notification for draft-toomim-httpbis-versions-00.txt

Hey Pierre,

Actually, I kinda WAS talking about my (mis)understanding that the Parents
header could contain both parents and grandparents...

That being said, if the Parents header can ONLY contain direct parents,
that seems like a (possibly significant) limitation.

Would it not be an improvement to allow the header to contain a list of
ancestors, going back as far as the server considers appropriate (or as far
as it retains information), with each group tagged with its ancestor level:

Parents: "parent1","parent2":1; "grandparent":2; "greatgrandparent1","greatgrandparent2","greatgrandparent3":3

This indicates two parents (level 1), a single grandparent (level 2) and 3
great grandparents (level 3).

This could be compared with a similar Parents header for another object to
determine where differences may be found, and how far back.
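A rough sketch of how a client might parse and compare leveled Parents values like the one above. The leveled syntax is only a proposal, not part of the draft, and `parse_leveled_parents` and `closest_shared_ancestor` are hypothetical helpers. The sketch assumes each group of quoted IDs ends in `:<level>`, groups are separated by `;`, and IDs contain no `;` or `"`.

```python
import re


def parse_leveled_parents(value: str) -> dict[int, list[str]]:
    """Parse a hypothetical leveled Parents value, e.g.
    '"p1","p2":1; "g":2'  ->  {1: ["p1", "p2"], 2: ["g"]}."""
    levels: dict[int, list[str]] = {}
    for group in value.split(";"):
        group = group.strip()
        if not group:
            continue
        # The level follows the last ':' in the group (outside the quotes).
        ids_part, _, level = group.rpartition(":")
        levels.setdefault(int(level), []).extend(re.findall(r'"([^"]*)"', ids_part))
    return levels


def closest_shared_ancestor(a: dict[int, list[str]], b: dict[int, list[str]]):
    """Return (level_in_a, level_in_b, version_id) for the shared ancestor with
    the smallest combined depth, or None if the two lists share nothing."""

    def depths(levels: dict[int, list[str]]) -> dict[str, int]:
        out: dict[str, int] = {}
        for lvl, ids in levels.items():
            for vid in ids:
                # In a DAG the same ancestor can appear at several depths; keep the nearest.
                out[vid] = min(lvl, out.get(vid, lvl))
        return out

    da, db = depths(a), depths(b)
    shared = da.keys() & db.keys()
    if not shared:
        return None
    best = min(shared, key=lambda v: (da[v] + db[v], v))
    return da[best], db[best], best


a = parse_leveled_parents('"p1","p2":1; "g":2; "gg1","gg2","gg3":3')
b = parse_leveled_parents('"q1":1; "p2":2; "g":3')
print(closest_shared_ancestor(a, b))  # (1, 2, 'p2')
```

Here the two histories diverge just above "p2", which is a parent of one version and a grandparent of the other.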

Maybe this is getting too far into the weeds - this was, as I noted, based
on my misunderstanding, which Pierre obviously understands is a
possibility.

I guess my primary point is that in finding a balance between brevity and
flexibility, a design that is able to specify detailed information is
better, even if that detailed information is often elided or ignored.

With these fairly 'generic' header names like Version and Parents, the
ability to use them to (in theory) 'build' a history of a file and compare
it with a later, earlier or 'sibling' file seems very useful...

But I defer to the smarter minds here - I am a mere tinkerer and may well
have gotten too deep too early.

Rory


On Mon, Jul 22, 2024, 8:04 PM Pierre Chapuis <catwell-gmail1@catwell.info>
wrote:

> Hello Michael,
>
> regarding the "version and parents headers" ordering issue Rory mentioned,
> I don't think he was talking about the case where one version descends the
> other one.
>
> The fact that you say this has very strong implications:
>
> > Any version can be recreated by first merging its parents, and then
> applying the its update onto that merger.
>
> It either means there cannot be conflicts between parents - or in other
> words that conflict resolution is deterministic, commutative *and*
> associative (like CRDTs), or that updates must always contain the conflict
> resolution of their parents like Git.
>
> That last solution also means updates can be rejected by the server if its
> history is incoherent, and comes with its own issues. The way Git works is
> that conflict resolution is always performed with human intervention on
> pull, not on push.
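To make the CRDT-style option concrete, here is a toy sketch. It uses a grow-only set whose merge is set union (a textbook CRDT, not Braid's Merge Types) to show why commutativity and associativity matter: when they hold, the order in which parents are merged cannot change the result. `gset_merge`, `overwrite_merge` and `order_independent` are illustrative names.

```python
from functools import reduce
from itertools import permutations


def gset_merge(x: frozenset, y: frozenset) -> frozenset:
    """Grow-only set (G-Set) merge: set union, which is deterministic,
    commutative and associative."""
    return x | y


def overwrite_merge(x: frozenset, y: frozenset) -> frozenset:
    """Naive 'last one wins' merge, for contrast: not commutative."""
    return y


def order_independent(states: list[frozenset], merge) -> bool:
    """True if left-folding the parent states in every possible order
    yields the same merged state."""
    results = {reduce(merge, order) for order in permutations(states)}
    return len(results) == 1


parents = [frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"d"})]
print(order_independent(parents, gset_merge))       # True
print(order_independent(parents, overwrite_merge))  # False
```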
>
> I know Braid has answers to this (Merge Types) and you are trying to break
> up the spec here, but it is not surprising that if you have a spec that
> says "versions can have several parents and you can merge them" people are
> going to wonder how.
>
> --
> Pierre Chapuis
>
> On Tue, Jul 23, 2024, at 00:30, Michael Toomim wrote:
>
> Rory, thanks for these excellent thoughts! It's exciting to see other
> people digging into the versioning problem with us. :)
>
> Responses:
>
> *== Versioning with ETag ==*
>
> You make a good point that ETag headers, like the proposed Version header,
> are opaque strings that can be formatted to express additional information
> if we want to. This is true for both ETag and Version:
>
> ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
> Version: "Sat, 6 Jul 2024 07:28:00 GMT"
>
> ETag: "v1.0.2"
> Version: "v1.0.2"
>
> We propose articulating the structure of these version ids using a
> Version-Type header. You could, for instance, use "Version-Type: date" for
> the first example, and "Version-Type: semver" for the second.
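As an illustration of why a Version-Type header helps, a client could choose a comparison rule per type. The type names `date` and `semver` come from the examples above, but the parsing rules below are assumptions for illustration, not something the draft specifies, and `version_sort_key` is a hypothetical helper.

```python
from email.utils import parsedate_to_datetime


def version_sort_key(version: str, version_type: str):
    """Hypothetical: turn a quoted Version value into a sortable key,
    based on its (assumed) Version-Type."""
    raw = version.strip('"')
    if version_type == "date":
        # HTTP-date style values, e.g. "Sat, 6 Jul 2024 07:28:00 GMT"
        return parsedate_to_datetime(raw)
    if version_type == "semver":
        # Minimal: "v1.0.2" -> (1, 0, 2); ignores pre-release/build tags
        return tuple(int(part) for part in raw.lstrip("v").split("."))
    raise ValueError(f"unknown Version-Type: {version_type}")


versions = ['"v1.0.10"', '"v1.0.2"', '"v1.1.0"']
print(sorted(versions))
# ['"v1.0.10"', '"v1.0.2"', '"v1.1.0"']  (lexical order, wrong)
print(sorted(versions, key=lambda v: version_sort_key(v, "semver")))
# ['"v1.0.2"', '"v1.0.10"', '"v1.1.0"']
```

Plain string comparison puts "v1.0.10" before "v1.0.2"; knowing the type lets a client order the IDs correctly.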
>
> The main problem with ETag, though, is that it marks *unique content*
> rather than *unique time*. If you mutate the state of the resource from
> "foo" to "bar" and then back to "foo", you'll revert to the same ETag, even
> though this is at a different point in time. This breaks collaborative
> editing algorithms.
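A toy sketch of the "foo" to "bar" to "foo" problem. A strong ETag derived from a content hash (one common scheme, assumed here) repeats once the content reverts, while a time-ordered Version, modeled here as a simple hypothetical counter, does not. `Resource` and `content_etag` are illustrative names.

```python
import hashlib


def content_etag(body: bytes) -> str:
    # Strong ETag derived purely from the bytes of the body.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


class Resource:
    """Toy resource tracking a content-derived ETag and a counter-based Version."""

    def __init__(self, body: bytes):
        self.body = body
        self.counter = 0

    def put(self, body: bytes) -> None:
        self.body = body
        self.counter += 1  # every write is a new point in time

    @property
    def etag(self) -> str:
        return content_etag(self.body)

    @property
    def version(self) -> str:
        return f'"v{self.counter}"'


r = Resource(b"foo")
seen = [(r.etag, r.version)]
for body in (b"bar", b"foo"):
    r.put(body)
    seen.append((r.etag, r.version))

print(seen[0][0] == seen[2][0])  # True: the ETag repeats after reverting to "foo"
print(seen[0][1] == seen[2][1])  # False: the Version does not
```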
>
> Finally, I'll note that your claim that ETags don't have to be sensitive
> to content-encoding is only true for *weak* ETags. Strong ETags must change
> whenever the byte sequence of the response body changes. This means they
> should be sensitive to content-encoding. RFC9110 is also explicit that they
> depend on content-type:
>
> > A strong validator might change for reasons other than a change to the
> representation data, such as when a semantically significant part of the
> representation metadata is changed (e.g., Content-Type)
> https://datatracker.ietf.org/doc/html/rfc9110#section-8.8.1
>
> Consider the case where a user edits a markdown resource:
>
> PUT /foo
> Content-Type: text/markdown
> Version: "mike-99"
>
> # This is a markdown file
>
> Hello world!
>
> And the server then shares this as HTML:
>
> GET /foo
> Accept: text/html
>
>
> HTTP/1.1 200 OK
> Content-Type: text/html
> Version: "mike-99"
>
> <html>
>   <body>
>     <h1>This is a markdown file</h1>
>     <p>Hello world!</p>
>   </body>
> </html>
>
> Using the Version header, we're able to express that these are two
> representations of the resource at the same point in time. You can't do
> this with a strong ETag.
>
> *== Version and Parents headers ==*
>
> I think there's been a miscommunication here. The reason there are
> multiple version IDs in the Parents header is for edits that happen *in
> parallel*, not for edits that happen in sequence. This is to represent a
> version DAG:
>
>                   a  <-- oldest version
>                  / \
>                 b   c
>                  \ /
>                   d  <-- current version
>
> In this example, the current version "d" would have:
>
> Parents: "b", "c"
>
> This is not allowed:
>
> Parents: "d", "b"
>
> Because of this language in the spec:
>
> For any two version IDs A and B that are specified in a Version or
> Parents header, A cannot be a descendent of B or vice versa.  The
> ordering of version IDs within the header carries no meaning.
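The rule quoted above can be checked mechanically against the version DAG. A minimal sketch, assuming the DAG is available as a hypothetical `dag` mapping of each version ID to its parents; `ancestors` and `valid_version_set` are illustrative names.

```python
def ancestors(graph: dict[str, list[str]], vid: str) -> set[str]:
    """All strict ancestors of `vid`; `graph` maps each version to its parents."""
    seen: set[str] = set()
    stack = list(graph.get(vid, []))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, []))
    return seen


def valid_version_set(graph: dict[str, list[str]], ids: list[str]) -> bool:
    """No ID in a Version or Parents header may be a descendant of another
    ID in the same header. The order of `ids` is irrelevant."""
    return not any(b in ancestors(graph, a) for a in ids for b in ids if a != b)


# The DAG from the diagram above: b and c both extend a; d merges b and c.
dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
print(valid_version_set(dag, ["b", "c"]))  # True
print(valid_version_set(dag, ["d", "b"]))  # False: "d" descends from "b"
```

Because the check treats the header as an unordered collection, it is consistent with the statement that ordering carries no meaning.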
>
> Good question!
>
> *== Client-generated Version IDs on PUT ==*
>
> Yes, there would be a problem if two clients generate the same version IDs
> for two different PUTs. Then the versions would not be unique!
>
> However, requiring the server to generate versions is only one possible
> solution, and it is a solution that requires a server. We also want to support
> distributed p2p systems, which don't have servers.
>
> In these systems, it's quite common for clients to generate version IDs.
> There are two common ways to solve this problem:
>
>    1. Use a large random hash space so that collisions are extremely
>    unlikely. This works well enough for git, for instance.
>    2. Each client gets a unique ID, possibly by coordinating with a
>    server, and then versions are constructed by concatenating
>    "<client-id>:<counter>" for each client.
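Both strategies fit in a few lines each. A minimal sketch; `random_version_id` and `CounterVersionIds` are hypothetical names, and how a client obtains its unique client ID is left out.

```python
import itertools
import secrets


def random_version_id(nbytes: int = 16) -> str:
    """Strategy 1: pick an ID at random from a large space (128 bits by
    default), so collisions between clients are extremely unlikely."""
    return secrets.token_hex(nbytes)


class CounterVersionIds:
    """Strategy 2: "<client-id>:<counter>". IDs are unique as long as each
    client ID is unique and each client only increments its own counter."""

    def __init__(self, client_id: str):
        self.client_id = client_id
        self._counter = itertools.count()

    def new_id(self) -> str:
        return f"{self.client_id}:{next(self._counter)}"


alice, bob = CounterVersionIds("alice"), CounterVersionIds("bob")
print(alice.new_id(), alice.new_id(), bob.new_id())  # alice:0 alice:1 bob:0
print(len(random_version_id()))                      # 32
```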
>
> Does this all make sense?
>
> Again, good questions, and I am glad to see this interest in the topic! I
> think we can do a lot with it!
>
> Michael
> On 7/17/24 2:56 PM, Rory Hewitt wrote:
>
> Hey Michael,
>
> A few thoughts...
>
> First, I agree that the concept of versioning hasn't been thought about
> enough, and this is definitely a 'good idea (TM)'.
>
> However, I have a few concerns:
>
> *1.1.2 Versioning with ETag*
>
> Because ETags are, by definition, opaque strings, whether you can rely on
> them to establish a version depends entirely on the format chosen by the
> site. It's true that you often can't, but an ETag *could* validly be
> specified as a date:
>
>     ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
>
> or as a version number:
>
>     ETag: "v1.0.2"
>
> or as a random string:
>
>     ETag: "Michael is cool"
>
> IOW, it's totally possible for a site that cares about versioning to use a
> format that specifies a version number. I recognize this isn't
> *necessarily* the case, but it helps to be clear here. It should also be
> noted that many web servers that generate ETags natively (e.g. Apache)
> already include an effective version, such as the file's modification
> time, in the ETag.
>
> Likewise ETags don't *have* to be sensitive to encoding - there's nothing
> to stop a server from sending the exact same ETag for two
> differently-encoded copies of the same underlying resource. It's just that
> they typically do.
>
> None of this is to say that ETags are better or worse than you describe -
> just to say that they *can* be better than they are.
>
> *2.3 Version and Parents headers*
>
> You state that the Parents header can include multiple parents (parents,
> grandparents, great-grandparents?) and provide an example:
>
>     Parents: "ajtva12kid", "cmdpvkpll2"
>
> and then say "Any version can be recreated by first merging its parents,
> and then applying the its update onto that merger." (Nit: additional "the"
> in this sentence). However, you also say that the order of the values in a
> Parents header makes no difference.
>
> Maybe I'm missing something, but in this scenario, how could that work?
> Using your example above, here are two possible scenarios:
>
> * Version "ajtva12kid" is earlier. Version "cmdpvkpll2" is later and
> contains an additional section of HTML
> * Version "ajtva12kid" is earlier and contains a section of HTML which is
> removed in the later "cmdpvkpll2" version
>
> If you merge the two parent versions, then does the outcome (onto which
> you will apply the update) include that section of HTML?
>
> I guess it just makes sense to me for the order of values in the Parents
> header to have some meaning, whether oldest first or oldest last. Or you
> could specify that both Version and Parents values must be integers.
>
> *2.4.3 PUT a new version*
>
> This seems like it could lead to either race conditions or some other
> issue with duplicate Version values. Surely it's better to have the client
> submit a new version of a resource (passing the Parents header but *not*
> passing the Version header) and have the server, which is presumably the
> prime source of versioning truth, calculate a version (perhaps after
> retrieving other PUT requests from other clients) and return that value in
> the Version response header?
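A minimal in-memory sketch of the server-assigned flow suggested above, where the client sends Parents but no Version and the server chooses the ID. `VersionAssigningServer` and the `s<n>` ID format are hypothetical; a real server would also need merging and persistence.

```python
import threading


class VersionAssigningServer:
    """Toy server: clients PUT a body plus Parents; the server picks the Version."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0
        self.parents: dict[str, list[str]] = {}
        self.bodies: dict[str, bytes] = {}

    def put(self, parents: list[str], body: bytes) -> str:
        with self._lock:  # serialize concurrent PUTs so an ID is never handed out twice
            unknown = [p for p in parents if p not in self.bodies]
            if unknown:
                raise ValueError(f"unknown parent versions: {unknown}")
            version = f"s{self._next}"
            self._next += 1
            self.parents[version] = list(parents)
            self.bodies[version] = body
            return version  # would be returned in the Version response header


server = VersionAssigningServer()
v0 = server.put([], b"hello")
v1 = server.put([v0], b"hello world")  # two clients editing v0 concurrently
v2 = server.put([v0], b"hello there")
print(v0, v1, v2)  # s0 s1 s2
```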
>
> I see you discuss this later with the Current-Version header, so perhaps
> you covered this and my old eyes missed it.
>
> Rory
>
>
> On Mon, Jul 15, 2024 at 6:31 PM Michael Toomim <toomim@gmail.com> wrote:
>
> Hi everyone in HTTP!
>
> Last fall we solicited feedback on the Braid State Synchronization
> proposal [draft
> <https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-braid-http-04>,
> slides
> <https://datatracker.ietf.org/meeting/118/materials/slides-118-httpbis-braid-http-add-synchronization-to-http-00>],
> which I'd summarize as:
>
> "We're enthusiastic about the general work, but the proposal is too
> high-level. Break the spec up into multiple independent specs, and work
> bottom-up. Focus on concrete 'bits-on-the-wire'."
>
> So I'm breaking the spec up, and have drafted up the first chunk for you.
> I would very much like your review on:
>
> *Versioning of HTTP Resources*
> draft-toomim-httpbis-versions
> https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions-00
>
> Versioning is necessary for state synchronization—and occurs in a range of
> HTTP systems:
>
>    - Caching
>    - Archiving
>    - Version Control
>    - Collaborative Editing
>
> Today, HTTP has resource versions in the Last-Modified and ETag headers,
> and sometimes embeds versions in URLs, like with WebDAV. Each of these
> options serves some needs, but also has specific limitations. We propose an
> improved general approach that provides new features, which could enable
> cool new applications, such as incrementally-updated RSS feeds, and could
> simplify existing specifications, such as resumable uploads, as well as
> history compression in OT/CRDT algorithms.
>
> I would love to know if people find this work interesting. I think we
> could improve performance, interoperability, and be one step closer to
> having Google Docs power within HTTP URLs.
>
> Michael
>
>

Received on Tuesday, 23 July 2024 05:14:11 UTC