- From: Michael Toomim <toomim@gmail.com>
- Date: Thu, 25 Jul 2024 03:47:47 -0700
- To: Rory Hewitt <rory.hewitt@gmail.com>, Pierre Chapuis <catwell-gmail1@catwell.info>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>, Braid <braid-http@googlegroups.com>
- Message-ID: <7ba7da74-bac0-49af-b8e1-84a9678cb838@gmail.com>
Well, the main reason for the Parents: header is to convey the shape of
the historical time DAG. The scheme you propose here does not give
*quite* the full information on grandparents. Consider these two time
DAGs:

DAG 1 has parents and grandparents:

    a1   a2
    |    |
    |    |
    b1   b2
      \ /
       c

DAG 2 has the grandparenting flipped:

    a1   a2
      \ /
      / \
    b1   b2
      \ /
       c

These are different DAGs, but they'd produce the same Parents: header in
your scheme:

    Parents: "b1", "b2":1; "a1", "a2":2

So I don't think this scheme is yet an improvement.

If your goal is to have a convenient way to request a span of history,
might I suggest doing a HEAD request on the span:

    HEAD /foo
    Parents: "a1", "a2"
    Version: "c"

The server will respond with an outline of the time DAG:

    HTTP/1.1 104 Multiresponse

    HTTP/1.1 200 OK
    Parents: "a1"
    Version: "b1"

    HTTP/1.1 200 OK
    Parents: "a2"
    Version: "b2"

    HTTP/1.1 200 OK
    Parents: "b1", "b2"
    Version: "c"

This lets you access the DAG without downloading any patches or resource
contents.

Cheers!

Michael

On 7/22/24 10:13 PM, Rory Hewitt wrote:
> Hey Pierre,
>
> Actually, I kinda WAS talking about my (mis)understanding that the
> Parents header could contain both parents and grandparents...
>
> That being said, if the Parents header can ONLY contain direct
> parents, that seems like a (possibly significant) limitation.
>
> Would it not be an improvement to allow the header to contain a list
> of ancestors back to whatever level the server feels is appropriate or
> retains information, complete with level information (ancestor level):
>
>     Parents: "parent1","parent 2":1;
>     "grandparent":2;"greatgrandparent1","greatgrandparent2","greatgrandparent3";3
>
> This indicates two parents (level 1), a single grandparent (level 2)
> and 3 great grandparents (level 3).
>
> This could be compared with a similar Parents header for another
> object to determine where differences may be found, and how far back.
>
> Maybe this is getting too far into the weeds - this was, as I noted,
> based on my misunderstanding, which Pierre obviously understands is a
> possibility.
>
> I guess my primary point is that in finding a balance between brevity
> and flexibility, a design that is able to specify detailed information
> is better, even if that detailed information is often elided or ignored.
>
> With these fairly 'generic' header names like Version and Parents, the
> ability to use them to (in theory) 'build' a history of a file and
> compare with a later, earlier or 'sibling' file seems very useful...
>
> But I defer to the smarter minds here - I am a mere tinkerer and may
> well have gotten too deep too early.
>
> Rory
>
>
> On Mon, Jul 22, 2024, 8:04 PM Pierre Chapuis
> <catwell-gmail1@catwell.info> wrote:
>
>     Hello Michael,
>
>     regarding the "version and parents headers" ordering issue Rory
>     mentioned, I don't think he was talking about the case where one
>     version descends from the other one.
>
>     The fact that you say this has very strong implications:
>
>     > Any version can be recreated by first merging its parents, and
>     > then applying the its update onto that merger.
>
>     It either means there cannot be conflicts between parents - or in
>     other words that conflict resolution is deterministic, commutative
>     *and* associative (like CRDTs), or that updates must always
>     contain the conflict resolution of their parents like Git.
>
>     That last solution also means updates can be rejected by the
>     server if its history is incoherent, and comes with its own
>     issues. The way Git works is that conflict resolution is always
>     performed with human intervention on pull, not on push.
>
>     I know Braid has answers to this (Merge Types) and you are trying
>     to break up the spec here, but it is not surprising that if you
>     have a spec that says "versions can have several parents and you
>     can merge them" people are going to wonder how.
>
>     --
>     Pierre Chapuis
>
>     On Tue, Jul 23, 2024, at 00:30, Michael Toomim wrote:
>>
>> Rory, thanks for these excellent thoughts! It's exciting to see
>> other people digging into the versioning problem with us. :)
>>
>> Responses:
>>
>> *== Versioning with ETag ==*
>>
>> You make a good point that ETag headers, like the proposed
>> Version header, are opaque strings that can be formatted to
>> express additional information if we want to. This is true for
>> both ETag and Version:
>>
>>     ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
>>     Version: "Sat, 6 Jul 2024 07:28:00 GMT"
>>
>>     ETag: "v1.0.2"
>>     Version: "v1.0.2"
>>
>> We propose articulating the structure of these version ids using
>> a Version-Type header. You could, for instance, use
>> "Version-Type: date" for the first example, and "Version-Type:
>> semver" for the second.
>>
>> The main problem with ETag, though, is that it marks *unique
>> content* rather than *unique time*. If you mutate the state of
>> the resource from "foo" to "bar" and then back to "foo", you'll
>> revert to the same ETag, even though this is at a different point
>> in time. This breaks collaborative editing algorithms.
>>
>> Finally, I'll note that your claim that ETags don't have to be
>> sensitive to content-encoding is only true for *weak* ETags.
>> Strong ETags must change whenever the byte sequence of the
>> response body changes. This means they should be sensitive to
>> content-encoding. RFC9110 is also explicit that they depend on
>> content-type:
>>
>>     > A strong validator might change for reasons other than a
>>     > change to the representation data, such as when a
>>     > semantically significant part of the representation metadata
>>     > is changed (e.g., Content-Type)
>>     https://datatracker.ietf.org/doc/html/rfc9110#section-8.8.1
>>
>> Consider the case where a user edits a markdown resource:
>>
>>     PUT /foo
>>     Content-Type: text/markdown
>>     Version: "mike-99"
>>
>>     # This is a markdown file
>>
>>     Hello world!
>>
>> And the server then shares this as HTML:
>>
>>     GET /foo
>>     Accept: text/html
>>
>>
>>     HTTP/1.1 200 OK
>>     Content-Type: text/html
>>     Version: "mike-99"
>>
>>     <html>
>>       <body>
>>         <h1>This is a markdown file</h1>
>>         <p>Hello world!</p>
>>       </body>
>>     </html>
>>
>> Using the Version header, we're able to express that these are
>> two representations of the resource at the same point in time.
>> You can't do this with a strong ETag.
>>
>> *== Version and Parents headers ==*
>>
>> I think there's been a miscommunication here. The reason there
>> are multiple version IDs in the Parents header is for edits that
>> happen *in parallel*, not for edits that happen in sequence. This
>> is to represent a version DAG:
>>
>>         a   <-- oldest version
>>        / \
>>       b   c
>>        \ /
>>         d   <-- current version
>>
>> In this example, the current version "d" would have:
>>
>>     Parents: "b", "c"
>>
>> This is not allowed:
>>
>>     Parents: "d", "b"
>>
>> Because of this language in the spec:
>>
>>     For any two version IDs A and B that are specified in a Version or
>>     Parents header, A cannot be a descendent of B or vice versa. The
>>     ordering of version IDs within the header carries no meaning.
>>
>> Good question!
>>
>> *== Client-generated Version IDs on PUT ==*
>>
>> Yes, there would be a problem if two clients generate the same
>> version IDs for two different PUTs. Then the versions would not
>> be unique!
>>
>> However, requiring the server to generate versions is only one
>> possible solution, and is a solution that requires a server. We
>> also want to support distributed p2p systems, which don't have
>> servers.
>>
>> In these systems, it's quite common for clients to generate
>> version IDs. There are two common ways to solve this problem:
>>
>>  1. Use a large random hash space so that collisions are
>>     extremely unlikely. This works well enough for git, for instance.
>>  2. Each client gets a unique ID, possibly by coordinating with a
>>     server, and then versions are constructed by concatenating
>>     "<client-id>:<counter>" for each client.
>>
>> Does this all make sense?
>>
>> Again, good questions, and I am glad to see this interest in the
>> topic! I think we can do a lot with it!
>>
>> Michael
>>
>> On 7/17/24 2:56 PM, Rory Hewitt wrote:
>>> Hey Michael,
>>>
>>> A few thoughts...
>>>
>>> First, I agree that the concept of versioning hasn't been
>>> thought about enough, and this is definitely a 'good idea (TM)'.
>>>
>>> However, I have a few concerns:
>>>
>>> *1.1.2 Versioning with ETag*
>>>
>>> Because ETags are, by definition, unformatted, while it's
>>> true to say that you often can't rely on them to establish a
>>> version, that's entirely dependent on the format chosen by the
>>> user. An ETag *could* validly be specified as a date:
>>>
>>>     ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
>>>
>>> or as a version number:
>>>
>>>     ETag: "v1.0.2"
>>>
>>> or as a random string:
>>>
>>>     ETag: "Michael is cool"
>>>
>>> IOW, it's totally possible for a site that cares about
>>> versioning to use a format that specifies a version number. I
>>> recognize this isn't *necessarily* the case, but it helps to be
>>> clear here. It should be noted that many web servers that
>>> include the creation of ETags natively (e.g. Apache) include an
>>> effective version as part of the ETag.
>>>
>>> Likewise ETags don't *have* to be sensitive to encoding -
>>> there's nothing to stop a server from sending the exact same
>>> ETag for two differently-encoded copies of the same underlying
>>> resource. It's just that they typically do.
>>>
>>> None of this is to say that ETags are better or worse than you
>>> describe - just to say that they *can* be better than they are.
>>>
>>> *2.3 Version and Parents headers*
>>>
>>> You state that the Parents header can include multiple parents
>>> (parents, grandparents, great-grandparents?) and provide an example:
>>>
>>>     Parents: "ajtva12kid", "cmdpvkpll2"
>>>
>>> and then say "Any version can be recreated by first merging its
>>> parents, and then applying the its update onto that merger."
>>> (Nit: additional "the" in this sentence). However, you also say
>>> that the order of the values in a Parents header makes no
>>> difference.
>>>
>>> Maybe I'm missing something, but in this scenario, how could
>>> that work? Using your example above, here are two possible
>>> scenarios:
>>>
>>>   * Version "ajtva12kid" is earlier. Version "cmdpvkpll2" is later
>>>     and contains an additional section of HTML
>>>   * Version "ajtva12kid" is earlier and contains a section of HTML
>>>     which is removed in the later "cmdpvkpll2" version
>>>
>>> If you merge the two parent versions, then does the outcome
>>> (onto which you will apply the update) include that section of HTML?
>>>
>>> I guess it just makes sense to me to have the order in the
>>> Parents header have some meaning - whether oldest first or last. Or
>>> you could specify that both Version and Parents values must be
>>> integers.
>>>
>>> *2.4.3 PUT a new version*
>>>
>>> This seems like it could lead to either race conditions or some
>>> other issue with duplicate Version values. Surely it's better to
>>> have the client submit a new version of a resource (passing the
>>> Parents header but *not* passing the Version header) and have
>>> the server, which is presumably the prime source of versioning
>>> truth, calculate a version (perhaps after retrieving other PUT
>>> requests from other clients) and return that value in the
>>> Version response header?
>>>
>>> I see you discuss this later with the Current-Version header, so
>>> perhaps you covered this and my old eyes missed it.
>>>
>>> Rory
>>>
>>>
>>> On Mon, Jul 15, 2024 at 6:31 PM Michael Toomim
>>> <toomim@gmail.com> wrote:
>>>
>>>     Hi everyone in HTTP!
>>>
>>>     Last fall we solicited feedback on the Braid State
>>>     Synchronization proposal [draft
>>>     <https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-braid-http-04>,
>>>     slides
>>>     <https://datatracker.ietf.org/meeting/118/materials/slides-118-httpbis-braid-http-add-synchronization-to-http-00>],
>>>     which I'd summarize as:
>>>
>>>         "We're enthusiastic about the general work, but the
>>>         proposal is too high-level. Break the spec up into
>>>         multiple independent specs, and work bottom-up. Focus on
>>>         concrete 'bits-on-the-wire'."
>>>
>>>     So I'm breaking the spec up, and have drafted up the first
>>>     chunk for you. I would very much like your review on:
>>>
>>>         *Versioning of HTTP Resources*
>>>         draft-toomim-httpbis-versions
>>>         https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions-00
>>>
>>>     Versioning is necessary for state synchronization, and occurs
>>>     in a range of HTTP systems:
>>>
>>>       * Caching
>>>       * Archiving
>>>       * Version Control
>>>       * Collaborative Editing
>>>
>>>     Today, HTTP has resource versions in the Last-Modified and
>>>     ETag headers, and sometimes embeds versions in URLs, like
>>>     with WebDAV. Each of these options serves some needs, but
>>>     also has specific limitations. An improved general approach
>>>     is proposed, which provides new features that could enable
>>>     cool new applications, such as incrementally-updated RSS
>>>     feeds, and could simplify existing specifications, such as
>>>     resumable uploads, and history compression in OT/CRDT
>>>     algorithms.
>>>
>>>     I would love to know if people find this work interesting. I
>>>     think we could improve performance, interoperability, and be
>>>     one step closer to having Google Docs power within HTTP URLs.
>>>
>>>     Michael
>>>
Received on Thursday, 25 July 2024 10:47:54 UTC
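
A minimal sketch, in Python, of the ambiguity described at the top of this message. It is an illustration only: the function names `level_header` and `outline`, the dict-of-parents representation, and the rule that an ancestor is listed at its shallowest level are all assumptions, not part of any Braid draft. It shows that the two example DAGs collapse to the same level-annotated Parents: header, while a per-version outline (the information carried by the HEAD response above) tells them apart.

```python
def level_header(parents_of, version):
    """Format the ancestors of `version` grouped by generation (hypothetical scheme)."""
    levels = []
    seen = set()
    frontier = set(parents_of.get(version, ()))
    while frontier:
        levels.append(sorted(frontier))
        seen |= frontier
        # Next generation: parents of the current one, skipping ancestors already listed.
        frontier = {p for v in frontier for p in parents_of.get(v, ())} - seen
    parts = [
        ", ".join(f'"{v}"' for v in ids) + f":{depth}"
        for depth, ids in enumerate(levels, start=1)
    ]
    return "Parents: " + "; ".join(parts)


def outline(parents_of, version, known):
    """List (version, parents) pairs from the `known` versions up to `version`, oldest first."""
    order = []
    seen = set(known)

    def visit(v):
        if v in seen:
            return
        seen.add(v)
        for p in parents_of.get(v, ()):
            visit(p)
        # Appended after its parents, so every entry follows the versions it builds on.
        order.append((v, tuple(parents_of.get(v, ()))))

    visit(version)
    return order


# DAG 1: a1 -> b1, a2 -> b2.  DAG 2: the grandparenting is flipped.
dag1 = {"c": ["b1", "b2"], "b1": ["a1"], "b2": ["a2"]}
dag2 = {"c": ["b1", "b2"], "b1": ["a2"], "b2": ["a1"]}

header1 = level_header(dag1, "c")
header2 = level_header(dag2, "c")
outline1 = outline(dag1, "c", known={"a1", "a2"})
outline2 = outline(dag2, "c", known={"a1", "a2"})

print(header1 == header2)    # the level-annotated headers are identical
print(outline1 == outline2)  # the per-version outlines differ
```

Because each outline entry names a version's direct parents only, the edges of the DAG can be recovered exactly, which is what the level-grouped list loses.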