- From: Michael Toomim <toomim@gmail.com>
- Date: Thu, 25 Jul 2024 03:47:47 -0700
- To: Rory Hewitt <rory.hewitt@gmail.com>, Pierre Chapuis <catwell-gmail1@catwell.info>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>, Braid <braid-http@googlegroups.com>
- Message-ID: <7ba7da74-bac0-49af-b8e1-84a9678cb838@gmail.com>
Well, the main reason for the Parents: header is to convey the shape of
the historical time DAG.

The scheme you propose here does not give *quite* the full information
on grandparents. Consider these two time DAGs:

DAG 1 has parents and grandparents:

    a1   a2
    |    |
    |    |
    b1   b2
      \ /
       c

DAG 2 has the grandparenting flipped:

    a1   a2
      \ /
      / \
    b1   b2
      \ /
       c

These are different DAGs, but they'd produce the same Parents: header in
your scheme:

    Parents: "b1", "b2":1; "a1", "a2":2

So I don't think this scheme is yet an improvement.
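
To make the ambiguity concrete, here is a rough Python sketch. The dict encoding of each DAG and the `leveled_parents` helper are my own illustrative assumptions, not anything from the draft. It groups the ancestors of a version by generation and formats them in the level-annotated style proposed above:

```python
def leveled_parents(parents, version):
    """Format the ancestors of `version`, grouped by generation.

    `parents` maps each version ID to a list of its direct parents.
    This is a hypothetical encoding used only for this illustration.
    """
    levels = []
    frontier = {version}
    seen = {version}
    while True:
        # Collect the not-yet-seen parents of everything in the frontier.
        next_level = set()
        for v in frontier:
            for p in parents.get(v, ()):
                if p not in seen:
                    next_level.add(p)
        if not next_level:
            break
        seen |= next_level
        levels.append(sorted(next_level))
        frontier = next_level
    return "; ".join(
        ", ".join(f'"{v}"' for v in group) + f":{depth}"
        for depth, group in enumerate(levels, start=1)
    )


# DAG 1: b1 descends from a1, b2 descends from a2.
dag1 = {"c": ["b1", "b2"], "b1": ["a1"], "b2": ["a2"]}
# DAG 2: the grandparenting is flipped.
dag2 = {"c": ["b1", "b2"], "b1": ["a2"], "b2": ["a1"]}

header1 = leveled_parents(dag1, "c")
header2 = leveled_parents(dag2, "c")
print("Parents:", header1)
print("Same header for both DAGs:", header1 == header2)
```

The level annotation records how far back each ancestor is, but not which parent it belongs to, so the edges between levels are lost.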
If your goal is to have a convenient way to request a span of history,
might I suggest doing a HEAD request on the span:

    HEAD /foo
    Parents: "a1", "a2"
    Version: "c"

The server will respond with an outline of the time DAG:

    HTTP/1.1 104 Multiresponse

    HTTP/1.1 200 OK
    Parents: "a1"
    Version: "b1"

    HTTP/1.1 200 OK
    Parents: "a2"
    Version: "b2"

    HTTP/1.1 200 OK
    Parents: "b1", "b2"
    Version: "c"

This lets you access the DAG without downloading any patches or resource
contents.
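
As a rough sketch of what a client might do with that outline, the Python below rebuilds the parent map from the Version and Parents pairs. It assumes the response arrives as exactly the simplified text shown above; the real wire framing may differ, and `parse_outline` is a hypothetical helper, not part of any Braid library:

```python
import re


def parse_outline(text):
    """Build {version: [parent, ...]} from a simplified multiresponse outline.

    Assumes each sub-response starts with an "HTTP/x.y NNN ..." status line
    and carries quoted IDs in its Version and Parents headers.
    """
    dag = {}
    # Split into one chunk per status line, then read the headers in each chunk.
    for block in re.split(r"(?m)^HTTP/\S+ \d{3}.*$", text):
        headers = {}
        for line in block.splitlines():
            name, sep, value = line.partition(":")
            if sep:
                headers[name.strip().lower()] = re.findall(r'"([^"]*)"', value)
        for version in headers.get("version", []):
            dag[version] = headers.get("parents", [])
    return dag


outline = """\
HTTP/1.1 104 Multiresponse
HTTP/1.1 200 OK
Parents: "a1"
Version: "b1"
HTTP/1.1 200 OK
Parents: "a2"
Version: "b2"
HTTP/1.1 200 OK
Parents: "b1", "b2"
Version: "c"
"""

dag = parse_outline(outline)
for version, version_parents in dag.items():
    print(version, "<-", version_parents)
```

Unlike the leveled header, this keeps every edge, so the two DAGs above would parse to different maps.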
Cheers!
Michael
On 7/22/24 10:13 PM, Rory Hewitt wrote:
> Hey Pierre,
>
> Actually, I kinda WAS talking about my (mis)understanding that the
> Parents header could contain both parents and grandparents...
>
> That being said, if the Parents header can ONLY contain direct
> parents, that seems like a (possibly significant) limitation.
>
> Would it not be an improvement to allow the header to contain a list
> of ancestors back to whatever level the server feels is appropriate or
> retains information, complete with level information (ancestor level):
>
> Parents: "parent1","parent 2":1;
> "grandparent":2;"greatgrandparent1","greatgrandparent2","greatgrandparent3":3
>
> This indicates two parents (level 1), a single grandparent (level 2)
> and 3 great grandparents (level 3).
>
> This could be compared with a similar Parents header for another
> object to determine where differences may be found, and how far back.
>
> Maybe this is getting too far into the weeds - this was, as I noted,
> based on my misunderstanding, which Pierre obviously understands is a
> possibility.
>
> I guess my primary point is that in finding a balance between brevity
> and flexibility, a design that is able to specify detailed information
> is better, even if that detailed information is often elided or ignored.
>
> With these fairly 'generic' header names like Version and Parents, the
> ability to use them to (in theory) 'build' a history of a file and
> compare with a later, earlier or 'sibling' file seems very useful...
>
> But I defer to the smarter minds here - I am a mere tinkerer and may
> well have gotten too deep too early.
>
> Rory
>
>
> On Mon, Jul 22, 2024, 8:04 PM Pierre Chapuis
> <catwell-gmail1@catwell.info> wrote:
>
> Hello Michael,
>
> regarding the "version and parents headers" ordering issue Rory
> mentioned, I don't think he was talking about the case where one
> version descends from the other one.
>
> The fact that you say this has very strong implications:
>
> > Any version can be recreated by first merging its parents, and
> > then applying the its update onto that merger.
>
> It either means there cannot be conflicts between parents - or in
> other words that conflict resolution is deterministic, commutative
> *and* associative (like CRDTs), or that updates must always
> contain the conflict resolution of their parents like Git.
>
> That last solution also means updates can be rejected by the
> server if its history is incoherent, and comes with its own
> issues. The way Git works is that conflict resolution is always
> performed with human intervention on pull, not on push.
>
> I know Braid has answers to this (Merge Types) and you are trying
> to break up the spec here, but it is not surprising that if you
> have a spec that says "versions can have several parents and you
> can merge them" people are going to wonder how.
>
> --
> Pierre Chapuis
>
> On Tue, Jul 23, 2024, at 00:30, Michael Toomim wrote:
>>
>> Rory, thanks for these excellent thoughts! It's exciting to see
>> other people digging into the versioning problem with us. :)
>>
>> Responses:
>>
>> *== Versioning with ETag ==*
>>
>> You make a good point that ETag headers, like the proposed
>> Version header, are opaque strings that can be formatted to
>> express additional information if we want to. This is true for
>> both ETag and Version:
>>
>> ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
>> Version: "Sat, 6 Jul 2024 07:28:00 GMT"
>>
>> ETag: "v1.0.2"
>> Version: "v1.0.2"
>>
>> We propose articulating the structure of these version ids using
>> a Version-Type header. You could, for instance, use
>> "Version-Type: date" for the first example, and "Version-Type:
>> semver" for the second.
>>
>> The main problem with ETag, though, is that it marks *unique
>> content* rather than *unique time*. If you mutate the state of
>> the resource from "foo" to "bar" and then back to "foo", you'll
>> revert to the same ETag, even though this is at a different point
>> in time. This breaks collaborative editing algorithms.
>>
>> Finally, I'll note that your claim that ETags don't have to be
>> sensitive to content-encoding is only true for *weak* ETags.
>> Strong ETags must change whenever the byte sequence of the
>> response body changes. This means they should be sensitive to
>> content-encoding. RFC9110 is also explicit that they depend on
>> content-type:
>>
>> > A strong validator might change for reasons other than a
>> > change to the representation data, such as when a
>> > semantically significant part of the representation metadata
>> > is changed (e.g., Content-Type)
>> https://datatracker.ietf.org/doc/html/rfc9110#section-8.8.1
>>
>> Consider the case where a user edits a markdown resource:
>>
>> PUT /foo
>> Content-Type: text/markdown
>> Version: "mike-99"
>>
>> # This is a markdown file
>>
>> Hello world!
>>
>> And the server then shares this as HTML:
>>
>> GET /foo
>> Accept: text/html
>>
>> HTTP/1.1 200 OK
>> Content-Type: text/html
>> Version: "mike-99"
>>
>> <html>
>> <body>
>> <h1>This is a markdown file</h1>
>> <p>Hello world!</p>
>> </body>
>> </html>
>>
>> Using the Version header, we're able to express that these are
>> two representations of the resource at the same point in time.
>> You can't do this with a strong ETag.
>>
>> *== Version and Parents headers ==*
>>
>> I think there's been a miscommunication here. The reason there
>> are multiple version IDs in the Parents header is for edits that
>> happen *in parallel*, not for edits that happen in sequence. This
>> is to represent a version DAG:
>>
>>       a    <-- oldest version
>>      / \
>>     b   c
>>      \ /
>>       d    <-- current version
>>
>> In this example, the current version "d" would have:
>>
>> Parents: "b", "c"
>>
>> This is not allowed:
>>
>> Parents: "d", "b"
>>
>> Because of this language in the spec:
>>
>> For any two version IDs A and B that are specified in a Version or
>> Parents header, A cannot be a descendent of B or vice versa. The
>> ordering of version IDs within the header carries no meaning.
>>
>> Good question!
>>
>> *== Client-generated Version IDs on PUT ==*
>>
>> Yes, there would be a problem if two clients generate the same
>> version IDs for two different PUTs. Then the versions would not
>> be unique!
>>
>> However, requiring the server to generate versions is only one
>> possible solution, and it is a solution that requires a server. We
>> also want to support distributed p2p systems, which don't have
>> servers.
>>
>> In these systems, it's quite common for clients to generate
>> version IDs. There are two common ways to solve this problem:
>>
>> 1. Use a large random hash space so that collisions are
>> extremely unlikely. This works well enough for git, for instance.
>> 2. Each client gets a unique ID, possibly by coordinating with a
>> server, and then versions are constructed by concatenating
>> "<client-id>:<counter>" for each client.
>>
>> Does this all make sense?
>>
>> Again, good questions, and I am glad to see this interest in the
>> topic! I think we can do a lot with it!
>>
>> Michael
>>
>> On 7/17/24 2:56 PM, Rory Hewitt wrote:
>>> Hey Michael,
>>>
>>> A few thoughts...
>>>
>>> First, I agree that the concept of versioning hasn't been
>>> thought about enough, and this is definitely a 'good idea (TM)'.
>>>
>>> However, I have a few concerns:
>>>
>>> *1.1.2 Versioning with ETag*
>>>
>>> Because ETags are, by definition, unformatted, while it's
>>> true to say that you often can't rely on them to establish a
>>> version, that's entirely dependent on the format chosen by the
>>> user. An ETag *could* validly be specified as a date:
>>>
>>> ETag: "Sat, 6 Jul 2024 07:28:00 GMT"
>>>
>>> or as a version number:
>>>
>>> ETag: "v1.0.2"
>>>
>>> or as a random string:
>>>
>>> ETag: "Michael is cool"
>>>
>>> IOW, it's totally possible for a site that cares about
>>> versioning to use a format that specifies a version number. I
>>> recognize this isn't *necessarily* the case, but it helps to be
>>> clear here. It should be noted that many web servers that
>>> include the creation of ETags natively (e.g. Apache) include an
>>> effective version as part of the ETag.
>>>
>>> Likewise ETags don't *have* to be sensitive to encoding -
>>> there's nothing to stop a server from sending the exact same
>>> ETag for two differently-encoded copies of the same underlying
>>> resource. It's just that they typically do.
>>>
>>> None of this is to say that ETags are better or worse than you
>>> describe - just to say that they *can* be better than they are.
>>>
>>> *2.3 Version and Parents headers*
>>>
>>> You state that the Parents header can include multiple parents
>>> (parents, grandparents, great-grandparents?) and provide an example:
>>>
>>> Parents: "ajtva12kid", "cmdpvkpll2"
>>>
>>> and then say "Any version can be recreated by first merging its
>>> parents, and then applying the its update onto that merger."
>>> (Nit: additional "the" in this sentence). However, you also say
>>> that the order of the values in a Parents header makes no
>>> difference.
>>>
>>> Maybe I'm missing something, but in this scenario, how could
>>> that work? Using your example above, here are two possible
>>> scenarios:
>>>
>>> * Version "ajtva12kid" is earlier. Version "cmdpvkpll2" is later
>>> and contains an additional section of HTML
>>> * Version "ajtva12kid" is earlier and contains a section of HTML
>>> which is removed in the later "cmdpvkpll2" version
>>>
>>> If you merge the two parent versions, then does the outcome
>>> (onto which you will apply the update) include that section of HTML?
>>>
>>> I guess it just makes sense to me to have the order in the
>>> Parents header carry some meaning - whether oldest first or last. Or
>>> you could specify that both Version and Parents values must be integers.
>>>
>>> *2.4.3 PUT a new version*
>>>
>>> This seems like it could lead to either race conditions or some
>>> other issue with duplicate Version values. Surely it's better to
>>> have the client submit a new version of a resource (passing the
>>> Parents header but *not* passing the Version header) and have
>>> the server, which is presumably the prime source of versioning
>>> truth, calculate a version (perhaps after retrieving other PUT
>>> requests from other clients) and return that value in the
>>> Version response header?
>>>
>>> I see you discuss this later with the Current-Version header, so
>>> perhaps you covered this and my old eyes missed it.
>>>
>>> Rory
>>>
>>>
>>> On Mon, Jul 15, 2024 at 6:31 PM Michael Toomim
>>> <toomim@gmail.com> wrote:
>>>
>>> Hi everyone in HTTP!
>>>
>>> Last fall we solicited feedback on the Braid State
>>> Synchronization proposal [draft
>>> <https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-braid-http-04>,
>>> slides
>>> <https://datatracker.ietf.org/meeting/118/materials/slides-118-httpbis-braid-http-add-synchronization-to-http-00>],
>>> which I'd summarize as:
>>>
>>> "We're enthusiastic about the general work, but the
>>> proposal is too high-level. Break the spec up into
>>> multiple independent specs, and work bottom-up. Focus on
>>> concrete 'bits-on-the-wire'."
>>>
>>> So I'm breaking the spec up, and have drafted up the first
>>> chunk for you. I would very much like your review on:
>>>
>>> *Versioning of HTTP Resources*
>>> draft-toomim-httpbis-versions
>>> https://datatracker.ietf.org/doc/html/draft-toomim-httpbis-versions-00
>>>
>>> Versioning is necessary for state synchronization—and occurs
>>> in a range of HTTP systems:
>>>
>>> * Caching
>>> * Archiving
>>> * Version Control
>>> * Collaborative Editing
>>>
>>> Today, HTTP has resource versions in the Last-Modified and
>>> ETag headers, and sometimes embeds versions in URLs, like
>>> with WebDAV. Each of these options serves some needs, but
>>> also has specific limitations. An improved general approach
>>> is proposed that provides new features, which could enable
>>> cool new applications, such as incrementally-updated RSS
>>> feeds, and could simplify existing specifications, such as
>>> resumable uploads and history compression in OT/CRDT
>>> algorithms.
>>>
>>> I would love to know if people find this work interesting. I
>>> think we could improve performance and interoperability, and be
>>> one step closer to having Google Docs power within HTTP URLs.
>>>
>>> Michael
>>>
Received on Thursday, 25 July 2024 10:47:54 UTC