- From: Ed McClanahan <edward.d.mcclanahan@gmail.com>
- Date: Wed, 7 Oct 2015 07:26:19 -0700
- To: Thomas Güttler <guettliml@thomas-guettler.de>
- Cc: w3c-dist-auth@w3.org
- Message-ID: <CAMSqNL=SgDgq=xUhTYb4xULWuER2_kH0ds-bgUJV_JFk3VmgGw@mail.gmail.com>
Hmm... HTTP PATCH sounds like a problem then. Imagine that a previous PUT of some other resource included said hash. A later PATCH modifies a portion of that old resource. To be able to reference the new content of that old resource, a new hash for the entire resource has to be recalculated. Not very practical for small PATCHes to large resources...

Still, it seems HTTP PATCH also provides an elegant solution. Using PATCH, the payload could be a simple "the data for my new resource has this hash" rather than the data itself. The HTTP server could accept or reject the PATCH request based on whether or not it has seen this hash before. If rejected, the client just does the normal PUT with the full data anyway.

Going further, some sort of rsync-like HTTP PATCH payload could be used, where blocks of the resource to be uploaded are individually hashed. The PATCH response could be "OK, I have these blocks but not those". A subsequent PATCH could upload only those blocks that contain new data.

I would like to add that hashes aren't perfect, MD5 most notably. Two different payloads can produce the same hash, so false positives would seemingly be a problem. Some scheme might be needed to detect them.

Finally, there is definitely a security question. The best example of it was once described to me this way:

1) I work at a company that archives the form letters containing all job offers, differing only by the employee's name and salary.
2) I want to know John Smith's salary (i.e. I know his name but not his salary).
3) I compose a series of form letter offers, each with John Smith's name but with varying salaries.
4) I try this dedupe-able PUT/PATCH operation for each such offer letter.
5) My HTTP client reports which one is dedupe-able.

The result of #5 reveals John Smith's salary. Oops!

Just wanted to throw my PATCH alternative out there.
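To make the block-level PATCH idea above a bit more concrete, here is a rough Python sketch of the two round trips. Everything in it is an assumption for illustration: the `BlockStore` class, the `sync` helper, the fixed 4-byte block size and the choice of SHA-256 are all made up, and no HTTP is involved. It only models what the server would answer and what the client would then send.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size, for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return each block's SHA-256 hex digest."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


class BlockStore:
    """Hypothetical server-side store of blocks, keyed by block hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> list[int]:
        """First PATCH: report the indexes of the blocks the server has not seen."""
        return [i for i, h in enumerate(hashes) if h not in self.blocks]

    def upload(self, blocks: dict[str, bytes]) -> None:
        """Second PATCH: store only the blocks the client was asked to send."""
        for h, block in blocks.items():
            # Re-hash on receipt so data cannot be registered under a false hash.
            if hashlib.sha256(block).hexdigest() != h:
                raise ValueError("block does not match its declared hash")
            self.blocks[h] = block

    def assemble(self, hashes: list[str]) -> bytes:
        """Rebuild a resource from its ordered list of block hashes."""
        return b"".join(self.blocks[h] for h in hashes)


def sync(store: BlockStore, data: bytes) -> int:
    """Run both round trips; return the number of payload bytes actually sent."""
    hashes = block_hashes(data)
    wanted = store.missing(hashes)
    payload = {hashes[i]: data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in wanted}
    store.upload(payload)
    return sum(len(block) for block in payload.values())
```

For example, syncing `b"aaaabbbbcccc"` into an empty store sends all 12 bytes, while a later sync of `b"aaaaXXXXcccc"` sends only the 4 bytes of the changed middle block. Note that `missing` is exactly the oracle from the salary example: its answer tells the caller which content the server already holds.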
On Oct 7, 2015 12:19 AM, "Thomas Güttler" <guettliml@thomas-guettler.de> wrote:

> I have seen a lot of useless uploads when syncing a local file system
> with a remote WebDAV server.
>
> I thought about this and asked on stackoverflow.
>
> My idea is to have a PUT which uses ETags or an ETag-like way, so that the
> data transfer can be omitted if the server already knows the hash-sum of
> the data.
>
> I got a really good answer from someone who knows the HTTP specs much
> better than I do:
>
> http://stackoverflow.com/questions/32794863/http-spec-put-without-data-transfer-since-hash-of-data-is-known-to-server
>
> Maybe my goal is too high .... but I don't want to implement this. I want
> an official spec :-)
>
> What do you think?
>
> Could something like this become an official recommendation?
>
> BTW, I can't decide about the **how** to implement this. My knowledge
> of the HTTP spec is too low at the moment.
>
> At this moment I want to ask:
>
> - Is it possible at all to create a spec for HTTP PUT which omits
>   the data if the server knows the hash-sum (like deduplicating file
>   systems)?
>
> - If yes, then what is the next step?
>
> PS: Of course the spec should be optional. The server can support it, but
> doesn't need to.
>
> Regards,
> Thomas Güttler
>
> --
> http://www.thomas-guettler.de/
Received on Wednesday, 7 October 2015 18:08:40 UTC