- From: Thomas Güttler <guettliml@thomas-guettler.de>
- Date: Thu, 8 Oct 2015 19:47:22 +0200
- To: w3c-dist-auth@w3.org
On 07.10.2015 at 16:26, Ed McClanahan wrote:
> Hmm... HTTP PATCH sounds like a problem then. Imagine that a previous PUT
> of some other resource included said hash. A later PATCH modifies a
> portion of that old resource. In order to be able to reference the new
> content of that old resource, a new hash for the entire resource needs to
> be recalculated. Not very practical for small PATCHes to large
> resources...

Yes, a small PATCH to a big resource would result in a re-calculation of
the hash sum, and this re-calculation would need to scan the whole
resource although only a small part has changed. That's true. But that's
life; I see no problem with it. At least in my environment PATCH is hardly
used. Mostly I see whole files being uploaded and downloaded.

> Still, it seems HTTP PATCH also provides an elegant solution. Using
> PATCH, the payload could be a simple "the data for my new resource has
> this hash" rather than the data itself. The HTTP server could accept or
> reject the PATCH request based upon whether or not it has seen this hash
> before. If rejected, the client just does the normal PUT with unique data
> anyway.

I am not sure I can follow your thoughts. Do you want to use PATCH to
implement uploads without data transfer, or do you want to use "sending
data without transfer" for PATCH, too?

From the RFC: "The PATCH method requests that a set of changes described
in the request entity be applied to the resource identified by the
Request-URI."

AFAIK you can only PATCH existing resources. My idea is to PUT new
resources. The same approach could be used for PATCH, but I would like to
handle that later.

> Going further, some sort of rsync-like HTTP PATCH payload could be used
> where blocks of the resource to be loaded are individually hashed. The
> PATCH response could be "OK, I have these blocks but not those". A
> subsequent PATCH could upload only those blocks that contain new data.

I would like to keep it simple during the first step and focus on whole
uploads only.

> I would like to add that hashes aren't perfect - most notably MD5. False
> positives would seemingly be a problem. Some scheme might be needed to be
> able to detect false positives.

Yes, I know. Client and server need to agree on a hash method somehow. If
both want MD5, they can use it, but I would not offer it if I were writing
a server.

> Finally, there is definitely a security question. The best example of it
> was once described to me this way:
>
> 1) I work at a company that archives the form letters containing all job
>    offers, differing only by the employee's name and salary.
>
> 2) I want to know John Smith's salary (i.e. I know his name but not his
>    salary).
>
> 3) I compose a series of form letter offers, each with John Smith's name
>    but with varying salaries.
>
> 4) I try this dedupe-able PUT/PATCH operation for each such offer letter.
>
> 5) My HTTP client reports which one is dedupe-able.
>
> The result of #5 reveals John Smith's salary. Oops!

Yes, that's a security concern. This could be a solution: if data with the
same hash value comes from a different area, the server should answer
"I have the data for this hash sum" only if the data was uploaded twice or
more.

I can't answer next week. I was told this list is the wrong one, since my
topic is about HTTP and not WebDAV. I will write to the HTTP list in the
week of 19 October. I hope to see/read you there.

Thank you for reading and for your interest in this topic.

Regards,
  Thomas Güttler

--
http://www.thomas-guettler.de/
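
P.S.: To make the idea above a little more concrete, here is a rough Python
sketch of how a client and a server could behave. Everything in it is made
up for illustration: the "Content-SHA256" header, the 204 status for
"I already have this data", and the two-uploads rule are assumptions, not
part of any specification.

    import hashlib

    import requests  # third-party HTTP library, used here only for illustration


    def dedupe_put(url, data):
        """Try a hash-only PUT first; fall back to a normal PUT with the body."""
        digest = hashlib.sha256(data).hexdigest()

        # Step 1: send only the hash (hypothetical header). A server that
        # already stores content with this digest could create the new
        # resource without any data transfer.
        probe = requests.put(url, headers={"Content-SHA256": digest})
        if probe.status_code == 204:
            # Assumed meaning: "I already have data with this hash."
            return probe

        # Step 2: the server does not know the hash, so upload the whole body.
        return requests.put(url, data=data, headers={"Content-SHA256": digest})


    # Server side, the mitigation for the salary-probing attack described
    # above: only admit that a hash is known if the same content has already
    # been uploaded from two or more different areas (users, tenants, ...).
    def hash_is_publicly_known(uploads_by_hash, digest):
        """uploads_by_hash maps a digest to the set of areas that uploaded it."""
        return len(uploads_by_hash.get(digest, set())) >= 2

A client that never receives the hypothetical 204 simply falls back to the
normal PUT it would do today, so the whole mechanism stays optional.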
Received on Thursday, 8 October 2015 17:47:50 UTC