Re: HTTP Spec: PUT without data transfer, since hash of data is known to server from Ed McClanahan on 2015-10-07 (w3c-dist-auth@w3.org from October to December 2015)

From: Ed McClanahan <edward.d.mcclanahan@gmail.com>
Date: Wed, 7 Oct 2015 07:26:19 -0700
To: Thomas Güttler <guettliml@thomas-guettler.de>
Cc: w3c-dist-auth@w3.org
Message-ID: <CAMSqNL=SgDgq=xUhTYb4xULWuER2_kH0ds-bgUJV_JFk3VmgGw@mail.gmail.com>

Hmm... HTTP PATCH sounds like a problem then. Imagine that a previous PUT
of some other resource included said hash. A later PATCH modifies a portion
of that old resource. In order to be able to reference the new content of
that old resource, a new hash for the entire resource needs to be
recalculated. Not very practical for small PATCHes to large resources...
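To make the cost concrete, here is a minimal sketch. It assumes SHA-256 as the whole-resource hash and a simple byte-range overwrite as the PATCH; neither is specified above, and `full_hash` and `apply_range_patch` are hypothetical helpers:

```python
import hashlib

def full_hash(data: bytes) -> str:
    """Whole-resource hash, i.e. the dedupe key a later PUT would reference."""
    return hashlib.sha256(data).hexdigest()

def apply_range_patch(data: bytes, offset: int, new_bytes: bytes) -> bytes:
    """Hypothetical PATCH: overwrite len(new_bytes) bytes starting at offset."""
    return data[:offset] + new_bytes + data[offset + len(new_bytes):]

old = bytes(8 * 1024 * 1024)               # an 8 MiB resource of zero bytes
new = apply_range_patch(old, 4096, b"x")   # a 1-byte PATCH

# The stored key for the old content no longer describes the resource,
# and computing the new one means hashing all 8 MiB again.
print(full_hash(old) == full_hash(new))    # False
```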

Still, it seems HTTP PATCH also provides an elegant solution. Using PATCH,
the payload could be a simple "the data for my new resource has this hash"
rather than the data itself. The HTTP server could accept or reject the
PATCH request based upon whether or not it has seen this hash before. If
rejected, the client just does the normal PUT with unique data anyway.
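A minimal in-memory sketch of that exchange, under stated assumptions: `DedupStore`, `patch_by_hash` and `upload` are hypothetical names, SHA-256 is an assumed hash, and the 204/409/201 status codes are illustrative choices, not anything an HTTP spec defines for this:

```python
import hashlib

class DedupStore:
    """Hypothetical server: content stored by SHA-256 digest, paths point at digests."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # digest -> content
        self.paths: dict[str, str] = {}     # resource path -> digest

    def patch_by_hash(self, path: str, digest: str) -> int:
        """'The data for my new resource has this hash': accept only if known."""
        if digest in self.blobs:
            self.paths[path] = digest
            return 204   # accepted; no body was transferred
        return 409       # rejected; the client must upload the data

    def put(self, path: str, data: bytes) -> int:
        """The normal PUT, carrying the full body."""
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        self.paths[path] = digest
        return 201

def upload(server: DedupStore, path: str, data: bytes) -> tuple[str, int]:
    """Client: try the hash-only PATCH first, fall back to a normal PUT."""
    digest = hashlib.sha256(data).hexdigest()
    status = server.patch_by_hash(path, digest)
    if status == 204:
        return ("deduped", status)
    return ("uploaded", server.put(path, data))

server = DedupStore()
print(upload(server, "/a.txt", b"hello"))  # ('uploaded', 201)
print(upload(server, "/b.txt", b"hello"))  # ('deduped', 204)
```

The fallback keeps the scheme optional: a server that does not understand the hash-only PATCH simply rejects it, and the client ends up doing the PUT it would have done anyway.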

Going further, some sort of rsync-like HTTP PATCH payload could be used
where blocks of the resource to be loaded are individually hashed. The
PATCH response could be "OK, I have these blocks but not those". A
subsequent PATCH could upload only those blocks that contain new data.
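A rough sketch of the two-round block negotiation, simulated in memory. The fixed block size, SHA-256 per block, and the names `BlockServer`, `missing`, `commit` and `sync` are all assumptions for illustration. Real rsync also uses rolling checksums to find shifted blocks, which this sketch does not attempt:

```python
import hashlib

BLOCK_SIZE = 4  # tiny for illustration; a real scheme would use much larger blocks

def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]

def digest(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

class BlockServer:
    """Hypothetical server: blocks stored by digest, resources as digest lists."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}
        self.manifests: dict[str, list[str]] = {}

    def missing(self, hashes: list[str]) -> set[str]:
        """First PATCH reply: 'OK, I have these blocks but not those'."""
        return {h for h in hashes if h not in self.blocks}

    def commit(self, path: str, hashes: list[str], new_blocks: dict[str, bytes]) -> None:
        """Second PATCH: only the new blocks, plus the block order."""
        self.blocks.update(new_blocks)
        self.manifests[path] = hashes

    def read(self, path: str) -> bytes:
        return b"".join(self.blocks[h] for h in self.manifests[path])

def sync(server: BlockServer, path: str, data: bytes) -> int:
    """Client side; returns how many blocks were actually transferred."""
    blocks = split_blocks(data)
    hashes = [digest(b) for b in blocks]
    need = server.missing(hashes)
    to_send = {h: b for h, b in zip(hashes, blocks) if h in need}
    server.commit(path, hashes, to_send)
    return len(to_send)

server = BlockServer()
print(sync(server, "/v1", b"AAAABBBBCCCC"))  # 3: every block is new
print(sync(server, "/v2", b"AAAAXXXXCCCC"))  # 1: only XXXX is sent
```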

I would like to add that hashes aren't perfect, MD5 most notably. False
positives (different content that happens to share a hash) would seemingly
be a problem, so some scheme might be needed to detect them.
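One possible detection scheme, sketched here purely as an assumption: the server indexes stored content by a primary hash, but only trusts a match if the size and an independent second digest also agree. A mismatch on those means a collision on the primary hash, and the client is told to upload. The choice of SHA-256 plus BLAKE2b and the names `Fingerprint` and `dedupe_decision` are hypothetical:

```python
import hashlib
from typing import NamedTuple

class Fingerprint(NamedTuple):
    size: int
    sha256: str
    blake2b: str

def fingerprint(data: bytes) -> Fingerprint:
    return Fingerprint(len(data),
                       hashlib.sha256(data).hexdigest(),
                       hashlib.blake2b(data).hexdigest())

def dedupe_decision(known: dict[str, Fingerprint], claimed: Fingerprint) -> str:
    """Hypothetical server-side check: index by the primary hash, then require
    the size and the independent second digest to match as well."""
    stored = known.get(claimed.sha256)
    if stored is None:
        return "upload"      # never seen: do the normal PUT
    if stored != claimed:
        return "collision"   # primary hash matched but the rest did not; upload
    return "dedupe"

known: dict[str, Fingerprint] = {}
fp = fingerprint(b"hello")
print(dedupe_decision(known, fp))  # upload
known[fp.sha256] = fp
print(dedupe_decision(known, fp))  # dedupe
```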

Finally, there is definitely a security question. The best example of it
was once described to me this way:

1) I work at a company that archives all of its job offer letters, which are
form letters differing only in the employee's name and salary.

2) I want to know John Smith's salary (i.e. I know his name but not his
salary).

3) I compose a series of form letter offers each with John Smith's name but
with varying salaries.

4) I try this dedupe-able PUT/PATCH operation for each such offer letter.

5) My HTTP client reports which one is dedupe-able.

The result of #5 reveals John Smith's salary. Oops!
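The attack can be simulated in a few lines. Everything here is invented for illustration: the letter template, the `ArchiveServer` class, and its `can_dedupe` method stand in for whatever "have you seen this hash?" answer a dedupe-able PUT/PATCH would give:

```python
import hashlib
from typing import Iterable, Optional

TEMPLATE = ("Dear {name},\nWe are pleased to offer you the position "
            "at a salary of ${salary:,} per year.\n")

def letter(name: str, salary: int) -> bytes:
    return TEMPLATE.format(name=name, salary=salary).encode()

class ArchiveServer:
    """Hypothetical archive that answers 'can this upload be deduped?'."""

    def __init__(self) -> None:
        self.digests: set[str] = set()

    def store(self, data: bytes) -> None:
        self.digests.add(hashlib.sha256(data).hexdigest())

    def can_dedupe(self, digest: str) -> bool:
        return digest in self.digests   # this one bit is the side channel

def guess_salary(server: ArchiveServer, name: str,
                 candidates: Iterable[int]) -> Optional[int]:
    """Steps 3-5: try each candidate letter until one is dedupe-able."""
    for salary in candidates:
        if server.can_dedupe(hashlib.sha256(letter(name, salary)).hexdigest()):
            return salary
    return None

archive = ArchiveServer()
archive.store(letter("John Smith", 87_000))
print(guess_salary(archive, "John Smith", range(50_000, 150_001, 1_000)))  # 87000
```

The attacker never needs to see any stored content; the accept/reject answer alone is enough.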

Just wanted to throw my PATCH alternative out there.
On Oct 7, 2015 12:19 AM, "Thomas Güttler" <guettliml@thomas-guettler.de>
wrote:

> I have seen a lot of useless uploads when syncing a local file system
> with a remote WebDAV server.
>
> I thought about this and asked on Stack Overflow.
>
> My idea is to have a PUT which uses ETags, or something ETag-like, so that
> the data transfer can be omitted if the server already knows the hash sum
> of the data.
>
> I got a really good answer from someone who knows the HTTP-specs much
> better than I do:
>
>
> http://stackoverflow.com/questions/32794863/http-spec-put-without-data-transfer-since-hash-of-data-is-known-to-server
>
> Maybe my goal is too high... but I don't want to implement this. I want
> an official spec :-)
>
> What do you think?
>
> Could something like this become an official recommendation?
>
> BTW, I can't decide **how** to implement this. My knowledge of the
> HTTP spec is too limited at the moment.
>
> At this moment I want to ask:
>
>  - Is it possible at all to create a spec for HTTP PUT which omits
>    the data if the server already knows the hash sum (like deduplicating
>    file systems)?
>
>  - If yes, then what is the next step?
>
> PS: Of course the spec should be optional. The server can support it, but
> doesn't need to.
>
> Regards,
>   Thomas Güttler
>
> --
> http://www.thomas-guettler.de/

Received on Wednesday, 7 October 2015 18:08:40 UTC