Re: Digest: use in requests

Hi Julian,

Just adding my 2c as responses in-line:

On Tue, Dec 29, 2020 at 10:28 AM Julian Reschke <julian.reschke@gmx.de> wrote:

> Hm, that seems like an odd choice for a protocol spec. If the spec
> doesn't say what the Digest means for any request, it's not really
> defining a protocol.
>
> I would *hope* that we can define things so that Digests can be
> automatically produced and checked by user agents (browsers) and servers
> (such as a servlet container).
>

FWIW, subresource integrity (SRI) is implemented in browsers. The specifics
are different: the hash applies to the identity encoding, so UAs need to
reverse any content encoding before validation. The fundamentals carry over,
so it should be possible, but I've not seen any signals that browsers are
interested in automatic Digest validation (yet?).
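
To illustrate the difference, here is a minimal Python sketch (not taken
from any browser or spec). It assumes gzip as the only content coding, and
it assumes, as I read the draft, that Digest covers the bytes with the
content coding applied, whereas an SRI-style check first undoes the coding.
The helper names sha256_b64 and sri_matches are made up for illustration.

```python
import base64
import gzip
import hashlib


def sha256_b64(data: bytes) -> str:
    """Base64-encoded SHA-256 of the given bytes."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def sri_matches(wire_body: bytes, content_encoding: str, integrity: str) -> bool:
    """SRI-style check: undo the content coding, then hash the identity bytes.

    Only gzip and identity are handled; this is an assumption of the sketch.
    """
    if content_encoding == "gzip":
        body = gzip.decompress(wire_body)
    elif content_encoding in ("", "identity"):
        body = wire_body
    else:
        raise ValueError(f"unsupported content coding: {content_encoding}")
    return integrity == "sha256-" + sha256_b64(body)


identity = b"hello world\n"
on_the_wire = gzip.compress(identity)          # what a server might send
integrity = "sha256-" + sha256_b64(identity)   # SRI value over identity bytes

# The SRI-style check passes once the coding is reversed.
print(sri_matches(on_the_wire, "gzip", integrity))

# A hash over the coded bytes is a different value, so the two schemes
# cannot share one hash when a content coding is in use.
print(sha256_b64(on_the_wire) == sha256_b64(identity))
```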


> > Reading
> > https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#rfc.section.6.4.1.p.1
> > ```The purpose of a payload in a request is defined by the method
> > semantics```
> > iiuc the receiver, aware of the request semantic, knows its purpose
> > and how to process it, including whether it conveys a partial
> > representation or not.
>
> But "partial representation" is a term defined by HTTP; there is (or
> should be) an algorithm that - when inspecting *any* HTTP message -
> tells you whether it's "partial" or not. In HTTP, this is defined by the
> appearance of "Content-Range" for some specific response status codes.
>

> <snip>
>
> It *really* would be good to discuss something *concrete* here.
>
> Let's consider an upload protocol that sends multiple chunks, and then
> lets the server combine these into the final resource.
>
> In that protocol, Digest on each chunk would be used to check the
> integrity of each chunk.
>
> For the final step of creating the final full resource, the client could
> send the expected digest of the final resource in a *custom* field
> defined for the upload protocol (it would use the same algorithms etc,
> but use a different way to convey it to the server).
>
> With that, generic libraries could at least verify Digests on each of
> the chunks.
>

This is indeed the most likely use case. A very quick survey indicates that
there seem to be some examples of PUT requests with Content-Range in the
wild. I have no experience with these, nor knowledge of how popular they
actually are.

* Amazon S3 Glacier [1]

"This multipart upload operation uploads a part of an archive. You can
upload archive parts in any order because in your Upload Part request you
specify the range of bytes in the assembled archive that will be uploaded
in this part."

Example:
PUT /AccountId/vaults/VaultName/multipart-uploads/uploadID HTTP/1.1
Host: glacier.Region.amazonaws.com
Date: Date
Authorization: SignatureValue
Content-Range: ContentRange
Content-Length: PayloadSize
Content-Type: application/octet-stream
x-amz-sha256-tree-hash: Checksum of the part
x-amz-content-sha256: Checksum of the entire payload
x-amz-glacier-version: 2012-06-01

* Google Drive [2]

"Upload the content in multiple chunks. Use this approach if you need to
reduce the amount of data transferred in any single request. You might need
to reduce data transferred when there is a fixed time limit for individual
requests, as can be the case for certain classes of Google App Engine
requests."

"Add these HTTP headers:
    Content-Length. Set to the number of bytes in the current chunk.
    Content-Range. Set to show which bytes in the file you upload. For
example, Content-Range: bytes 0-524287/2000000 shows that you upload the
first 524,288 bytes (256 x 1024 x 2) in a 2,000,000 byte file."

* Google Cloud Storage [3]

"This page describes how to make a resumable upload request in the Cloud
Storage JSON and XML APIs. This protocol allows you to resume an upload
operation after a communication failure interrupts the flow of data."

Example:
curl -i -X PUT --data-binary @CHUNK_LOCATION \
    -H "Content-Length: CHUNK_SIZE" \
    -H "Content-Range: bytes CHUNK_FIRST_BYTE-CHUNK_LAST_BYTE/TOTAL_OBJECT_SIZE" \
    "SESSION_URI"

* draft-wright-http-partial-upload-01 (expired) [4]

"This document specifies a new media type intended for use in PATCH
   payloads that allows a resource to be uploaded in several segments,
   instead of a single large request."

Example:
PATCH /uploads/foo HTTP/1.1
   Content-Type: message/byterange
   Content-Length: 283
   If-Match: "xyzzy"
   If-Unmodified-Since: Sat, 29 Oct 1994 19:43:31 GMT

   Content-Range: bytes 100-299/600
   Content-Type: text/plain
   Content-Length: 200

Finally, Dropbox [5] does things a little differently and uses the
Dropbox-API-Arg JSON header field to communicate a cursor containing an
offset of the bytes uploaded so far (which I guess means that parallel
transfers aren't supported).

Example:
curl -X POST https://content.dropboxapi.com/2/files/upload_session/append_v2 \
    --header "Authorization: Bearer" \
    --header "Dropbox-API-Arg: {\"cursor\": {\"session_id\": \"1234faaf0678bcde\",\"offset\": 0},\"close\": false}" \
    --header "Content-Type: application/octet-stream" \
    --data-binary @local_file.txt
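
To make the chunked case a bit more concrete, here is a rough Python sketch
of how Content-Range style chunking could be combined with a per-chunk
Digest, plus a separate check of the full resource along the lines Julian
suggests. None of the services above is known to work this way; the
function names (plan_chunks, assemble), the use of "sha-256", and the idea
that Digest covers only the chunk bytes are all assumptions made for
illustration.

```python
import base64
import hashlib
from typing import Dict, Iterable, Iterator, Tuple

Part = Tuple[Dict[str, str], bytes]


def sha256_b64(data: bytes) -> str:
    """Base64-encoded SHA-256 of the given bytes."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def plan_chunks(data: bytes, chunk_size: int) -> Iterator[Part]:
    """Client side: yield (headers, body) for each chunk of a hypothetical upload."""
    total = len(data)
    for first in range(0, total, chunk_size):
        body = data[first:first + chunk_size]
        last = first + len(body) - 1
        yield {
            "Content-Length": str(len(body)),
            "Content-Range": f"bytes {first}-{last}/{total}",
            # Assumption: Digest is computed over this chunk only.
            "Digest": "sha-256=" + sha256_b64(body),
        }, body


def assemble(parts: Iterable[Part], total: int, expected_full: str) -> bytes:
    """Server side: verify each chunk, place it by Content-Range, check the whole.

    expected_full stands in for the custom field carrying the final digest.
    Gaps and overlaps between ranges are not detected by this sketch.
    """
    buf = bytearray(total)
    for headers, body in parts:
        if headers["Digest"] != "sha-256=" + sha256_b64(body):
            raise ValueError("chunk digest mismatch")
        byte_range = headers["Content-Range"].split(" ", 1)[1].split("/")[0]
        first, last = (int(n) for n in byte_range.split("-"))
        if last - first + 1 != len(body):
            raise ValueError("Content-Range does not match body length")
        buf[first:last + 1] = body
    if sha256_b64(bytes(buf)) != expected_full:
        raise ValueError("full resource digest mismatch")
    return bytes(buf)


data = bytes(range(256)) * 8                 # 2048 bytes of sample content
parts = list(plan_chunks(data, 600))
print(parts[0][0]["Content-Range"])          # bytes 0-599/2048

# Chunks may arrive in any order, as in the Glacier description above.
rebuilt = assemble(reversed(parts), len(data), sha256_b64(data))
print(rebuilt == data)
```

A generic library could do the per-chunk check without knowing anything
about the upload protocol; only the final check needs protocol-specific
knowledge.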

To conclude, I'm not exactly sure how these examples influence the
discussion. There do seem to be concrete cases of "partial requests", but
it's unclear to me whether these break HTTP semantic rules and/or whether
they should be documented more formally. The examples I've seen are for
APIs that also have their own custom means for integrity checks, or still
use Content-MD5. It would be nice if something like Digest covered all
avenues and we could get folks to switch to it, but I've not seen any
signals that such APIs are interested in Digest. Therefore, I'm wary of
Digest taking on too much work to describe something without any
implementer interest. In the interest of progress, if partial requests for
uploads are something people think needs standardising, I think that could
be done as an independent follow-on work item, e.g. a document that updates
Digest.

Cheers
Lucas

[1] - https://docs.aws.amazon.com/amazonglacier/latest/dev/api-upload-part.html
[2] - https://developers.google.com/drive/api/v3/manage-uploads#http---multiple-requests
[3] - https://cloud.google.com/storage/docs/performing-resumable-uploads
[4] - https://tools.ietf.org/html/draft-wright-http-partial-upload-01
[5] - https://www.dropbox.com/developers/documentation/http/documentation#files-upload_session-append:2

Received on Tuesday, 29 December 2020 12:17:59 UTC