Re: Digest: use in requests from Lucas Pardue on 2020-12-29 (ietf-http-wg@w3.org from October to December 2020)

From: Lucas Pardue <lucaspardue.24.7@gmail.com>
Date: Tue, 29 Dec 2020 12:35:40 +0000
To: Julian Reschke <julian.reschke@gmx.de>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CALGR9oZJLwM2VZpJmxvAuomvhfc6XkFZwSucHO3_uhSWVy35Gw@mail.gmail.com>
Missed one; I would not be surprised if there are more:

* Microsoft OneDrive [1]

"To upload the file, or a portion of the file, your app makes a PUT request
to the *uploadUrl* value received in the *createUploadSession* response.
You can upload the entire file, or split the file into multiple byte
ranges, as long as the maximum bytes in any given request is less than 60
MiB.

The fragments of the file must be uploaded sequentially in order. Uploading
fragments out of order will result in an error."
Example
PUT https://sn3302.up.1drv.com/up/fe6987415ace7X4e1eF866337
Content-Length: 26
Content-Range: bytes 0-25/128
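A minimal sketch of how a client might plan the sequential PUTs for such an upload session. It assumes only the two constraints quoted above: each request is strictly under 60 MiB, and fragments go out in order. `plan_chunks` and the 26-byte chunk size, chosen to reproduce the example, are hypothetical and not part of the OneDrive API.

```python
from typing import Iterator, Tuple

MAX_REQUEST_BYTES = 60 * 1024 * 1024  # "less than 60 MiB" per the quoted docs

def plan_chunks(total_size: int, chunk_size: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (first_byte, length, Content-Range value) for sequential PUTs.

    Hypothetical helper: chunks are produced strictly in order, matching the
    requirement that fragments be uploaded sequentially.
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if not 0 < chunk_size < MAX_REQUEST_BYTES:
        raise ValueError("chunk_size must be positive and below 60 MiB")
    first = 0
    while first < total_size:
        length = min(chunk_size, total_size - first)
        last = first + length - 1  # Content-Range byte positions are inclusive
        yield first, length, f"bytes {first}-{last}/{total_size}"
        first = last + 1

# The example request above is the first 26-byte chunk of a 128-byte file.
for first, length, content_range in plan_chunks(128, 26):
    print(length, content_range)  # first line printed: 26 bytes 0-25/128
```

Because the positions are inclusive, Content-Length (26) is always last - first + 1 for the range `bytes 0-25`.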

[1] - https://docs.microsoft.com/en-us/onedrive/developer/rest-api/api/driveitem_createuploadsession?view=odsp-graph-online#example

On Tue, Dec 29, 2020 at 12:17 PM Lucas Pardue <lucaspardue.24.7@gmail.com> wrote:

> Hi Julian,
>
> Just adding my 2c as responses in-line:
>
> On Tue, Dec 29, 2020 at 10:28 AM Julian Reschke <julian.reschke@gmx.de> wrote:
>
>> Hm, that seems like an odd choice for a protocol spec. If the spec
>> doesn't say what the Digest means for any request, it's not really
>> defining a protocol.
>>
>> I would *hope* that we can define things so that Digests can be
>> automatically produced and checked by user agents (browsers) and servers
>> (such as a servlet container).
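A rough sketch of the kind of generic check a server or servlet container might run automatically. It assumes the Digest value is computed over the message payload exactly as received, which is precisely what is in question for partial requests. Only sha-256 and sha-512 are handled, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac
from typing import Dict, Optional

# Algorithms this sketch understands, mapped to hashlib constructors.
_ALGORITHMS = {"sha-256": hashlib.sha256, "sha-512": hashlib.sha512}

def parse_digest_field(value: str) -> Dict[str, str]:
    """Split 'alg=value, alg=value' into a dict.

    Splits on the first '=' only, because base64 values may end in '=' padding.
    """
    result = {}
    for item in value.split(","):
        alg, sep, encoded = item.strip().partition("=")
        if sep:
            result[alg.lower()] = encoded
    return result

def verify_digest(payload: bytes, field_value: str) -> Optional[bool]:
    """True if every recognised algorithm matches, False if any mismatches,
    None if the field contains no algorithm this sketch can check."""
    checked = False
    for alg, encoded in parse_digest_field(field_value).items():
        hash_ctor = _ALGORITHMS.get(alg)
        if hash_ctor is None:
            continue  # unknown algorithm: skip rather than fail
        checked = True
        expected = base64.b64encode(hash_ctor(payload).digest()).decode("ascii")
        if not hmac.compare_digest(expected.encode("ascii"), encoded.encode("utf-8")):
            return False
    return True if checked else None

body = b"hello"
field = "sha-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
print(verify_digest(body, field))       # True
print(verify_digest(b"hellO", field))   # False
print(verify_digest(body, "md5=AAAA"))  # None: nothing checkable
```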
>>
>
> FWIW, subresource integrity (SRI) is implemented in browsers. The
> specifics are different: the hash applies to the identity encoding, so UAs
> need to reverse any content encoding before validation. The fundamentals
> carry over, so it should be possible, but I've not seen any signals that
> browsers are interested in automatic Digest validation (yet?).
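To illustrate the SRI point: the integrity metadata is computed over the decoded (identity) body, so a UA that receives a gzip-encoded response has to undo the content coding before hashing. This is a minimal sketch that assumes gzip is the only content coding in play. `sri_value` and `sri_matches` are hypothetical names, not browser APIs.

```python
import base64
import gzip
import hashlib

def sri_value(identity_body: bytes) -> str:
    """SRI-style metadata '<alg>-<base64 digest>' over the identity encoding."""
    digest = hashlib.sha384(identity_body).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")

def sri_matches(received_body: bytes, content_encoding: str, expected: str) -> bool:
    # Reverse the content coding before hashing (only gzip is handled here).
    if content_encoding == "gzip":
        body = gzip.decompress(received_body)
    elif content_encoding in ("", "identity"):
        body = received_body
    else:
        raise ValueError(f"unsupported content coding: {content_encoding}")
    return sri_value(body) == expected

script = b"alert('hello');"
integrity = sri_value(script)          # what a page author would publish
on_the_wire = gzip.compress(script)    # what the server actually sends
print(sri_matches(on_the_wire, "gzip", integrity))      # True
print(sri_matches(on_the_wire, "identity", integrity))  # False: hashed encoded bytes
```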
>
>
>> > Reading https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#rfc.section.6.4.1.p.1
>> > ```The purpose of a payload in a request is defined by the method semantics```
>> > iiuc the receiver, aware of the request semantic, knows its purpose
>> > and how to process it, including whether it conveys a partial
>> > representation or not.
>>
>> But "partial representation" is a term defined by HTTP; there is (or
>> should be) an algorithm that - when inspecting *any* HTTP message -
>> tells you whether it's "partial" or not. In HTTP, this is defined by the
>> appearance of "Content-Range" for some specific response status codes.
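One possible reading of that rule, as a sketch: a response carries a partial representation when it is a 206 (Partial Content) with a byte-range Content-Range. This simplification ignores multipart/byteranges 206 responses, where Content-Range appears in each part rather than in the header section. It also treats a 416 with `Content-Range: bytes */length` as not partial. Header names are assumed to be lowercased, and the function names are hypothetical.

```python
import re
from typing import Mapping, Optional, Tuple

# "bytes first-last/complete-length"; the complete length may be "*" (unknown).
_BYTE_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")

def parse_content_range(value: str) -> Optional[Tuple[int, int, Optional[int]]]:
    m = _BYTE_RANGE.match(value.strip())
    if not m:
        return None  # e.g. "bytes */600", or a non-byte range unit
    first, last = int(m.group(1)), int(m.group(2))
    if last < first:
        return None
    total = None if m.group(3) == "*" else int(m.group(3))
    if total is not None and last >= total:
        return None
    return first, last, total

def response_is_partial(status: int, headers: Mapping[str, str]) -> bool:
    """Single-part 206 responses only; multipart/byteranges is not handled."""
    if status != 206:
        return False
    value = headers.get("content-range")
    return value is not None and parse_content_range(value) is not None

print(response_is_partial(206, {"content-range": "bytes 0-25/128"}))  # True
print(response_is_partial(416, {"content-range": "bytes */128"}))     # False
print(response_is_partial(200, {"content-range": "bytes 0-25/128"}))  # False
```

Nothing in this check applies to requests, which is the gap being discussed in this thread.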
>>
>
>> <snip>
>>
>> It *really* would be good to discuss something *concrete* here.
>>
>> Let's consider an upload protocol that sends multiple chunks, and then
>> lets the server combine these into the final resource.
>>
>> In that protocol, Digest on each chunk would be used to check the
>> integrity of each chunk.
>>
>> For the final step of creating the full resource, the client could
>> send the expected digest of the final resource in a *custom* field
>> defined for the upload protocol (it would use the same algorithms etc.,
>> but a different way to convey it to the server).
>>
>> With that, generic libraries could at least verify Digests on each of
>> the chunks.
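A sketch of the protocol described above. It assumes the `Digest: sha-256=<base64>` form used by RFC 3230 and the digest-headers draft. Each chunk carries a Digest over that chunk's payload only, and the digest of the assembled resource travels in a custom field. The field name `Upload-Final-Digest`, the chunk size, and the function names are hypothetical.

```python
import base64
import hashlib
from typing import Dict, List

def digest_value(data: bytes) -> str:
    """Digest field value for the given bytes, using sha-256."""
    return "sha-256=" + base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")

def chunk_requests(resource: bytes, chunk_size: int) -> List[Dict[str, str]]:
    """Header sets for each chunk PUT; each Digest covers only that chunk."""
    total = len(resource)
    requests = []
    for first in range(0, total, chunk_size):
        chunk = resource[first:first + chunk_size]
        last = first + len(chunk) - 1
        requests.append({
            "Content-Range": f"bytes {first}-{last}/{total}",
            "Content-Length": str(len(chunk)),
            "Digest": digest_value(chunk),
        })
    return requests

def finalize_request(resource: bytes) -> Dict[str, str]:
    # Hypothetical upload-protocol field carrying the digest of the whole resource.
    return {"Upload-Final-Digest": digest_value(resource)}

data = b"x" * 100
for headers in chunk_requests(data, 40):
    print(headers["Content-Range"], headers["Digest"])
print(finalize_request(data))
```

A generic receiver needs only the request payload to check the per-chunk Digest. Only upload-specific code has to understand the custom field.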
>>
>
> This is indeed the most likely use case. A very quick survey indicates
> that there seem to be some examples of PUT requests with Content-Range in
> the wild. I have no experience with these, nor knowledge of how popular
> they actually are.
>
> * Amazon S3 Glacier [1]
>
> "This multipart upload operation uploads a part of an archive. You can
> upload archive parts in any order because in your Upload Part request you
> specify the range of bytes in the assembled archive that will be uploaded
> in this part."
>
> Example:
> PUT /AccountId/vaults/VaultName/multipart-uploads/uploadID HTTP/1.1
> Host: glacier.Region.amazonaws.com
> Date: Date
> Authorization: SignatureValue
> Content-Range: ContentRange
> Content-Length: PayloadSize
> Content-Type: application/octet-stream
> x-amz-sha256-tree-hash: Checksum of the part
> x-amz-content-sha256: Checksum of the entire payload
> x-amz-glacier-version: 2012-06-01
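For reference, x-amz-sha256-tree-hash is not a flat hash of the part. As AWS documents it, the payload is split into 1 MiB chunks, each chunk is hashed with SHA-256, and the hashes are combined pairwise until a single root remains. x-amz-content-sha256, by contrast, is a plain SHA-256 of the payload. The following is an illustrative sketch of the tree computation, not a drop-in implementation. It rejects empty input rather than guessing at that case.

```python
import hashlib

ONE_MIB = 1024 * 1024

def sha256_tree_hash(data: bytes) -> str:
    """Hex SHA-256 tree hash: hash each 1 MiB chunk, then repeatedly hash
    concatenated pairs; an unpaired hash is carried up to the next level."""
    if not data:
        raise ValueError("this sketch does not define a tree hash for empty input")
    level = [hashlib.sha256(data[i:i + ONE_MIB]).digest()
             for i in range(0, len(data), ONE_MIB)]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                next_level.append(hashlib.sha256(level[i] + level[i + 1]).digest())
            else:
                next_level.append(level[i])  # odd one out is promoted unchanged
        level = next_level
    return level[0].hex()

part = b"\0" * (3 * ONE_MIB + 10)  # four chunks, the last one only 10 bytes
print(sha256_tree_hash(part))
```

For payloads of 1 MiB or less, the tree hash equals the ordinary SHA-256, which is a handy sanity check.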
>
> * Google Drive [2]
>
> "Upload the content in multiple chunks. Use this approach if you need to
> reduce the amount of data transferred in any single request. You might need
> to reduce data transferred when there is a fixed time limit for individual
> requests, as can be the case for certain classes of Google App Engine
> requests."
>
> "Add these HTTP headers:
>     Content-Length. Set to the number of bytes in the current chunk.
>     Content-Range. Set to show which bytes in the file you upload. For
> example, Content-Range: bytes 0-524287/2000000 shows that you upload the
> first 524,288 bytes (256 x 1024 x 2) in a 2,000,000 byte file."
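The arithmetic in that example works because Content-Range positions are inclusive: `bytes 0-524287` covers 524287 - 0 + 1 = 524,288 bytes. A server accepting such chunks would presumably check that the two headers agree. Below is a sketch of that check. `check_chunk` is a hypothetical helper and handles only the `bytes first-last/total` form.

```python
from typing import Tuple

def check_chunk(content_length: int, content_range: str) -> Tuple[int, int, int]:
    """Check a 'bytes first-last/total' Content-Range against Content-Length.

    Returns (first, last, total); raises ValueError on any inconsistency.
    """
    unit, _, spec = content_range.partition(" ")
    if unit != "bytes":
        raise ValueError("only byte ranges are supported")
    positions, _, total_str = spec.partition("/")
    first_str, _, last_str = positions.partition("-")
    first, last, total = int(first_str), int(last_str), int(total_str)
    if not 0 <= first <= last < total:
        raise ValueError("range outside the declared total size")
    if last - first + 1 != content_length:
        raise ValueError("Content-Length does not match the range length")
    return first, last, total

print(check_chunk(524288, "bytes 0-524287/2000000"))  # (0, 524287, 2000000)
print(256 * 1024 * 2)                                 # 524288
```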
>
> * Google Cloud Storage [3]
>
> "This page describes how to make a resumable upload request in the Cloud
> Storage JSON and XML APIs. This protocol allows you to resume an upload
> operation after a communication failure interrupts the flow of data."
>
> Example:
> curl -i -X PUT --data-binary @CHUNK_LOCATION \
>     -H "Content-Length: CHUNK_SIZE" \
>     -H "Content-Range: bytes CHUNK_FIRST_BYTE-CHUNK_LAST_BYTE/TOTAL_OBJECT_SIZE" \
>     "SESSION_URI"
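When the flow of data is interrupted, the GCS recovery flow is to ask the server how much it already has. The client sends an empty PUT with `Content-Range: bytes */TOTAL_OBJECT_SIZE`. A 308 response carries a `Range` header such as `bytes=0-N` (absent if nothing was persisted), and the client resumes at byte N+1. This sketch covers only the client-side bookkeeping and performs no HTTP. The helper names are hypothetical.

```python
from typing import Dict, Optional

def status_query_headers(total_size: int) -> Dict[str, str]:
    # Empty PUT asking the server how many bytes it has persisted.
    return {"Content-Length": "0", "Content-Range": f"bytes */{total_size}"}

def next_offset(range_header: Optional[str]) -> int:
    """Offset of the next byte to send, given the Range header of a 308 response.

    No Range header means nothing has been persisted yet."""
    if range_header is None:
        return 0
    unit, _, spec = range_header.partition("=")
    if unit != "bytes":
        raise ValueError("unexpected range unit")
    first, _, last = spec.partition("-")
    if int(first) != 0:
        raise ValueError("expected the persisted range to start at byte 0")
    return int(last) + 1

def resume_chunk_headers(offset: int, chunk_len: int, total_size: int) -> Dict[str, str]:
    last = offset + chunk_len - 1  # inclusive end position
    return {"Content-Length": str(chunk_len),
            "Content-Range": f"bytes {offset}-{last}/{total_size}"}

total = 1_000_000
offset = next_offset("bytes=0-262143")  # server reports 262,144 bytes persisted
print(status_query_headers(total))
print(resume_chunk_headers(offset, min(262144, total - offset), total))
```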
>
> * draft-wright-http-partial-upload-01 (expired) [4]
>
> "This document specifies a new media type intended for use in PATCH
>    payloads that allows a resource to be uploaded in several segments,
>    instead of a single large request."
>
> Example:
> PATCH /uploads/foo HTTP/1.1
> Content-Type: message/byterange
> Content-Length: 283
> If-Match: "xyzzy"
> If-Unmodified-Since: Sat, 29 Oct 1994 19:43:31 GMT
>
> Content-Range: bytes 100-299/600
> Content-Type: text/plain
> Content-Length: 200
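The outer Content-Length of 283 in that example is consistent with a message/byterange body made of the three inner header lines, each terminated by CRLF, an empty line, and the 200 bytes of content: 83 + 200 = 283. Below is a sketch that builds such a body. It assumes CRLF line endings, and `byterange_body` is a hypothetical helper, not something defined by the draft.

```python
def byterange_body(first: int, content: bytes, total: int, content_type: str) -> bytes:
    """Serialize one message/byterange segment: inner headers, blank line, content."""
    last = first + len(content) - 1  # inclusive end position
    head = (f"Content-Range: bytes {first}-{last}/{total}\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-Length: {len(content)}\r\n"
            "\r\n").encode("ascii")
    return head + content

segment = b"a" * 200  # placeholder content; the example elides the real bytes
body = byterange_body(100, segment, 600, "text/plain")
print(len(body))  # 283, matching the outer Content-Length above
```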
>
> Finally, Dropbox [5] does things a little differently and uses the
> Dropbox-API-Arg JSON header field to communicate a cursor containing an
> offset of the bytes uploaded so far (which I guess means that parallel
> transfers aren't supported).
>
> Example:
> curl -X POST https://content.dropboxapi.com/2/files/upload_session/append_v2 \
>     --header "Authorization: Bearer" \
>     --header "Dropbox-API-Arg: {\"cursor\": {\"session_id\": \"1234faaf0678bcde\",\"offset\": 0},\"close\": false}" \
>     --header "Content-Type: application/octet-stream" \
>     --data-binary @local_file.txt
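The cursor's offset is what ties each append to a position in the file, so the client advances it by exactly the number of bytes already sent. Here is a sketch that generates the Dropbox-API-Arg values for sequential appends. The JSON shape is copied from the example above (with `close` left false throughout), and the helper name and chunk size are hypothetical.

```python
import json
from typing import Iterator, Tuple

def append_args(session_id: str, total_size: int, chunk_size: int) -> Iterator[Tuple[int, int, str]]:
    """Yield (offset, length, Dropbox-API-Arg value) for each sequential append."""
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        arg = {"cursor": {"session_id": session_id, "offset": offset}, "close": False}
        yield offset, length, json.dumps(arg)  # ASCII-only JSON by default
        offset += length  # next cursor offset = bytes sent so far

for offset, length, header_value in append_args("1234faaf0678bcde", 10, 4):
    print(offset, length, header_value)
```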
>
> To conclude, I'm not exactly sure how these examples influence the
> discussion. It seems that there are concrete cases of "partial
> requests", but it's unclear to me whether these break HTTP semantic rules
> and/or whether they should be documented more formally. The examples I've
> seen are for
> APIs that also have their own custom means for integrity checks, or still
> use Content-MD5. It would be nice if something like Digest covered all
> avenues and we could get folks to switch to it, but I've not seen any
> signals that such APIs are interested in Digest. Therefore, I'm wary of
> Digest taking on too much work to describe something without any
> implementer interest. In the interest of progress, if partial requests for
> uploads are something people think need standardising, I think that could
> be done as an independent follow-on work item e.g. a document that updates
> Digest.
>
> Cheers
> Lucas
>
> [1] - https://docs.aws.amazon.com/amazonglacier/latest/dev/api-upload-part.html
> [2] - https://developers.google.com/drive/api/v3/manage-uploads#http---multiple-requests
> [3] - https://cloud.google.com/storage/docs/performing-resumable-uploads
> [4] - https://tools.ietf.org/html/draft-wright-http-partial-upload-01
> [5] - https://www.dropbox.com/developers/documentation/http/documentation#files-upload_session-append:2
>
>
Received on Tuesday, 29 December 2020 12:36:05 UTC