Re: HTTP Instance Digests and encoding

On 16/05/18 22:24, Lucas Pardue wrote:
> Hi,
>
> We’ve been looking at HTTP Instance Digest as defined in RFC 3230. The
> document seems to skim over a detail that is causing us some internal
> debate when considering compression like gzip. The question boils down
> to whether a digest should be calculated on the pre-compressed object or
> post-compressed one.
> 

Do not confuse Transfer-Encoding and Content-Encoding.

* T-E is only part of the message on-wire framing/encoding.

* C-E is a variant of the origin's resource.


> 
> Section 4.2 states:
>
> The digest is computed on the entire instance associated with the
> message. The instance is a snapshot of the resource prior to the
> application of any instance manipulation or transfer-coding (see section
> 3). The byte order used to compute the digest is the transmission byte
> order defined for the content-type of the instance.
>
> Note: the digest is computed before the application of any instance
> manipulation. If a range or a delta-coding [9] is used, the computation
> of the digest after the computation of the range or delta would not
> provide a digest useful for checking the integrity of the reassembled
> instance.
>
> Section 3 defines the relevant items:
> 

... in terms *additional* to the RFC 2616 (now 723x) definitions for
HTTP terminology.

>
>    instance          The entity that would be returned in a status-200
>                      response to a GET request, at the current time, for
>                      the selected variant of the specified resource,
>                      with the application of zero or more content-
>                      codings, but without the application of any
>                      instance manipulations or transfer-codings.
>

i.e. C-E:gzip is considered part of the "instance"; T-E:gzip is not. This
is consistent with the HTTP definitions of the two encoding types.

>
>    instance manipulation
>                      An operation on one or more instances which may
>                      result in an instance being conveyed from server to
>                      client in parts, or in more than one response
>                      message.  For example, a range selection or a delta
>                      encoding.  Instance manipulations are end-to-end,
>                      and often involve the use of a cache at the client.
>
> In our usage, resources are compressed with gzip and have an
> accompanying Content-Encoding: gzip response header. Treating this as an
> instance manipulation is beneficial, because it allows a server to
> precompute the digest independent of the format of the compression used
> to deliver the resource.


Each of your objects has two digests to compute, just as each resource
has two deliverable variants: C-E:gzip and C-E:identity (aka no C-E).
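
As a rough illustration of the two-digest point, here is a minimal Python
sketch. It assumes SHA-256 with a base64 value (the algorithm token
registered by RFC 5843, not one of the originals from RFC 3230); the
helper name digest_header and the sample body are hypothetical.

```python
import base64
import gzip
import hashlib


def digest_header(instance_bytes: bytes) -> str:
    """Build a Digest header value over the given instance bytes.

    Assumption: SHA-256 with base64 output; swap in whichever
    algorithm your deployment actually negotiates.
    """
    raw = hashlib.sha256(instance_bytes).digest()
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


# Hypothetical resource body, for illustration only.
body = b"Hello, world!\n" * 100

# Variant 1: no Content-Encoding (identity). Digest the plain bytes.
identity_digest = digest_header(body)

# Variant 2: Content-Encoding: gzip. The content-coding is part of the
# instance, so the digest covers the compressed bytes.
# mtime=0 (Python 3.8+) keeps the gzip output reproducible.
gzipped = gzip.compress(body, mtime=0)
gzip_digest = digest_header(gzipped)

print("identity:", identity_digest)
print("gzip:    ", gzip_digest)
```

A client that receives the C-E:gzip variant would check the digest
against the bytes as received, before decoding them.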

If you want to use gzip as an instance manipulation, that makes it
Transfer-Encoding, *not* Content-Encoding.


Amos

Received on Wednesday, 16 May 2018 13:01:04 UTC