
Re: INTEGOK: updated wording

From: Ned Freed <NED@innosoft.com>
Date: Tue, 02 Apr 1996 18:10:32 -0800 (PST)
To: Paul Leach <paulle@microsoft.com>
Cc: "'http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com'" <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>
Message-Id: <01I32QYCEMRAA8CV1S@INNOSOFT.COM>

> Several comments have come in that are incorporated into the revised
> wording attached. A synopsis of the reasons for the changes:

> 1. Content-MD5 is not proof against malicious attacks

I hope we all know this, but it never hurts to point it out.

> 2. In both MIME and HTTP, the digest is computed on what would be sent
> (modulo Transfer-Encoding or Content-Transfer-Encoding) AND (in
> practice) the canonical form (modulo Content-Encoding); for text, the
> canonical form differs slightly between the two. See the HTTP 1.0 spec,
> appendix C, for the relationship of MIME media types and HTTP media
> types.

Well put. However, I have a problem with referring to this as a difference
between "MIME and HTTP". This is confusing because the MIME specification
doesn't define Content-MD5, either in general or in some MIME-specific
profile. RFC1864 does. It is the only other specification of Content-MD5, and
it defines the field specifically for email transports, which doesn't
necessarily cover everything that uses MIME in some way.

Someone looking for the "MIME specific definition of Content-MD5" in MIME
proper isn't going to find it, and even if we do eventually fold the definition
of Content-MD5 into MIME proper it will be done in such a way that the
transports it is defined for are clearly specified.

As such, I think it would be better to refer to this as a difference between
"email usage as specified by RFC1864" and HTTP usage.

> 4. Binary media types can specify their own transmission byte orders, so
> network byte order can't be mandated in the digest.

Good. I just sent a message to the effect that this wording needed to be
changed -- please ignore it.

> The Content-MD5 entity-header field is an MD5 digest of the entity-body,
> as defined in RFC 1864 [xx], for the purpose of providing an end-to-end
> message integrity check (MIC) of the entity-body. (Note: an MIC is good
> for detecting accidental modification of the entity-body in transit, but
> is not proof against malicious attacks.)

> 	ContentMD5	= "Content-MD5" ":" md5-digest
> 	md5-digest	= <base64 of 128 bit MD5 digest as per RFC 1864>

> The Content-MD5 header may be generated by an origin server to function
> as an integrity check of the entity-body. Only origin-servers may
> generate the Content-MD5 header field; proxies and gateways MUST NOT
> generate it, as this would defeat its value as an end-to-end integrity
> check. Any recipient of the entity-body, including gateways and proxies,
> MAY check that the digest value in this header field matches that of the
> entity-body as received.

> The MD5 digest is computed based on the content of the entity body,
> including any Content-Encoding that has been applied, but not including
> any Transfer-Encoding.  If the entity is received with a
> Transfer-Encoding, that encoding must be removed prior to checking the
> Content-MD5 value against the received entity.

> This has the result that the digest is computed on the octets of the
> entity body exactly as, and in the order that, they would be sent if no
> transfer coding were being applied.

>    Note: there are several ways in which the application of
>    Content-MD5 to HTTP entity-bodies differs from its
>    application to MIME entity-bodies. One is that HTTP,
>    unlike MIME, does not use Content-Transfer-Encoding,
>    and does use Transfer-Encoding and Content-Encoding.
>    Another is that, unlike MIME, the digest is computed
>    over the entire entity-body, even if it happens to be
>    a MIME "multipart" content-type. (Note that the multipart
>    bodies may themselves have Content-MD5 headers.) Another
>    is that HTTP more frequently uses binary content types
>    than MIME, so it is worth noting that in such cases,
>    the byte order used to compute the digest is the
>    transmission byte order defined for the type. Lastly,
>    the canonical form of text types in HTTP includes several
>    line break conventions, so conversion of all line breaks
>    to CR-LF is not required before computing or checking
>    the digest: any acceptable convention should be left
>    unaltered for inclusion in the digest.

This is much better but still not quite right. For one thing, RFC1864 doesn't
define Content-MD5 over a multipart differently; it doesn't allow Content-MD5
over composite objects at all. There is no definition of how to do Content-MD5
over either a multipart or a message/rfc822 for this prose to differ from.

A more important point is that the calculation of Content-MD5 over multipart
(or message/rfc822) is inadequately specified in this document. It seems clear
that any transfer encoding is removed before the checksum is computed. But a
multipart (or message/rfc822) can contain many different transfer encodings.
Are they all removed? (I assume the answer is "yes".) If they are removed, what
happens to the header fields specifying the transfer encoding? Are they
included in the MIC calculation, or are they ignored since the encodings they
describe have effectively been removed? (I have no idea which way this should
be done.)

Defining Content-MD5 in the case of a composite MIME object is a real can of
worms. There were good reasons why RFC1864 didn't get into it. At a minimum you
have to deal with all sorts of composite objects, not just multipart, and you
also need to specify how transfer encoding fields are handled under the
checksum.
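
To make the ambiguity concrete, here is a rough Python sketch. It is not taken
from RFC1864 or from the proposed wording; the boundary string, the payload and
the helper names (md5_b64, build_multipart) are all made up for illustration.
It builds one multipart body three ways: as sent with a base64 part, with the
part decoded but its Content-Transfer-Encoding field kept, and with the part
decoded and the field dropped. Each reading gives a different digest, so the
specification has to pick one.

```python
import base64
import hashlib


def md5_b64(data: bytes) -> str:
    """Base64 of the 128-bit MD5 digest, the form Content-MD5 carries."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


BOUNDARY = b"--frontier"                 # hypothetical delimiter line
payload = b"\x00\x01binary payload\xff"  # hypothetical part content


def build_multipart(part_body: bytes, *part_headers: bytes) -> bytes:
    """Assemble a single-part multipart body (illustrative, not a full MIME writer)."""
    head = b"\r\n".join(part_headers)
    return (BOUNDARY + b"\r\n" + head + b"\r\n\r\n"
            + part_body + b"\r\n" + BOUNDARY + b"--\r\n")


CTYPE = b"Content-Type: application/octet-stream"
CTE = b"Content-Transfer-Encoding: base64"

candidates = {
    # The body exactly as an email transport would carry it.
    "as sent": build_multipart(base64.b64encode(payload), CTYPE, CTE),
    # Inner encoding removed, header field left in place.
    "decoded, field kept": build_multipart(payload, CTYPE, CTE),
    # Inner encoding removed, header field removed with it.
    "decoded, field dropped": build_multipart(payload, CTYPE),
}

digests = {name: md5_b64(body) for name, body in candidates.items()}
for name, digest in digests.items():
    print(f"{name:24} {digest}")
```

A sender and a recipient that choose different rows of this table will never
agree on the digest, even though nothing was damaged in transit.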

Finally, as the computation of Content-MD5 is done over the content with
transfer encodings removed, the note about binary being more or less prevalent
on different transports is pointless. Once transfer encodings are removed
everything that isn't textual in form is binary by definition. This note makes
it sound like the calculation is different depending on whether or not a
transfer encoding was used. We have substantive proof that implementors get
confused by such things (e.g. the implementors of Microsoft Exchange believing
that transfer encodings are some sort of presentation device for MIME agents)
and as such I strongly recommend that this be reworded.
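
As an illustration of the point, the following Python sketch computes the
digest over the entity-body only after any transfer encoding has been removed.
Everything in it is an assumption for the sake of the example: a simple chunked
framing stands in for the transfer encoding, and content_md5 and decode_chunked
are made-up helper names, not anything defined by the draft or by RFC1864.

```python
import base64
import hashlib


def content_md5(entity_body: bytes) -> str:
    """Base64 of the 128-bit MD5 digest of the entity-body."""
    return base64.b64encode(hashlib.md5(entity_body).digest()).decode("ascii")


def decode_chunked(wire: bytes) -> bytes:
    """Remove a simple chunked framing (no chunk extensions, no trailers)."""
    body = b""
    pos = 0
    while True:
        eol = wire.index(b"\r\n", pos)
        size = int(wire[pos:eol], 16)     # chunk size is a hex number
        if size == 0:
            return body                   # zero-length chunk ends the body
        start = eol + 2
        body += wire[start:start + size]
        pos = start + size + 2            # skip the CRLF after the chunk data


entity = b"hello world"
header_value = content_md5(entity)        # what the origin server would send

# The same entity as it might appear on the wire with the framing applied.
wire = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"

# The recipient removes the framing first, then checks the digest.
matches = content_md5(decode_chunked(wire)) == header_value
print(matches)
```

The check never needs to know whether the content is "binary" or which
transport carried it. It only sees the octets that remain once the transfer
encoding is gone.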

				Ned
Received on Tuesday, 2 April 1996 18:42:38 EST
