- From: Larry Masinter <masinter@parc.xerox.com>
- Date: Mon, 8 Apr 1996 12:53:39 PDT
- To: paulle@microsoft.com
- Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Paul, I'm sorry, but the wording is still not right, when you say:

> Lastly,
> the canonical form of text types in HTTP includes several
> line break conventions, so conversion of all line breaks
> to CR-LF is not required before computing or checking
> the digest: any acceptable convention should be left
> unaltered for inclusion in the digest.

The phrase "canonical form" is a well-known technical term. It is used in this context: When you have a large set of items A, and an equivalence relation E among those items, such that two items a and b are deemed to be equivalent if E(a,b), it is possible to define a 'canonical form' C of items in A such that if C(a) = c, then E(a,c). Given a canonical form C, E(x,y) iff C(x) = C(y). That is, the "canonical form" is a unique form of an object that can be used for equality testing when testing equivalence.

In the context of MIME types, we say that there are several forms of a text document, namely: one with CRs for linebreaks, one with CRLF for linebreaks, and one with LF for linebreaks, and we wish these to be deemed to be equivalent. For this reason, MIME designates the form with CRLF to be the canonical form, so that you can determine equivalence of two text streams by converting them to the canonical form.

At least in SMTP mail, text types are presumed to be transported in canonical form, and MD5 digests are computed on the canonical form. By computing MD5 digests of the canonical form, you are assured that equivalent text forms will have the same digest.

Now, we decided that we did not wish HTTP to require transformation of text types into canonical form before transmission, and this is fine. However, subsequently also allowing the message digest to be computed on a non-canonical form means that equivalent text streams will have different message digests. I can live with that decision too, if that's really what people want.
(Canonicalizing a text stream while computing the digest doesn't seem like it is computationally onerous, though.) It is, however, totally unacceptable to make some statement that "the canonical form of text types in HTTP includes several line break conventions," because it either represents a misuse of the phrase "canonical form", or else asserts that two text streams that differ only by their line break convention should not be treated equivalently.
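To make the argument concrete, here is a minimal Python sketch (not part of the original discussion) of computing an MD5 digest over the CRLF canonical form while the text streams through. It assumes that every bare CR, bare LF, and CRLF in the body is a line break, and that the charset encodes CR and LF as single bytes (as ASCII and ISO-8859-1 do). The names `CanonicalTextDigest` and `canonical_md5` are hypothetical; only `hashlib.md5` is a real library call.

```python
import hashlib


class CanonicalTextDigest:
    """Hypothetical helper: MD5 over the CRLF canonical form of text.

    Bare CR, bare LF, and CRLF are all mapped to CRLF. A CR at the end
    of a chunk is held back until the next byte arrives, because it may
    be the first half of a CRLF pair split across two chunks.
    """

    def __init__(self) -> None:
        self._md5 = hashlib.md5()
        self._pending_cr = False

    def update(self, chunk: bytes) -> None:
        out = bytearray()
        for b in chunk:  # iterating bytes yields ints
            if self._pending_cr:
                self._pending_cr = False
                out += b"\r\n"
                if b == 0x0A:  # this LF completes the CRLF just emitted
                    continue
            if b == 0x0D:  # CR: wait to see whether an LF follows
                self._pending_cr = True
            elif b == 0x0A:  # bare LF
                out += b"\r\n"
            else:
                out.append(b)
        self._md5.update(bytes(out))

    def hexdigest(self) -> str:
        # Flush a trailing CR on a copy so the object stays usable.
        md5 = self._md5.copy()
        if self._pending_cr:
            md5.update(b"\r\n")
        return md5.hexdigest()


def canonical_md5(data: bytes, chunk_size: int = 3) -> str:
    # Small chunks deliberately split CRLF pairs across update() calls.
    digest = CanonicalTextDigest()
    for i in range(0, len(data), chunk_size):
        digest.update(data[i:i + chunk_size])
    return digest.hexdigest()


forms = [
    b"line one\r\nline two\r\n",  # CRLF (the MIME canonical form)
    b"line one\nline two\n",      # LF
    b"line one\rline two\r",      # CR
]
raw = {hashlib.md5(f).hexdigest() for f in forms}
canon = {canonical_md5(f) for f in forms}
print(len(raw), len(canon))  # prints "3 1"
```

The three equivalent forms yield three different raw digests but a single canonical digest, which is the property E(x,y) iff C(x) = C(y) applied to MD5. The only state carried between chunks is one pending CR, so canonicalizing during digesting costs one extra pass with a single byte of lookahead.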
Received on Monday, 8 April 1996 13:01:49 UTC