Re: MHTML/HTTP 1.1 Conflicts from Jacob Palme on 1998-01-27 (ietf-http-wg@w3.org from January to March 1998)

From: Jacob Palme <jpalme@dsv.su.se>
Date: Tue, 27 Jan 1998 19:59:15 +0100
To: Jim Gettys <jg@pa.dec.com>
Cc: Nick Shelness <shelness@lotus.com>, IETF working group on HTML in e-mail <mhtml@segate.sunet.se>, http-wg@cuckoo.hpl.hp.com, Nick_Shelness/SSW/Lotus@lotus.com
Message-Id: <v0311071bb0f3d5bfd2d7@[130.237.150.138]>

At 09.18 -0800 98-01-26, Jim Gettys wrote:
> The reality of the web is that there is tons of content without any line
> length enforced already existing... It just isn't possible to get the
> hundreds of millions of documents reformatted at this date.  Think of
> generic HTML on the Web as just another binary data type rather than as
> text, and you'll see how to deal with it.  To forward this HTML through
> email safely, you certainly have to encode it before transmission. If the
> bags of bits get messed with, the security checksums will clearly fail.

Certainly you cannot ask people to reformat existing documents.
But a company which buys a WYSIWYG HTML editor like FrontPage
or PageMill might require that it produce "canonical" HTML
which agrees with both e-mail and HTTP rules. This would allow
company employees to, for example, take pages from the local
Intranet and send them by mail outside of this Intranet,
without having digital seals and signatures broken.
>
> What this means, is that there are two possible cases:
> 	1) the HTML to be mailed could be reformatted to email rules, if no
> 	security measures have been applied...
> 	2) the HTML gets encoded and transmitted, and treated as a bag
> 	of bits, if it has any security wrappers.

What kind of encoding are you thinking of? E-mail only has
Content-Transfer-Encoding, and in e-mail this can only be applied to
individual body parts, not to a whole multipart as one single object.
Encoding a whole multipart as a single large object using
Base64 or Quoted-Printable is explicitly forbidden in e-mail.
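
To illustrate the per-part rule, here is a minimal sketch using the
Python standard library's email package. The sample HTML, the
attachment bytes and the variable names are assumptions made up for
the illustration; they do not come from any MHTML draft. The point is
only that each body part carries its own Content-Transfer-Encoding,
while the enclosing multipart carries none.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical HTML page with one very long line, as is common on the Web.
html = "<html><body>" + "long line " * 200 + "</body></html>"

# The multipart container itself is not transfer-encoded.
container = MIMEMultipart("related")

# Each body part is encoded on its own (Base64 here).
container.attach(MIMEText(html, "html", "utf-8"))
container.attach(MIMEApplication(b"\x00\x01\x02", "octet-stream"))

# Collect (content type, transfer encoding) for the container and its parts.
encodings = [
    (part.get_content_type(), part["Content-Transfer-Encoding"])
    for part in container.walk()
]

for content_type, encoding in encodings:
    print(content_type, encoding)
```

The first entry is the multipart itself and has no transfer encoding;
the following entries are the individual body parts.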

I am no expert on security issues and in particular how the security
checksums are computed. If security checksums are not to be broken,
then either the original document must be in canonical form, or
all content-transfer-encoding must be removed at receipt *before*
computing a security checksum. In the second case, the algorithm
for applying content-transfer-encoding (before sending by e-mail)
and removing it again (after receipt by e-mail) must be so exact
that the identical octet sequence is returned at receipt.
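
The second case can be sketched in Python as follows. This is only an
illustration under assumptions: SHA-256 stands in for whatever
checksum a security standard would actually specify, and the sample
document and function names are hypothetical. It shows that a Base64
round trip returns the identical octet sequence, whereas rewriting
line endings to e-mail's CRLF convention changes the checksum.

```python
import base64
import hashlib


def digest(octets: bytes) -> str:
    """Checksum of an octet sequence (SHA-256 is an assumed stand-in)."""
    return hashlib.sha256(octets).hexdigest()


def roundtrip_base64(octets: bytes) -> bytes:
    """Apply Base64 before sending and remove it again after receipt."""
    encoded = base64.encodebytes(octets)  # wrapped into short lines
    return base64.decodebytes(encoded)


def canonicalize_line_endings(octets: bytes) -> bytes:
    """Rewrite every line ending as CRLF, as e-mail text rules expect."""
    return octets.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


# Hypothetical HTML with bare LF line endings and one very long line.
original = b"<html>\n<body>" + b"x" * 2000 + b"</body>\n</html>\n"

received = roundtrip_base64(original)
rewritten = canonicalize_line_endings(original)

print("encoded and decoded:", digest(received) == digest(original))
print("line endings rewritten:", digest(rewritten) == digest(original))
```

Whether Quoted-Printable gives the same guarantee depends on how the
encoder treats line breaks and trailing white space, which is what
question (1) below asks.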

(1) Are the algorithms for encoding and decoding with
content-transfer-encoding defined exactly enough to ensure this?

(2) Do the security standards say that content-transfer-encoding
is to be removed before computing security checksums?

------------------------------------------------------------------------
Jacob Palme <jpalme@dsv.su.se> (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/~jpalme

Received on Wednesday, 28 January 1998 02:24:53 UTC