Re: MHTML/HTTP 1.1 Conflicts

At 07.40 -0800 98-01-28, Larry Masinter wrote:
> I know of no Unix text editor that produces text files that
> use the "canonical" format for text/plain using CRLF instead
> of just plain LF, and I've never seen a Macintosh text editor
> that produces text files using the "canonical" format with
> CRLF instead of just plain CR. An editor produces files that
> are appropriate for the native environment, and people with
> Internet mail who mail documents using SMTP expect the mailer
> to convert the file from the local convention to the canonical
> one on transmission, and to do the inverse conversion when
> saving the document.

I believe BBEDIT, a popular Macintosh text editor with special
features for (not WYSIWYG) HTML, has some support for this.
Many HTML editors have built-in features for updating the
master document on a HTTP server. I believe they usually
follow the FTP TEXT convention of converting line breaks to
the convention used on the server platform.

I am no expert on security features. Is it true that these
always convert line breaks to CRLF format before computing
the checksums? If that is the case, the problem becomes
less important. Is it also true that security features
decode Content-Transfer-Encoding before computing their
checksums. That would further reduce the problem of HTML
documents in formats unsuitable for mail transmission without
Content-Transfer-Encoding.

> Security checksums _should_ be applied to the canonical form,
> but in truth, it is feasible to apply it to the transmitted form,
> and it is the choice of the originator whether to do this. Since
> an HTTP server should never actually modify the end-of-line convention,
> the data as represented by the choice of the original sender can
> remain intact. That is, this is not really an issue.

Choice of the originator or choice of the security algorithm?
Do you mean that the security algorithms allows a sealed
document to contain information on which conversions and
uncodings should be done on a document before computing
checksums? Or do you mean that different security schemes
handle this in different ways?

> We've been through this topic endlessly in the past, and wrangled
> out some specific wording in the HTTP draft that was believed generally
> to be adequate to explain the situation, at least as far as
> the 'end of line convention' discussion in the HTTP/1.1 specification
> was concerned.

I am sorry if I am taking time to discuss an issue which you have
already resolved. I agree that HTTP servers should not change the
documents before sending them. This means that the documents stored
in the HTTP server should be in a form which allows gatewaying
to e-mail without loss of security checksums.

> If there was some demand, a separate document on "HTTP/mail gateway
> implementation considerations" could elaborate on the topic, but the
> demand for that seems to be low. (Perhaps IMAP/WebDAV interoperability
> might be an interesting application (isn't a mail folder a 'collection'?)).

In the MHTML group, we have carefully tried to make a standard which
would break security checksums as little as possible. We believed
this was a user requirement; when you say "demand for that seems to
be low", do you mean that this is not, and will not, be an important
user requirement?

We already have som text about this in the MHTML document. Below
is a quote of what we have written:

   -  The MIME standard [MIME2] requires that e-mailed documents of
      "Content-Type: Text/ MUST be in canonical form before a
      Content-Transfer-Encoding is applied, i.e. that line breaks are
      encoded as CRLFs, not as bare CRs or bare LFs or something else.
      This is in contrast to [HTTP] where section 3.6.1 allows other
      representations of line breaks.

   Note that this might cause problems with integrity checks based on
   checksums, which might not be preserved when moving a document from
   the HTTP to the MIME environment. If a document has to be converted
   in such a way that a checksum based message integrity check becomes
   invalid, then this integrity check header SHOULD be removed from the
   document.

   Other sources of problems are Content-Encoding used in HTTP but not
   allowed in MIME, and character sets that are not able to represent
   line breaks as CRLF. A good overview of the differences between
   HTTP and MIME with regards to Content-Type: "text" can be found
   in [HTTP], appendix C.

   If the original document has line breaks in the canonical form
   (CRLF), then the document SHOULD remain unconverted so that
   integrity check sums are not invalidated.

   A provider of HTML documents SHOULD always provide original documents
   in the canonical form with CRLF for line breaks, so that they will be
   transferable via both HTTP and SMTP without invalidating checksum
   integrity checks.

So rather than writing a new document on this, MHTML could be used
or extended to cover this issue. The HTTP spec could just mention
that MHTML contains some discussion on how to create documents which
can be sent through both e-mail and HTTP without breaking security
checksums.

MIME recommends using Content-Transfer-Encoding to break lines longer
than 76 characters when transporting documents through e-mail.
This might also break security checksums, unless the security
checksum algorithms are defined in ways which will not be broken
by Content-Transfer-Encoding.

Is it permitted to use e-mail-type Content-Transfer-Encoding
on documents sent through HTTP? From what you say, I believe it
must be permitted if the Content-Transfer-Encoding header is inside
the body, since HTTP only handles the outermost (HTTP) header.
But is a Content-Transfer-Encoding header field allowed in the
outermost (HTTP) heading?




------------------------------------------------------------------------
Jacob Palme <jpalme@dsv.su.se> (Stockholm University and KTH)
for more info see URL: http://www.dsv.su.se/~jpalme

Received on Thursday, 29 January 1998 04:45:12 UTC