Re: MHTML/HTTP 1.1 Conflicts from Jim Gettys on 1998-01-23 (ietf-http-wg@w3.org from January to March 1998)

From: Jim Gettys <jg@pa.dec.com>
Date: Fri, 23 Jan 1998 14:49:05 -0800
To: Jacob Palme <jpalme@dsv.su.se>
Cc: Nick Shelness <shelness@lotus.com>, IETF working group on HTML in e-mail <mhtml@segate.sunet.se>, http-wg@cuckoo.hpl.hp.com, Nick_Shelness/SSW/Lotus@lotus.com
Message-Id: <9801232249.AA18709@pachyderm.pa.dec.com>
>  From: Jacob Palme <jpalme@dsv.su.se>
>  Date: Fri, 23 Jan 1998 21:12:53 +0100
>  To: jg@pa.dec.com (Jim Gettys), Nick Shelness <shelness@lotus.com>
>  Cc: IETF working group on HTML in e-mail <mhtml@SEGATE.SUNET.SE>,
>          http-wg@cuckoo.hpl.hp.com, Nick_Shelness/SSW/Lotus@lotus.com
>  Subject: Re: MHTML/HTTP 1.1 Conflicts
>  
>  At 08.37 -0800 98-01-23, Jim Gettys wrote:
>  > Content-Location in HTTP is used when resources have multiple
>  > representations (i.e. you can get the same document back in multiple
>  > languages or datatypes, depending on Accept headers, for example); it
>  > isn't clear that the definitions for Content-Location should match in
>  > the two uses.
>  > (e.g. you do a GET on an object, and the Content-Location gives you the
>  > hint about where to find the version that you actually got, possibly for
>  > editing purposes). We can either accept the differing definitions, or you
>  > can change the name in MHTML to confuse the innocent....
>  
>  Do I understand you rightly, that Content-Location for this purpose can
>  only occur on the outermost header of a HTTP message (or possibly
>  it could occur on a subobject of Content-Type: message/http?).
>  

Yes.

HTTP doesn't know beans from subobjects.  HTTP transports a rendered
representation of a single resource.  What is inside the entity (payload) isn't
of interest to HTTP.

HTTP's use of Multipart is very limited, and doesn't violate this principle.
(e.g. the use of multipart in range requests refers to ranges of the
same resource as the one requested, not to multiple resources).

>  The MHTML usage of Content-Location only occurs inside Content-Type:
>  Multipart/related. Thus, there is in reality no conflict. We could
>  even say that Content-Location can have multiple values inside
>  Multipart/related but that only single values are allowed in HTTP headers!

I believe that is true.
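
As a rough illustration of that split, here is a minimal Python sketch using
the standard library's email parser. The message, boundary, URLs and image
payload are all made-up placeholders, and part_locations is a hypothetical
helper name. It only shows that Content-Location can sit on each body part
of a Multipart/related entity while the outermost header carries none.

```python
from email import message_from_string

# Hypothetical MHTML-style entity; boundary, URLs and payload are placeholders.
RAW = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/related; boundary="b1"; type="text/html"\r\n'
    "\r\n"
    "--b1\r\n"
    "Content-Type: text/html\r\n"
    "Content-Location: http://example.com/page.html\r\n"
    "\r\n"
    '<img src="logo.gif">\r\n'
    "--b1\r\n"
    "Content-Type: image/gif\r\n"
    "Content-Transfer-Encoding: base64\r\n"
    "Content-Location: http://example.com/logo.gif\r\n"
    "\r\n"
    "R0lGODlhAQABAAAAACw=\r\n"
    "--b1--\r\n"
)


def part_locations(raw):
    """Return the Content-Location of every non-multipart body part."""
    msg = message_from_string(raw)
    return [
        part.get("Content-Location")
        for part in msg.walk()
        if not part.is_multipart()
    ]


# The outermost header has no Content-Location; only the parts do.
outer_location = message_from_string(RAW).get("Content-Location")
locations = part_locations(RAW)
```

Here outer_location is None, and locations holds one URL per body part.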

Roy will be working on drafting changes to the HTTP spec

From Albert-Lunde@nwu.edu Sat Jan 24 21:47:26 1998
Received: from hplb.hpl.hp.com by paris.ics.uci.edu id aa03048;
          24 Jan 98 21:47 PST
Received: from otter.hpl.hp.com (otter.hpl.hp.com [15.144.59.2])
	by hplb.hpl.hp.com (8.8.6/8.8.6 HPLabs Relay) with ESMTP id FAA25192;
	Sun, 25 Jan 1998 05:46:37 GMT
Received: from cuckoo.hpl.hp.com by otter.hpl.hp.com with ESMTP
	(1.37.109.16/15.6+ISC) id AA129207192; Sun, 25 Jan 1998 05:46:32 GMT
Received: (from procmail@localhost) by cuckoo.hpl.hp.com (8.7.6/8.7.1) id FAA00922; Sun, 25 Jan 1998 05:46:05 GMT
Resent-Date: Sun, 25 Jan 1998 05:46:05 GMT
X-Sender: lunde@nuinfo.acns.nwu.edu (Unverified)
Message-Id: <v03110700b0f079aee70b@[129.105.110.129]>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Sat, 24 Jan 1998 23:45:02 -0600
To: IETF working group on HTML in e-mail <mhtml@segate.sunet.se>, 
    http-wg@cuckoo.hpl.hp.com
From: Albert Lunde <Albert-Lunde@nwu.edu>
Subject: Re: MHTML/HTTP 1.1 Conflicts 
Resent-Message-Id: <"CdVIX.0.JC.x5joq"@cuckoo>
Resent-From: http-wg@cuckoo.hpl.hp.com
X-Mailing-List: <http-wg@cuckoo.hpl.hp.com> archive/latest/5274
X-Loop: http-wg@cuckoo.hpl.hp.com
Precedence: list
Resent-Sender: http-wg-request@cuckoo.hpl.hp.com

I'm not a great protocol maven, but I'm going to put in my two cents worth...

It seems like the issues you are raising are central to why HTTP is
referred to as "MIME-like" and contrasted with strict MIME in the specs.

>I am reading the HTTP spec now, to check for possible problems with
>MHTML. Since I have not read all of it yet, can you say if there is
>anything at all in the HTTP spec which says anything about the format
>of bodies, i.e. of what comes after the blank line which ends the
>HTTP heading. If the HTTP spec just regards this as an arbitrary string
>of octets, formatted according to its MIME content type, then there
>will probably not be any risk of conflict between MHTML and HTTP.
>
>In particular, do the specifications about header line length, header folding,
>end-of-line characters, etc., in the HTTP spec clearly say that these
>specs only apply to lines in the HTTP header? If it does not say so,
>but this is the intention, you need only say this more clearly, and
>all conflicts with MHTML will disappear.

In _most_ respects, I think HTTP regards the body as a stream of bytes...
but a big exception and an important difference from MIME is the treatment
of end-of-line for text/* types.

See RFC 2068 sections 3.7.1 and 19.4.1 (which I see you've read...)

In section 3.7.1 it says "This flexibility regarding line breaks applies
only to text media in the entity-body; a bare CR or LF MUST NOT be
substituted for CRLF within any of the HTTP control structures (such as
header fields and multipart boundaries)."

So the HTTP spec says that one of its wacky non-MIME rules applies only to
the entity body.
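
To make the text-media rule concrete, here is a small Python sketch of how a
recipient might split a text/* entity body into lines while accepting CRLF,
bare CR and bare LF alike. The name split_text_lines is hypothetical, and the
sketch assumes a charset in which octets 13 and 10 really mean CR and LF. It
is not meant for header fields or multipart boundaries, where only CRLF is
allowed.

```python
import re

# Try CRLF first so that it counts as one line break, not two.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def split_text_lines(body):
    """Split a text/* entity body on CRLF, bare CR, or bare LF."""
    return _LINE_BREAK.split(body)


lines = split_text_lines(b"first\r\nsecond\rthird\nfourth")
```

With the mixed line breaks above, lines comes out as the four words in order.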

>It is e-mail, rather than MHTML, which has limitations. You could
>write something like this, perhaps as added text in chapter 3.7.1
>of the HTTP spec?
>
>   The same content may sometimes be sent through e-mail, sometimes
>   through http. E-mail has different rules than http regarding
>   line length (preferred less than 76 characters in headings,
>   long lines are more often folded, in particular long URLs
>   are sometimes folded by inserting LWS which must be removed
>   before using the URL, line breaks must be CRLF, not bare CR
>   or bare LF). If an object is retrieved through http and then
>   forwarded through e-mail, this may require conversion. Such
>   conversion may invalidate checksums used for digital seals,
>   digital signatures, etc. This can be avoided if the resource
>   is formatted, also in its http version, according to e-mail
>   rules.
>
>> We can't at this date even contemplate splitting long URL's; it would break
>> huge numbers of implementations.  You need to get in your head that HTTP
>> is a binary, 8 bit clean transport (streaming RPC system) of arbitrary
>> datatypes; it uses MIME like message syntax, but isn't really MIME.
>
>Certainly not in HTTP headings. But what about headings inside multipart
>bodies, transported through HTTP?
>
>> The long line problem really doesn't apply to HTTP at all.
[..]
>Is there no user requirement among http users to be able to retrieve
>resources through http and forward them through e-mail? If there is such
>a user requirement, and if there is another user requirement that
>security checksums should work across such forwarding, then you do
>have a problem with long lines, even if I can understand that you would
>much prefer that there was no such problem.

I think HTTP makes a distinction between its requirements and those of a
pure MIME environment. Thus these quotes from 19.4.1:

>Where it is possible, a proxy or gateway from HTTP to a strict MIME
>environment SHOULD translate all line breaks within the text media
>types described in section 3.7.1 of this document to the MIME
>canonical form of CRLF. Note, however, that this may be complicated
>by the presence of a Content-Encoding and by the fact that HTTP
>allows the use of some character sets which do not use octets 13 and
>10 to represent CR and LF, as is the case for some multi-byte
>character sets.

and from 19.4.4:

>Proxies and gateways from HTTP to MIME-compliant protocols are
>responsible for ensuring that the message is in the correct format
>and encoding for safe transport on that protocol, where "safe
>transport" is defined by the limitations of the protocol being used.
>Such a proxy or gateway SHOULD label the data with an appropriate
>Content-Transfer-Encoding if doing so will improve the likelihood of
>safe transport over the destination protocol.

My reading of this is that HTTP only imposes its own requirements on the
HTTP headers and body, and those are the requirements of an almost-binary
transport (almost because of the CR/LF/CRLF rules), with no line length limits.

The paragraph from 19.4.4 in particular puts the responsibility on HTTP-> mail
(and mail-> HTTP) gateways for unscrewing the real incompatibilities with
MIME.
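
For what it is worth, here is a minimal Python sketch of the line-break
translation that 19.4.1 asks of such a gateway, and of why it worries people
who care about checksums. The function name to_crlf and the sample body are
made up. SHA-256 is only a stand-in for whatever digest a signature scheme
would use. The sketch ignores Content-Encoding and multi-byte character sets,
which the spec itself flags as complications.

```python
import hashlib
import re


def to_crlf(body):
    """Translate CRLF, bare CR and bare LF in a text body to canonical CRLF."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", body)


# A text body as an HTTP server might legitimately send it (bare LF).
original = b"line one\nline two\n"
converted = to_crlf(original)

# The octets changed, so any digest computed over the body changes too.
digest_before = hashlib.sha256(original).hexdigest()
digest_after = hashlib.sha256(converted).hexdigest()
```

The two digests differ even though no visible character changed, which is
exactly the signed-content problem raised above. Running to_crlf a second time
leaves the body as it is.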

I'm not sure what the best fix is for some of the issues you raise, but I
don't think you will be able to completely align HTTP and pure MIME
requirements on message bodies. HTTP is not going to start line wrapping
everything on the off-chance responses (even signed ones) will get
gatewayed to mail somewhere.

Some of the HTTP-> mail gateway problems might be solved by applying a
base64 encoding of the whole thing... but this may not solve everything;
I'm not sure.
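
A rough Python sketch of that idea, under the assumption that the gateway
treats the whole entity body as opaque octets and labels the result with
Content-Transfer-Encoding: base64. The names encode_for_mail and
decode_from_mail are hypothetical. It shows only that the original octets,
and therefore any digest over them, survive the round trip. It says nothing
about headers or about what a signature actually covers.

```python
import base64
import hashlib


def encode_for_mail(body):
    """Base64-encode a body into CRLF-terminated lines of at most 76 characters."""
    # base64.encodebytes wraps its output every 76 characters with "\n".
    return base64.encodebytes(body).replace(b"\n", b"\r\n")


def decode_from_mail(encoded):
    """Recover the original octets; line breaks in the encoding are ignored."""
    return base64.decodebytes(encoded)


# A body with a very long line and bare LF line breaks, as HTTP allows.
body = b"A very long line " * 40 + b"\nbare LF ending\n"
encoded = encode_for_mail(body)
restored = decode_from_mail(encoded)
same_digest = hashlib.sha256(body).digest() == hashlib.sha256(restored).digest()
```

The cost is the usual one for base64: the encoded form is about a third larger
and is no longer readable as text in transit.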

Maybe it is desirable to be more explicit about what such gateways could do.


---
    Albert Lunde                      Albert-Lunde@nwu.edu
Received on Friday, 23 January 1998 14:50:55 UTC