W3C home > Mailing lists > Public > ietf-http-wg-old@w3.org > September to December 1994

HTTP: T-T-T-Talking about MIME Generation

From: Marc Salomon <marc@library.ucsf.edu>
Date: Mon, 19 Dec 1994 10:12:52 -0800
Message-Id: <199412191812.AA23371@library.ucsf.edu>
To: fielding@ics.uci.edu
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com

I take a weekend off to enjoy the nice weather and I miss an e-mail storm.

|From: "Roy T. Fielding" <fielding@avron.ICS.UCI.EDU>

|Marc Salomon writes:
|
|> 3. Operating under the assumption that at this time in the life cycle of the
|> web, most HTML documents contain components that reside on the same server as
|> the document itself, why not trade the multiple network connections from the
|> client to the same server for some up-front packaging by the server.
|
|Because it defeats caching.

I have sketched a way that caching can be simply implemented and optimized using
multipart.  If a client has previously rendered a document, it can use this
scheme to issue a request (MGET, SESSION, whatever it is called at this level--
the naming and the proxy-related concerns are irrelevant to me; I trust people
with greater expertise on this than I have will Do The Right Thing, and
multipart should sit on top of that) to optimize subsequent reloads of that
document or of other documents that share its inclusions.
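As a sketch of the kind of exchange I have in mind (the MGET line and its
syntax are entirely hypothetical--nothing here is sanctioned by any draft, and
the paths and boundary are invented), it might look like:

```
MGET /paper.html /figs/fig1.gif /figs/fig2.gif HTTP/1.0
Accept: multipart/mixed

HTTP/1.0 200 OK
Content-Type: multipart/mixed; boundary="gc0p4Jq0M2Yt08jU534c0p"

--gc0p4Jq0M2Yt08jU534c0p
Content-Type: text/html

...the document...
--gc0p4Jq0M2Yt08jU534c0p
Content-Type: image/gif

...an inline image...
--gc0p4Jq0M2Yt08jU534c0p--
```

A client that already holds some of the inclusions in its cache would simply
leave them out of the request line.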

Indeed, the restrictions on the use of Message-ID: in HTTP/1.0 defeat caching
WRT MIME.

|Besides, some people may want to use MIME multipart types on HTTP for other
|reasons -- why should the server treat them any differently than other content
|types?  [just server here -- I know why clients would want to do so]

I would appreciate any specific examples of how what I propose poisons the
well for anyone else as well as specific suggestions as to how this can be 
done correctly.

Perhaps some of the work being done at EIT on S-HTTP, which used
application/http, should be considered as well.

|> 6. Interoperating efficiently with client cache management brings up some
|> interesting issues.  The ability to check the HTML document's requirements
|> against the client-side cache before issuing a precisely tailored HTTP MGET
|> request (which would be returned as multipart/mixed*).
|
|It's much easier to just keep the connection open (with possibly a very
|short timeout as defined by the server) and respond with multiple responses
|(each one of which may be a multipart entity).

But TCP is much less efficient if you are sending a bunch of small objects--you
never get out of first gear as it were.  Wasn't it argued that TCP doesn't get 
efficient till it gets up to speed on a data transfer?  Would having to satisfy 
a set of time-spaced serial requests allow the connection to ever get up to 
speed?
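As a rough back-of-the-envelope sketch (assuming the classic doubling of the
congestion window each round trip, and ignoring loss, delayed ACKs, and
receiver-window limits--the object counts and segment size are invented for
illustration), the cost of restarting slow start per object versus amortizing
one ramp-up over a single connection can be estimated:

```python
# Crude model of TCP slow start: the congestion window (in segments)
# doubles every round trip until the transfer completes.  This ignores
# loss, delayed ACKs, and receiver windows -- it is only meant to show
# how many round trips small transfers spend "in first gear".

def rtts_to_send(total_segments):
    """Round trips needed to deliver total_segments, starting from cwnd = 1."""
    sent, cwnd, rtts = 0, 1, 0
    while sent < total_segments:
        sent += cwnd     # deliver one congestion window's worth
        cwnd *= 2        # slow start: window doubles each round trip
        rtts += 1
    return rtts

# Ten 20K inline images at a 512-byte segment size is 40 segments each.
per_object = rtts_to_send(40)        # each new connection restarts slow start
separate = 10 * per_object           # ten separate connections
together = rtts_to_send(10 * 40)     # one connection, one slow-start ramp

print(separate, together)            # 60 round trips vs 9 under this model
```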

|> 7. An instance of the proposed MIME encoding scheme for HTTP follows.  This
|> is currently in the process of a feasibility study in METHADONE (Mime
|> Encoding
|> THreAded Daemon Optimized for Network Efficiency), a caching, lightweight
|> threaded, MIME multipart encoding HTTP server for solaris 2.x (exploiting the
|> rich base functionality of NCSA's httpd) currently under beta development at
|> the UCSF Library and Center for Knowledge Management.
|
|Sounds like fun, but that's not the hard part.  Serving up MIME multipart
|is not the problem.  What you need to do is get clients to understand
|(and parse) MIME multiparts.  And that means ALL clients.

If this job weren't fun...

This is a test, this is only a test.  If it works, I'll come back with some
numbers.  If it doesn't, I'll come back with some numbers.  If you connect to
my server and don't specify Accept: multipart/mixed, there will be no problem
for now (except for the delay time), or at least until there is some
standardized implementation of multipart in HTTP/1.1, to which I will, of
course, adhere.

Indeed, people browsing the 500 gig of online medical journals in the interim 
might not have to wait 4 minutes for a screen of 70 20K journal cover images 
to load.  If it takes more than 3 minutes for this page to load on my SS10 
running both client and server, imagine how long it will take on an xterm 
across town at SF General, connecting to our SC2K along with 500 other 
impatient physicians.

The system as it exists is so inefficient as to be broken in certain cases
that crop up again and again in our application.  I want to be sure that my
attempt to fix it for my user community is consistent with specifications,
existing and proposed, and compatible with other information systems.  Any of
us could implement a quick fix for this in an afternoon, but it's not worth my
time if I am going to contravene specifications.

|> Message-ID: <http://host.domain/http_mime.html>
|
|This is an invalid use of Message-ID -- it must be unique.
|
|> Content-ID: <http://host.domain/path/http_mime.html>
|
|This is an invalid use of Content-ID -- it must be unique.

Stupid me.  I had assumed that the draft HTTP spec-00 talked only about current
practice, but it also deals with protocol that is documented yet unimplemented.

The specifications in HTTP/1.0 for both the format and use of the Content-ID: 
are more restrictive than that of RFC 1521.  The MIME spec says that the
primary purpose of this field is to assist caching and offers no format
template, while HTTP/1.0 dedicates this field to a transaction identifier and 
requires a strict format template.

I could make an argument that a URL uniquely identifies the content of a
body-part in that a URL cannot identify but one object at one time.  The main
reason for the MIME optional Content-ID field is to allow for caching, which is
facilitated by the use of a URL (in conjunction with the Date: header) in this 
field.  The URL would be valid and cachable from Date: till Expires: and is an
excellent candidate for a RFC 1521 style Content-ID.  A contradictory use for 
the Content-ID field is specified in HTTP/1.0-draft-00, although not currently 
used in any implementation.
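Under that reading, the headers of a cachable body part might look something
like this (a sketch of the proposal only--the values are invented, and neither
RFC 1521 nor the HTTP draft sanctions this use of Content-ID):

```
Content-Type: text/html
Content-ID: <http://host.domain/path/http_mime.html>
Date: Mon, 19 Dec 1994 10:12:52 -0800
Expires: Tue, 20 Dec 1994 10:12:52 -0800
```

The part would be cachable under that URL from Date: until Expires:.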

The Content-Description header field as specified in RFC 1521 would probably
be an appropriate place for this, although I do not see a Message-Description 
field there.  One would be tempted to look to the URI header to indicate the 
URI associated with a body part, but from what I can tell, it is only used to 
redirect a request and its use here seems inappropriate.

|And what do you do about Content-Encoding?  It's not valid MIME at all.

and later in the weekend Roy writes:

|Until the MIME people add Content-Encoding to the official RFCs, there is no 
|point in even discussing the issue here.  All that we can do is show how they 
|differ and possibly how an HTTP -> SMTP/NNTP gateway should behave.

HTTP currently uses:

Content-Type: application/postscript
Content-Encoding: x-gzip

to express a two-level encoding.  Two-level encodings are not included in
MIME because of concerns about machines that are unable to easily perform
the (de)compression.  Since only machines that are able to (de)compress should
present this information during negotiation, it does not seem a problem for the
web.  In short, current practice in HTTP conflicts with the limitations of
MIME, so we could validate the "out" that the MH people took:

<comp.mail.mime-FAQ>
Miscellaneous parameters:

x-conversions=compress  MH 6.8.x viamail: used with application/octet-stream;
                          type=tar.  See also tar(1) and compress(1).
</comp.mail.mime-FAQ>

The solution for a multipart-MIME-compliant expression of Content-Encoding
would be to use something like the following:

Content-Type:  application/postscript
Content-Transfer-Encoding: binary; x-conversions=gzip

|> Two consecutive boundary strings indicate an EOF.
|
|What?  See RFC 1521.  Or, for that matter, read the HTTP/1.0 draft -- it
|already defines EObody-part and EOentity for Multipart entities.

It had a hard time jumping out at me while preparing this document, but you are
again correct.  EOF is indicated by --boundary--.

% grep -i EObody-part draft-fielding-http-spec-00.txt rfc1521.txt
% grep -i EOentity draft-fielding-http-spec-00.txt rfc1521.txt

What are EObody-part and EOentity?

Since the draft HTTP 1.0 spec-00 is supposed to document current practice in
the same manner as the HTML 2.0 effort, why is multipart/mixed documented,
which is most certainly *not* current practice in any implementation I know of
(is it?), although it is described in BASIC HTTP?

|> 8.  I plan to bring this up on the sgml-internet list as well, for a broader,
|> more general perspective.
|
|They already want to ship SGML as a multipart/mixed -- adding a bunch of
|HTTP-specific controls to it just mucks-up things for the mail people.

Taking a case study first might yield some valuable lessons for the general
cases.

-marc
Received on Monday, 19 December 1994 10:17:24 EST

This archive was generated by hypermail pre-2.1.9 : Wednesday, 24 September 2003 06:31:10 EDT