Re: Gopher+ Considered Harmful

Guido.van.Rossum@cwi.nl
Fri, 11 Dec 1992 11:47:53 +0100


Message-Id: <9212111047.AA29250.guido@voorn.cwi.nl>
To: Dan Connolly <connolly@pixel.convex.com>
Cc: www-talk@nxoc01.cern.ch
Subject: Re: Gopher+ Considered Harmful 
In-Reply-To: Your message of "Thu, 10 Dec 1992 12:05:02 MET."
             <9212101805.AA05022@pixel.convex.com> 
From: Guido.van.Rossum@cwi.nl
Date: Fri, 11 Dec 1992 11:47:53 +0100

I wrote:
>>As I see it, there are two possible ways of using MIME in HTTP+.  We
>>can either support MIME as the *only* data format (implementing any
>>extensions we need as new MIME content types or subtypes or additional
>>headers), or we support MIME as one of the possible data formats.

Dan's reply:
>A terminology note here: there is no one "MIME data format." There's
>the ubiquitous message/rfc822 format that you can stick anything
>inside using MIME techniques. But the basic unit of information
>in the MIME spec is an _entity_ -- just an arbitrary stream of bytes.

OK, when I said MIME data format I meant MIME message format, and was
referring to the outer level only (and note that MIME *implies*
RFC822).  I certainly did not refer to a particular content-type, not
even to message/rfc822.  The only thing that isn't well-specified when
one talks about "a file in MIME format" is whether line breaks are
given as CRLF or as LF (or as something else).

>The question is, when you're sending an entity from one
>place to another, how do you know where the end is?

This is a matter for the transport agent, not for MIME -- by the time
you call in the MIME agent to handle the data you must *already* know
where the end is.  For entities contained in other entities (e.g. the
content-type family multipart/*) there is a way defined in MIME to
find the end of the inner entities, but this is not true for the
outermost entity.
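
A minimal sketch of how the inner entities of a multipart/* body can be
located by their boundary delimiter. The function name split_multipart
and the sample boundary "XYZ" are hypothetical, and the sketch assumes
CRLF line breaks and ignores details such as trailing whitespace after
a delimiter. Note that it needs the complete outer body as input, which
is the point made above: the outermost end must already be known.

```python
from typing import List


def split_multipart(body: bytes, boundary: bytes) -> List[bytes]:
    """Split a multipart body into its inner entities (hypothetical helper).

    Assumes CRLF line breaks and an exact match on the delimiter lines.
    """
    delimiter = b"--" + boundary        # starts each inner entity
    closing = delimiter + b"--"         # ends the last inner entity
    parts: List[bytes] = []
    current = None                      # None while still in the preamble
    for line in body.split(b"\r\n"):
        if line == closing:
            if current is not None:
                parts.append(b"\r\n".join(current))
            return parts                # anything after this is epilogue
        if line == delimiter:
            if current is not None:
                parts.append(b"\r\n".join(current))
            current = []
        elif current is not None:
            current.append(line)
    raise ValueError("closing delimiter not found")


body = (
    b"preamble\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"first\r\n"
    b"--XYZ\r\n"
    b"\r\n"
    b"second\r\n"
    b"--XYZ--\r\n"
    b"epilogue\r\n"
)
parts = split_multipart(body, b"XYZ")
```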

>From the MIME point of view, an NNTP client and server have
>an implicit agreement that the entity going across the
>wire has a content-transfer-encoding of 7bit.
>
>This allows them to use the dot-on-a-line-by-itself technique to
>terminate the entity.

MIME and NNTP should never need to talk to each other.  MIME is a UA
level format, NNTP is a message transfer agent protocol.  NNTP can use
the dot-on-a-line-by-itself convention not because it is a 7-bit
protocol (which it isn't -- although other message transfer protocols
like SMTP are) but because it is a line-based protocol.  MIME is also
mostly a line-based format, even if the content-transfer-encoding is
8bit -- it is only in binary mode that we get in trouble (since
conversion from one kind of line terminator to another is dangerous
for binary data).
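
A minimal sketch of the dot-on-a-line-by-itself convention, to show
that it depends on lines rather than on 7-bit data. The function names
are hypothetical; the doubling of a leading dot follows the usual
NNTP/SMTP practice, and CRLF line breaks are assumed. The lines may
contain 8-bit bytes, but not embedded line terminators, which is why
arbitrary binary data does not fit this scheme.

```python
from typing import List, Tuple


def encode_dot_terminated(lines: List[bytes]) -> bytes:
    """Frame a sequence of lines, ending with a lone dot (hypothetical helper)."""
    out = []
    for line in lines:
        if line.startswith(b"."):
            line = b"." + line          # double a leading dot ("dot-stuffing")
        out.append(line + b"\r\n")
    out.append(b".\r\n")                # the terminator line
    return b"".join(out)


def decode_dot_terminated(stream: bytes) -> Tuple[List[bytes], bytes]:
    """Return (lines, remaining bytes after the terminator)."""
    lines = []
    while True:
        end = stream.find(b"\r\n")
        if end < 0:
            raise ValueError("terminator line not found")
        line, stream = stream[:end], stream[end + 2:]
        if line == b".":
            return lines, stream
        if line.startswith(b"."):
            line = line[1:]             # undo the dot-stuffing
        lines.append(line)


original = [b"hello", b".", b"..dots", b"caf\xe9"]   # last line is 8-bit
wire = encode_dot_terminated(original)
decoded, rest = decode_dot_terminated(wire + b"NEXT")
```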

>They also share assumptions about the content-type as
>a separate issue. The client assumes the response to an
>ARTICLE command is a message/rfc822 entity, while the
>response to a BODY command is text/plain.

That's a nice way of putting it.

>[Long description of why you want to put the byte count in the MIME
>headers omitted]
>
>It is somewhat intertwingled, but I still kinda like it.

And I still don't.  I have the feeling that it would be much easier to
adapt HTTP to other (non-TCP) transport protocols if the size of an
entity were given separately rather than computed from the entity
itself (after all, this nonsense is only necessary because TCP doesn't
have a way to distinguish EOF from a broken connection).  As I
understand it, your main objection is that under my proposal you will
have to construct the necessary headers in a buffer first.  I don't
believe that this is much of a hassle on today's computers -- the
headers shouldn't take more than a couple of kilobytes even in extreme
cases, which is peanuts even for a standard PC.
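
A minimal sketch of giving the size separately, under loudly stated
assumptions: the wire layout here (a decimal byte count on its own
line, followed by exactly that many bytes) is purely illustrative and
is not the format proposed in this thread, and the names write_entity
and read_entity are hypothetical. It shows the headers being built in
a buffer first, and a short read being reported as a broken connection
rather than mistaken for the end of the entity.

```python
import io
from typing import BinaryIO, List, Tuple


def write_entity(out: BinaryIO, headers: List[Tuple[str, str]], body: bytes) -> None:
    """Send one entity preceded by its size (illustrative layout only)."""
    buf = io.BytesIO()                  # build the headers in a buffer first
    for name, value in headers:
        buf.write(f"{name}: {value}\r\n".encode("ascii"))
    buf.write(b"\r\n")                  # blank line separates headers and body
    buf.write(body)
    entity = buf.getvalue()
    out.write(str(len(entity)).encode("ascii") + b"\r\n")
    out.write(entity)


def read_entity(inp: BinaryIO) -> bytes:
    """Read one size-prefixed entity; raise EOFError if the stream is cut short."""
    size_line = inp.readline()
    if not size_line.endswith(b"\r\n"):
        raise EOFError("connection broken before the size was received")
    size = int(size_line.strip().decode("ascii"))
    entity = inp.read(size)
    if len(entity) != size:
        raise EOFError("connection broken inside the entity")
    return entity


wire = io.BytesIO()
write_entity(wire, [("Content-Type", "text/plain")], b"hi\n.\nthere")
write_entity(wire, [], b"second")
reader = io.BytesIO(wire.getvalue())
first = read_entity(reader)
second = read_entity(reader)
```

Because the receiver counts bytes, the body can hold any data,
including a lone dot on a line, without escaping.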

An issue on which I don't have a strong opinion is whether we should
represent line separators as CRLF in the header -- anyone else?

Cheers,

--Guido van Rossum, CWI, Amsterdam <guido@cwi.nl>
"The lawnmower.  Surely such a gadget could not have been generated
independently in two separate areas."