Re: Content-Transfer-Encoding from Phillip M. Hallam-Baker on 1995-07-26 (ietf-http-wg@w3.org from July to September 1995)

From: Phillip M. Hallam-Baker <hallam@w3.org>
Date: Wed, 26 Jul 95 14:24:54 -0400
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <9507261823.AA18496@www18.w3.org>
someone has brought to my attention that there is a debate on a
packet encoding for content. Here are my comments:-

1) Boundary delimiters (like mime)
	These are absolutely, fundamentaly and irrevokably unacceptable. They
require the originator to be precient is choice of boundary marker,
they require the recipient to scan for them. 

Ad-hoc arguments resorting to MD5 and probablistic considerations
carry no weight with me. In the first place there is a very strong
probability that a persistent TCP/IP stream will be self referential,
that is make a response based upon the low level encoding of the
stream. Consider two people in a chat conversation debugging HTTP
connections, At some point somone will incorporate the message
boundary into a message and the system will come to a grinding halt.
In the second place there is the processing overhead of scanning a
live MPEG feed for a boundary delimeter.

This is simply not the right thing.

The only argument for the boundary strings I have heard is to simplify
MAIL gateways. I don't accept the argument that one compromises a
system for a minor simplification of backwards compatibility. If a
gateway likes boundary delimeters it can easily convert. The email
system is horribly compromised already by the numerous concessions it
makes to broken MTAs. One more should not hurt. Do not ask the clean
protocol to start making the same concessions by proxy.


2) On binary vs human readable.

I prefer that a protocol be faithful unto itself. Either be human
readable or let us lose with ASN.1 BER or something like it. There is
no point in half measures one way or the other. If you must consider
performance at this level use base 16. Base 10 is probably as
efficient since there is not the upper/lowercase ambiguity.

If you don't think this can be done well in an ascii based scheme
please make that statement. We can then decide how to produce a
generic scheme for employing binary encodings as MIME/RFC822 achieves
for ascii, human readable encodings.

Please do not attempt to create a hybrid that is neither one thing nor
the other. If you are going to have an ascii based scheme there will
be an overhead. I don't think its a very large one but it will exist.
There are a number of binarry encodings with worse overhead, ASN.1 DER
for example.

3) Encryption

Please remeber that people will want to encode streams using block
ciphers. This�requires provision for specifying the number of padding
octets. Note that PEM style padding cannot be used since is creates a
security problem when used for repeated numbers of small packets. It
is essential that this padding be random bytes.

4) On a fixed upper limit on the packet size.

This is not an acceptable proposal. Whatever size you chose I can
provide examples where there is a very good need for a bigger size.
Since history is littered with standards whose creators did not
anticipate future needs I don't think we should make an arbitrary
restriction which has no rational justification. I have just helped
order a pair of machines with 512Mb of RAM, I have used file systems
with a capacity of 6 Tb. I have built systems with a bandwidth of
6Tb/sec. 32 bits is simply not enough for these purposes. Experience
demonstrates that to build a system for the future we must at least be
able to satisfy the most exotic needs of the day.

Self describing binary length encodings are simple enough. ASN.1 has
one, one of the sensible suggestions in the original design that
survived the committee cycle?

ASN.1 has two legth encodings, short and long. The first octet defines
the length of the length. If bit 7 is clear the length is encoded in a
single octet a bits 0-6 of the first octet. Otherwise bits 0-6 give
the number of additional octets the length.

A perhaps cleaner way to do it combines the recycling unused bits in
the lead octet idea with the fact that most lengths will be most
conveniently expressed 
using 1, 2, 4 or 8 bytes. The decision table is as follows:

IF
    Bit 7 clear,	
	length is 1 octet, bits 0-6 give value
    Bit 6 clear,
	length is 2 octets, bits 0-5 give msb of value, octet 2 gives LSB
    Bit 5 clear,
	length is 4 octets, bits 0-4 give msb of value, octets 2-4 gives LSB
    Bit 4 clear,
	length is 8 octets, bits 0-3 give msb of value, octets 2-9 gives LSB
    TRUE
	these are reserved for future expansion.


Of these points the last is the most critical. If the first is not
accepted it will not be a significant disaster because I don't think
the protocol would be used. If a scheme with a fixed length gains
popularity it will be very hard to remedy it later on. Thus I would
like names of any proponents of a fixed length scheme with their
rationale. I promise faithfully to retain said list and produce it in
a suitable forum (eg IETF conference dinner) for an "I told you so"
speach.

-- 
Phillip M. Hallam-Baker            Not speaking for anoyone else
hallam@w3.org http://www.w3.org/hypertext/WWW/People/hallam.html
Information Superhighway -----> Hi-ho! Yow! I'm surfing Arpanet!
Received on Wednesday, 26 July 1995 11:29:16 UTC