Re: Binary data in text-based protocol formats from Chris Newman on 1999-02-23 (ietf-discuss@w3.org from February 1999)

From: Chris Newman <chris@innosoft.com>
Date: Mon, 22 Feb 1999 16:51:38 -0800 (PST)
To: Jacob Palme <jpalme@dsv.su.se>
Cc: discuss@apps.ietf.org
Message-id: <Pine.SOL.3.95.990222161644.3289Q-100000@elwood.innosoft.com>

On Sat, 20 Feb 1999, Jacob Palme wrote:
> At 04.18 +0100 99-02-20, Chris Newman wrote:
> > I should also point out the option of a hybrid encoding.  Use a simple
> > binary structure with fixed-length ASCII character strings for protocol
> > keywords.  You get all the advantages of binary encoding, and a protocol
> > dump is at least partially useful.  Secure Shell 2 has a different
> > interesting hybrid characteristic -- it uses length-counted text strings
> > for extensibility-oriented feature lists.
> 
> A nice idea. Is it common in standards?

Hasn't happened much in the IETF, mostly because the apps area has a text
bias and lower level areas have a binary bias.  The PNG spec (RFC 2083) 
uses four-letter ASCII labels for chunks, although that's not IETF
standards track.
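As a rough illustration of the PNG approach, here is a Python sketch that encodes one chunk. RFC 2083 lays a chunk out as a 4-byte big-endian data length, a four-letter ASCII chunk type, the data, and a CRC-32 over the type and data. The function name `png_chunk` is hypothetical. `zlib.crc32` is used because PNG's chunk CRC is the standard CRC-32.

```python
import struct
import zlib

def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Encode one PNG-style chunk: length, 4-letter ASCII type, data, CRC."""
    if len(chunk_type) != 4 or not chunk_type.isalpha():
        raise ValueError("chunk type must be exactly four ASCII letters")
    # The 4-byte big-endian length counts only the data field.
    header = struct.pack(">I", len(data)) + chunk_type
    # The CRC-32 covers the type and data fields, but not the length.
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return header + data + struct.pack(">I", crc)

# A tEXt chunk carries a keyword, a NUL separator, and the text itself.
chunk = png_chunk(b"tEXt", b"Comment\x00hello")
print(chunk)
```

Everything except the label is binary. A dump still shows `tEXt` at a fixed offset, and that is the partial readability the hybrid encoding provides.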

> Binary data in textual encodings seems to be a problem. How
> is this usually handled? Base64 is of course an option.
> MIME uses, if I understand it rightly, the convention that
> <CRLF>--boundary text<CRLF> is the end marker of a binary body
> part. Not very neat.
> ...
> To indicate the end of binary data with an octet-length
> value before the binary data seems to me the neatest way,
> but it seems not to be very popular in standards.

Length-counting has two problems:

(1) It completely falls apart when common newline or charset conversions
    are applied.

(2) It requires that you know the size of the object in advance, or that
    you use a complex chunking technique such as that used by SMTP BDAT.
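The following minimal Python sketch shows both problems. It assumes a hypothetical framing: a decimal octet count, then CRLF, then the data. The names `frame`, `unframe`, `lf_to_crlf` and `bdat_chunks` are illustrative and do not come from any spec. The chunking part only imitates the shape of SMTP BDAT commands (RFC 3030). It builds the client's bytes and ignores the server replies that a real client must wait for between chunks.

```python
import re
from typing import Iterable, Iterator, Tuple

def frame(payload: bytes) -> bytes:
    """Length-count a payload: decimal octet count, CRLF, then the octets."""
    return b"%d\r\n" % len(payload) + payload

def unframe(buf: bytes) -> Tuple[bytes, bytes]:
    """Split one frame off the front of buf; return (payload, remainder)."""
    header, _, rest = buf.partition(b"\r\n")
    n = int(header)
    return rest[:n], rest[n:]

def lf_to_crlf(data: bytes) -> bytes:
    """A typical gateway conversion: turn every bare LF into CRLF."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

# Problem (1): converting newlines after framing invalidates the count.
wire = frame(b"line one\nline two\n")          # header says 18 octets
payload, leftover = unframe(lf_to_crlf(wire))  # but 20 octets now follow
print(payload, leftover)  # the final CRLF spills into the "next" frame

# Problem (2): when the total size is not known up front, the sender has to
# chunk. Each chunk is counted separately, and the last one is marked LAST.
def bdat_chunks(pieces: Iterable[bytes]) -> Iterator[bytes]:
    it = iter(pieces)
    prev = next(it, b"")
    for piece in it:
        yield b"BDAT %d\r\n" % len(prev) + prev
        prev = piece
    yield b"BDAT %d LAST\r\n" % len(prev) + prev
```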

For wire protocols, (1) isn't much of an issue and (2) can often be
tolerated. In my experience, the length-counted literals in IMAP are less
error-prone and more efficient than the dot-stuffing in POP and SMTP
(mostly because dot-stuffing creates a rarely used codepath and is
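intrusive when working with a canonical CRLF MIME mailstore).  But (2)
causes serious problems for mailstores that don't use canonical CRLF MIME
format, since each message must be converted (or at least scanned) just to
learn its CRLF length before it can be sent.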
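To make the contrast concrete, here is a simplified Python sketch of the two framings. The first is an IMAP-style literal: `{n}`, CRLF, then n octets, as in RFC 3501. The second is SMTP-style dot-stuffing, where a leading "." on any line is doubled and the data ends with CRLF "." CRLF. The function names are hypothetical. The stuffing code assumes its input is already canonical CRLF text that ends in CRLF.

```python
def imap_literal(data: bytes) -> bytes:
    """IMAP-style literal: {octet-count} CRLF, then the octets unchanged."""
    return b"{%d}\r\n" % len(data) + data

def dot_stuff(message: bytes) -> bytes:
    """SMTP-style framing: double any leading '.', end with CRLF '.' CRLF.

    Assumes message is canonical CRLF text that ends with CRLF.
    """
    lines = message.split(b"\r\n")
    if lines[-1] == b"":
        lines.pop()  # split() leaves an empty item after the final CRLF
    stuffed = [b"." + ln if ln.startswith(b".") else ln for ln in lines]
    return b"".join(ln + b"\r\n" for ln in stuffed) + b".\r\n"

def dot_unstuff(wire: bytes) -> bytes:
    """Reverse dot_stuff: stop at a lone '.', strip one leading '.' elsewhere."""
    out = []
    for ln in wire.split(b"\r\n"):
        if ln == b".":
            break
        out.append(ln[1:] if ln.startswith(b".") else ln)
    return b"".join(ln + b"\r\n" for ln in out)

msg = b"Hi\r\n.hidden\r\n.\r\n"
print(imap_literal(msg))  # the stored bytes pass through untouched
print(dot_stuff(msg))     # every line was inspected; two were rewritten
```

The literal only adds a prefix. Stuffing has to look at every line, and its dot-handling branch runs only for the rare message that actually contains a line starting with ".".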

For textual data formats, I think MIME got it right (a lot of smart people
spent a lot of time on MIME).  In fact, well-constructed MIME boundaries
make for a very nice Boyer-Moore search, so you can typically find the end
of the object in sub-linear time.
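Here is a Python sketch of that search. It uses Boyer-Moore-Horspool, a simplified Boyer-Moore variant that keeps only the bad-character shift. Following RFC 2046, the CRLF before `--boundary` is treated as part of the delimiter. The boundary string and the helper name are made up for illustration. The sketch also assumes the sender chose a boundary that never occurs inside the part.

```python
def horspool_find(haystack: bytes, needle: bytes, start: int = 0) -> int:
    """Return the index of the first match of needle at or after start, or -1."""
    m = len(needle)
    if m == 0:
        return start
    # Bad-character table: when the haystack byte under the last needle
    # position is b, shift the window by shift[b]. Bytes that do not occur
    # in the first m-1 positions of the needle allow a full shift of m.
    shift = [m] * 256
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i
    pos = start
    while pos + m <= len(haystack):
        if haystack[pos:pos + m] == needle:
            return pos
        pos += shift[haystack[pos + m - 1]]
    return -1

boundary = b"=_hypothetical-boundary-7f3a9c"
part = b"\x00\x01 arbitrary octets\r\n\xff"
body = (b"--" + boundary + b"\r\n"
        + b"Content-Type: application/octet-stream\r\n\r\n"
        + part
        + b"\r\n--" + boundary + b"--\r\n")

# The part's data starts after the blank line that ends its headers and
# stops at the first CRLF--boundary delimiter.
data_start = body.find(b"\r\n\r\n") + 4
data_end = horspool_find(body, b"\r\n--" + boundary, data_start)
print(body[data_start:data_end] == part)  # True
```

A longer, more distinctive boundary lets each step skip up to `len(delimiter)` octets. This is why a well-chosen boundary searches faster than a short one.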

For binary data formats, a length-counted chunking solution is most
common.
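Reading such a format can be sketched in Python as follows. The sketch assumes a hypothetical PNG-like layout without the CRC: a 4-byte big-endian length, a 4-byte ASCII label, then the data. Each chunk announces its own size. The reader can therefore skip labels it does not understand without looking inside them, and the data may contain any octets, including CRLF "." CRLF.

```python
import struct
from typing import Iterator, Tuple

def write_chunk(kind: bytes, data: bytes) -> bytes:
    """Hypothetical layout: 4-byte big-endian length, 4-byte label, data."""
    return struct.pack(">I4s", len(data), kind) + data

def read_chunks(buf: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (label, data) pairs, raising ValueError on a truncated chunk."""
    pos = 0
    while pos < len(buf):
        if pos + 8 > len(buf):
            raise ValueError("truncated chunk header")
        length, kind = struct.unpack_from(">I4s", buf, pos)
        pos += 8
        if pos + length > len(buf):
            raise ValueError("truncated chunk data")
        yield kind, buf[pos:pos + length]
        pos += length  # jump straight past the data, whatever it contains

stream = (write_chunk(b"HEAD", b"v1")
          + write_chunk(b"XTRA", b"\x00" * 1000)  # unknown to this reader
          + write_chunk(b"BODY", b"\r\n.\r\n"))
known = {k: d for k, d in read_chunks(stream) if k in (b"HEAD", b"BODY")}
print(known)
```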

		- Chris

Received on Monday, 22 February 1999 19:57:44 UTC