Re: Comments on the HTTP/1.0 draft. from Chuck Shotton on 1994-12-08 (ietf-http-wg@w3.org from October to December 1994)

From: Chuck Shotton <cshotton@oac.hsc.uth.tmc.edu>
Date: Wed, 7 Dec 1994 19:58:59 -0600
To: Marc VanHeyningen <mvanheyn@cs.indiana.edu>, hallam@alws.cern.ch
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <ab0c143b08021004578b@[129.106.201.2]>
>>This discussion is about object-bodies ONLY.
>
>No, this discussion is about *textual* object-bodies ONLY.  I think everybody
>agrees that, in addition to headers, GIFs and audio files and MPEGs and
>everything else should be shipped around the network in their canonical form,
>rather than in some local form.  Luckily, most of the systems that people
>use happen, by a convenient coincidence, to locally use the canonical form
>or something easily converted into canonical form, so that there isn't a
>requirement for expensive conversion.

Right, textual object-bodies, the most important of which is text/html
(because of the problems introduced with <pre> text within the HTML).

>There are minor conversions; in a sense the Macintosh local form could be
>said to include the resource fork as well as the data fork, but I don't
>think anybody thinks all clients should understand macbinary even though
>this could be said to be the local form of Mac files.  Mac servers should
>convert, say, GIF files stored on a Mac to canonical form by discarding
>the resource fork and sending only the data fork.  Do we at least agree
>on this point?

Sure, but text files by their very nature are less precise than a GIF file.
We all recognize when a picture wasn't transferred properly, but in the case
of HTML, white space is usually insignificant. In the specific case of
<pre> text within HTML (or even text/plain for that matter), how to
interpret line ends is of critical importance. You can easily argue that
canonicalization of text solves this problem, and I would agree 100% if it
weren't for the conversion problem.

The conversion process for turning a Mac GIF into a Unix GIF is a trivial
exercise. Simply read blocks of data from the data fork and spew them out
the IP connection. No extra CPU power required. The process to convert text
to canonical form is not as simple, because there isn't a standard
cross-platform representation for text data. In the simplest case, line
ends vary, to say nothing of character sets. This conversion process is CPU
intensive and isn't required for the current suite of HTTP applications. My
opposition to a canonical text format requirement is based solely on the
performance hit. In truth, MacHTTP already does this conversion and I can
say with certainty that files sent without conversion transfer twice as
fast as ones that must be parsed first.
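
To make the cost difference concrete, here is a minimal sketch of the two
paths described above: copying blocks unchanged versus rewriting every line
end as CRLF. It is not MacHTTP's code. The function names and the block size
are hypothetical, and it assumes the canonical line end is CRLF and ignores
character-set conversion entirely.

```python
import io

CRLF = b"\r\n"

def send_raw(src, dst, block_size=4096):
    """Copy blocks unchanged, as a server can do for a GIF's data fork."""
    while True:
        block = src.read(block_size)
        if not block:
            break
        dst.write(block)

def send_canonical(src, dst, block_size=4096):
    """Rewrite bare CR (Mac), bare LF (Unix), and CR LF as CRLF.

    Every byte has to be inspected, which is where the extra CPU goes.
    """
    pending_cr = False  # True if the previous byte was a CR
    while True:
        block = src.read(block_size)
        if not block:
            break
        out = bytearray()
        for byte in block:
            if pending_cr:
                pending_cr = False
                if byte == 0x0A:
                    # LF completing a CR LF pair; CRLF was already emitted
                    continue
            if byte == 0x0D:
                out += CRLF
                pending_cr = True
            elif byte == 0x0A:
                out += CRLF
            else:
                out.append(byte)
        dst.write(bytes(out))

if __name__ == "__main__":
    mixed = io.BytesIO(b"mac line\runix line\ndos line\r\nend")
    result = io.BytesIO()
    send_canonical(mixed, result)
    print(result.getvalue())
```

The raw path never looks at the data, while the canonical path touches every
byte and has to carry state across block boundaries. How large the resulting
slowdown is depends on the implementation and the hardware.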

>What you are suggesting is that everything go in canonical form *except* text,
>which should be considered a special case because it's common, has varying
>local forms, but those local forms are not inordinately difficult to understand
>in a flexible fashion.  This may be a reasonable exception.  But it is an
>exception, an argument that textual object-bodies should be a special case.

That is somewhat close to what I'm saying. Text is a special case because
text is a special case. It isn't as clearly defined as any of the other
structured binary types. By its very nature, clients and servers must take
special care to deal with it. Because they already do, we are able to
transfer text files without line-by-line conversion.
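
As an illustration of that kind of special care on the receiving side, a
client can accept any of the common line ends rather than requiring one. The
sketch below is an assumption about how such tolerance might look, not a
description of any particular browser; the function name is hypothetical.

```python
import re

# Match CR LF first so a DOS line end is not counted as two breaks.
_LINE_END = re.compile(rb"\r\n|\r|\n")

def split_lines_tolerant(body):
    """Split a textual object-body on CR LF, bare CR, or bare LF."""
    return _LINE_END.split(body)

if __name__ == "__main__":
    for line in split_lines_tolerant(b"<pre>\rline one\nline two\r\n</pre>"):
        print(line)
```

With a reader like this, text inside <pre> renders the same whether the
server sent Mac, Unix, or DOS line ends, at the cost of one scan on the
client instead of a conversion on the server.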

>>I have yet to hear a factual, supported reason why this must be done or
>>else HTTP will fail.
>
>That's OK, I have yet to hear a clear statement from you which of the two
>positions I mentioned in the first paragraph of my previous message is yours.
>Do you think that canonicalizing line breaks is technically fine but just too
>expensive to implement, or that it's just plain dumb?

It's just plain dumb because it's too expensive to implement. ;) Seriously,
forcing interpretation of text is a bad thing from a performance
perspective, not because it doesn't make sense to do in an ideal world.

>>You are asking that current practice be
>>discarded in favor of an idea that has not been proven to be of any use to
>>the HTTP community.
>
>No, I am stating that I think existing standards and practices outside of
>HTTP are being dismissed without due consideration.  I fully expect us to
>eventually get a reasonable approach which either tolerates or standardizes
>existing practice with regard to the special treatment of textual objects.
>The question is whether it should tolerate it, as the current spec appears
>to do, or standardize it, and exactly how.

I think it's beyond the scope of the HTTP standard to try and standardize
the representation of text information across all platforms.

>Albert Lunde said:
>>Now, it seems like we are saying is that current practice
>>(not just "bad" servers) is to treat EOL differently in
>>the object body for performance reasons.
>>
>>In this, and in other ways, we are not just quoting the MIME
>>spec, we are sort of rewriting it.
>
>Yes.  Absolutely.

No, this is the HTTP spec! As Roy has said earlier, there are lots of
things that are MIME-like, and may even be considered MIME by some people,
but the HTTP standard doesn't say how to interpret the semantics of this
MIME info. Rather, it says how to parse it. Interpretation is the MIME
standard's problem. The MIME standard says nothing about the "proper"
format for GIF files, XBMs, MPEGs, or any of the other content-types
(except for text/plain, perhaps) used by HTTP. The HTTP standard doesn't
either. So, why are we singling out text in object bodies?

-----------------------------------------------------------------------
Chuck Shotton
cshotton@oac.hsc.uth.tmc.edu                           "I am NOT here."
Received on Wednesday, 7 December 1994 17:58:11 UTC